02 · Understanding language AI

The model landscape

Not all AI is the same. A framework for understanding what models can do and how they differ.

Morgan Kavanagh · Published 2026-03-28

AI is not one thing

The phrase "AI" flattens an enormous range of systems into a single word. A model with 8 billion parameters running on a laptop and a model with 400 billion parameters running in a data centre are both called AI, but they differ in capability the way a bicycle differs from a freight train. Before you can make good decisions about which tools to use, you need a vocabulary for the dimensions along which models vary.

Parameters and scale

A model's parameter count, measured in billions, is a rough proxy for its capacity. More parameters generally means the model can handle more nuance, longer reasoning chains, and more complex instructions. But more parameters also means more compute, more cost, and more energy. A 7-billion-parameter model handles straightforward classification and extraction well. A 70-billion-parameter model handles multi-step reasoning and subtle stylistic requests. The largest models, above 200 billion parameters, manage complex creative and analytical tasks, though they are expensive to run and often slower. The right model is not the biggest one. It is the smallest one that reliably does what you need.
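That rule of thumb can be written down as a lookup from a rough task label to the smallest model tier that typically suffices. A minimal sketch; the task labels are illustrative assumptions, and the tiers follow the text, not a benchmark:

```python
def smallest_adequate_tier(task: str) -> str:
    """Map a rough task label to the smallest model tier that typically
    handles it reliably. Tiers mirror the rule of thumb in the text;
    the task labels themselves are illustrative assumptions."""
    tiers = {
        "classification": "7B",   # straightforward classification
        "extraction": "7B",       # structured extraction
        "reasoning": "70B",       # multi-step reasoning, subtle style
        "creative": "200B+",      # complex creative/analytical work
    }
    return tiers[task]

print(smallest_adequate_tier("classification"))  # start small, scale only if needed
```

Starting at the bottom of the table and moving up only when the smaller tier fails is usually cheaper than starting at the top.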

Open and closed models

Some models publish their weights, the numerical values that define the model's behaviour, so anyone can download, inspect, modify, and run them. These are called open-weight models. Others keep their weights private and only offer access through an API. Both approaches have trade-offs. Open-weight models give you full control: you can run them on your own hardware, audit their behaviour, fine-tune them on your data, and ensure that no information leaves your environment. Closed models are typically more capable at the frontier and easier to start with, since you send a request to an API and get a response, but you have less control over where your data goes and no ability to inspect or modify the model itself.

Context window size

The context window is the total amount of text a model can process in a single interaction. It is measured in tokens, where each token represents roughly three-quarters of a word. Some models have context windows of 8,000 tokens (about 12 pages). Others support 128,000 tokens (a short book) or even a million. A larger context window lets you include more reference material, longer conversation histories, or bigger documents in a single request. But larger windows are slower and more expensive. Models also do not attend equally to all parts of a long context; information in the middle of a very long input is often less well-processed than information at the beginning or end.
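The words-to-tokens conversion above gives a quick way to sanity-check whether a document fits a given window. A minimal sketch, keeping in mind that real tokenizers vary by language and content:

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough estimate: tokens ~= words / 0.75, per the three-quarters-of-a-word
    rule of thumb. Real tokenizers differ, especially for code, numbers,
    and non-English text."""
    return round(len(text.split()) / words_per_token)

def fits_in_context(text: str, context_window: int = 8_000) -> bool:
    """Check whether text plausibly fits the window. Note this leaves
    no headroom for the prompt or the model's response."""
    return estimate_tokens(text) <= context_window
```

In practice you should budget well under the full window, since the prompt, any reference material, and the model's own output all share the same token limit.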

Modalities

Early language models processed only text. Current models increasingly handle multiple modalities: text, images, audio, video, and code. A multimodal model can look at a photograph and describe it, listen to a recording and transcribe it, or read a chart and extract the data. This matters because real work rarely involves only text. If your workflow involves scanned documents, diagrams, audio recordings, or screenshots, you need a model that can process those inputs directly rather than requiring you to convert everything to text first.

Reasoning and chain-of-thought

Some models have been specifically trained to "think step by step", breaking a problem into intermediate steps and reasoning through them before producing a final answer. These reasoning models are substantially better at tasks that require logic, mathematics, planning, or multi-step analysis. They take longer to respond because they generate this intermediate reasoning, and they cost more because they produce more tokens. For simple tasks like reformatting text, translating a sentence, or classifying a document, a reasoning model is overkill. For complex tasks like analysing an argument, debugging a process, or planning a project, it can be the difference between a useful answer and a plausible-sounding wrong one.
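The cost overhead is easy to quantify: a reasoning model bills for its intermediate tokens as well as the visible answer. A sketch with hypothetical token counts and prices:

```python
def response_cost(answer_tokens: int, reasoning_tokens: int,
                  price_per_1k: float) -> float:
    """Total cost of one response. Reasoning models generate intermediate
    'thinking' tokens on top of the answer, so they cost more per query."""
    return (answer_tokens + reasoning_tokens) * price_per_1k / 1000

# Hypothetical: a 500-token answer at $0.01 per 1,000 output tokens.
standard = response_cost(500, 0, 0.01)        # $0.005
reasoning = response_cost(500, 4_000, 0.01)   # $0.045 -- 9x the standard cost
```

The same asymmetry applies to latency: those intermediate tokens must be generated before the answer appears, which is why reasoning models are both slower and more expensive on simple tasks.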

How to evaluate a model for your needs

Rather than asking "which model is best", ask: what is the task, how complex is it, how sensitive is the data, what is the budget, and how fast does the response need to be? A small open-weight model running locally is ideal for processing sensitive documents quickly and cheaply. A large closed model via API is ideal for complex one-off analysis where quality matters more than cost. A mid-range reasoning model is ideal for tasks that require structured thinking but do not involve sensitive data. There is no single answer. There is a portfolio of tools, and the skill is matching the right tool to the right task.
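Those rules of thumb can be sketched as a small routing function. The model categories come from the text above; the decision order (data sensitivity first, since that constraint is non-negotiable, then task complexity) is an assumption:

```python
def pick_model(complexity: str, data_sensitive: bool) -> str:
    """Route a task to a model category. Sensitivity is checked first
    because it is a hard constraint; complexity labels are illustrative."""
    if data_sensitive:
        return "small open-weight model, run locally"
    if complexity == "complex":
        return "large closed model via API"
    if complexity == "structured":
        return "mid-range reasoning model"
    return "small fast model"

print(pick_model("complex", data_sensitive=True))  # sensitivity overrides scale
```

A real routing policy would weigh more dimensions (budget, latency, modality), but even this toy version makes the point: the output is a portfolio decision, not a single "best model".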

Examples

Choosing scale for the task

You need to classify 10,000 customer support emails into five categories. A small, fast model (7 to 8 billion parameters) running locally handles this in minutes at negligible cost, with 95% accuracy. Sending the same task to a large frontier model via API would take hours, cost significantly more, and produce only marginally better classification, perhaps 97%. The extra 2% is not worth the 50x cost increase for this task.
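Putting hypothetical dollar figures on that trade-off makes it concrete. Assuming the small run costs $1 and the frontier run $50 (invented figures chosen to match the roughly 50x ratio in the example), cost per *correct* classification is a fairer comparison than raw cost:

```python
def cost_per_correct(total_cost: float, n_items: int, accuracy: float) -> float:
    """Cost per correctly classified item -- a fairer basis for comparison
    than total cost when the two models' accuracies differ."""
    return total_cost / (n_items * accuracy)

# Hypothetical figures: 10,000 emails, $1 at 95% vs $50 at 97%.
small = cost_per_correct(1.0, 10_000, 0.95)
large = cost_per_correct(50.0, 10_000, 0.97)
print(f"frontier model costs {large / small:.0f}x more per correct label")
```

Even after crediting the frontier model with its extra two points of accuracy, it costs roughly 49x more per correct label here, which is the quantitative version of "the extra 2% is not worth it".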

When reasoning matters

You ask a standard model to identify logical inconsistencies in a policy document. It rephrases the document fluently but misses a contradiction between section 3 and section 7. A reasoning model, given the same prompt, produces intermediate steps: "Section 3 states X. Section 7 states Y. X and Y cannot both be true because..." The reasoning overhead is worth it when the task requires genuine analysis, not just fluent text production.