
Where your AI runs

Public cloud, organisational infrastructure, or your own machine. Location shapes every decision.

Morgan Kavanagh · Published 2026-03-28

Three places to run a model

There are fundamentally three places where a language model can run. First: on a public cloud, operated by the model provider, where you send your data to their servers and they send back a response. Second: on infrastructure managed by your organisation, whether your own servers or a private cloud where you control the hardware and the network. Third: on a local machine such as your laptop, your workstation, or a dedicated device in your office. Each location has different implications for speed, cost, capability, privacy, and control. Most organisations will use a combination of all three.

Public cloud: capability and convenience

Public cloud APIs give you access to the most capable models without managing any infrastructure. You make a request, you get a response, you pay per token. This is the easiest way to start. But every request sends your data to an external server. The provider's terms of service determine what happens to that data: whether it is logged, whether it is used for training, how long it is retained. For non-sensitive work, this trade-off is usually acceptable. For anything involving personal data, client information, trade secrets, or legally privileged content, you need to read those terms very carefully, or choose a different location.
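In practice this is a single HTTP request. Here is a minimal sketch in Python, assuming an OpenAI-style chat-completions endpoint; the URL, model name, and environment variable are hypothetical placeholders for whatever your provider documents.

import os
import requests

API_URL = "https://api.example-provider.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = os.environ["PROVIDER_API_KEY"]  # hypothetical variable name

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "provider-large",  # hypothetical model name
        "messages": [{"role": "user", "content": "Summarise this press release: ..."}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])

Every byte of that messages payload crosses the network boundary, which is exactly the property the rest of this section is about.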

Organisational infrastructure: control and compliance

Running a model on your organisation's own servers means your data never leaves your network. You control the hardware, the software, the access policies, and the logs. This is the right choice when data sensitivity requires it, and in many industries it is the only choice that satisfies regulatory requirements. The trade-off is capability and cost. Running large models requires expensive hardware with high-end GPUs and large amounts of memory. The most capable models may not be available in open-weight versions that you can self-host. And you need someone to maintain the infrastructure. But for organisations that already have server infrastructure, adding a language model is an incremental step, not a transformation.
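The request itself barely changes when the model moves inside your network. Many open-source inference servers (vLLM is one example) expose an OpenAI-compatible API, so a sketch looks almost identical to the cloud version; the internal hostname and model name here are hypothetical.

import requests

INTERNAL_URL = "http://llm.internal.example:8000/v1/chat/completions"  # hypothetical internal host

response = requests.post(
    INTERNAL_URL,
    json={
        "model": "local-llama",  # whichever open-weight model you serve
        "messages": [{"role": "user", "content": "Summarise this client contract: ..."}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])

Only the network boundary has changed, which is the point: the application code stays the same while the data stays inside.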

Local machines: privacy at the edge

The smallest models can run on a modern laptop with no internet connection required and no data transmitted anywhere. This is the ultimate privacy guarantee: the model runs in memory on your machine, processes your text, and the data never exists anywhere else. The trade-off is capability. Models that run on a laptop are significantly less capable than the largest cloud-hosted models. They handle focused, well-defined tasks (classification, extraction, simple generation) much better than open-ended complex reasoning. For sensitive work that requires only moderate model capability, local execution is an underused option.
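Here is a sketch of a fully local call, assuming a runtime such as Ollama is installed and serving on its default port; the model name is a placeholder for whatever small model you have pulled onto the machine.

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",  # placeholder: any small local model
        "prompt": "Extract the invoice number from: ...",
        "stream": False,      # return one JSON object rather than a stream
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["response"])

Nothing in that exchange leaves localhost.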

The portfolio approach

The most effective approach is not to choose one location but to use all three strategically. Route sensitive data to local or self-hosted models. Route complex analytical tasks to the most capable cloud models. Route high-volume routine tasks to the fastest and cheapest option that meets the quality bar. The practical skill is building an environment where different models at different locations handle different tasks, routed by data sensitivity and task requirements.
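A routing policy like this can start as a few lines of code. The sketch below is illustrative: the classification labels and the complexity flag are assumptions standing in for whatever scheme your organisation actually uses.

from enum import Enum

class Location(Enum):
    CLOUD = "cloud"              # most capable; data leaves the network
    SELF_HOSTED = "self-hosted"  # controlled infrastructure
    LOCAL = "local"              # nothing leaves the machine

def route(sensitivity: str, complexity: str) -> Location:
    """Decide on data sensitivity first, then on task requirements."""
    if sensitivity == "restricted":
        return Location.LOCAL
    if sensitivity == "internal":
        return Location.SELF_HOSTED
    # Public data: pick the cheapest option that clears the quality bar,
    # here assumed to be self-hosted for routine work.
    return Location.CLOUD if complexity == "high" else Location.SELF_HOSTED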

Examples

Routing by sensitivity

Your organisation processes two types of documents: public marketing materials and internal HR records. Marketing text goes to a cloud API because it is not sensitive and you want the best possible quality for public-facing copy. HR records are processed by a model running on your own server, so that employee performance reviews and salary data never leave your network. Same workflow, different routing, based on data classification.
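Expressed as code, the scenario is a dispatch table over the document classification. The endpoints below reuse the hypothetical URLs from the earlier sketches, and authentication is omitted for brevity.

import requests

ENDPOINTS = {
    "public": "https://api.example-provider.com/v1/chat/completions",        # cloud
    "confidential": "http://llm.internal.example:8000/v1/chat/completions",  # self-hosted
}

def process(document: str, classification: str) -> str:
    """Send a document only to the endpoint its classification allows."""
    response = requests.post(
        ENDPOINTS[classification],
        json={
            "model": "default",  # placeholder model name
            "messages": [{"role": "user", "content": document}],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

copy = process("Polish this product announcement: ...", "public")
review = process("Summarise this performance review: ...", "confidential")

The routing decision lives in one place, which makes it auditable: anyone can check which classifications are allowed to cross the network boundary.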