Skip to content

Local LLM Usage

LARGESTACK supports local LLMs through two patterns.

Pattern A — Native Ollama provider, chat-only

Use this when you need local/private chat or RAG without tool calls.

ollama serve
ollama pull llama3.2
export LARGESTACK_OLLAMA_BASE_URL=http://localhost:11434
from largestack import Agent

agent = Agent(
    name="local-chat",
    llm="ollama/llama3.2",
    instructions="Reply concisely.",
    cost_budget=0.0,
)

The native OllamaProvider is chat-only in this release.

Pattern B — LiteLLM/OpenAI-compatible local endpoint

Use this when you need a unified gateway across cloud and local models. Tool calling depends on the local model and proxy support.

pip install largestack[litellm]
# Configure LiteLLM/Ollama according to your proxy setup.
from largestack import Agent

agent = Agent(
    name="local-router",
    llm="litellm/ollama/llama3.1",
    instructions="Use the local model.",
)

Production rule

Before relying on local tool automation in production, run an end-to-end test with your exact model, proxy, schema, and tool-calling configuration — behavior varies between local models and setups.