# Ollama
Ollama is a popular choice for running LLMs locally.
## Prerequisites
- You have Ollama installed on your machine.
- Ollama is up and running on your machine.
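
Before going further, it is worth a quick sanity check that the server is reachable. This assumes a default installation listening on port 11434:

```bash
# Check the Ollama CLI is installed
ollama --version

# Confirm the server is up and responding on the default port
curl -s http://localhost:11434/api/tags
```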
**Caution**

So far we have only experimented with Ollama on Apple Silicon with a handful of 7B-12B parameter models. `gemma3:12b` is the only model that has produced barely acceptable results, and it is well behind frontier models in terms of both quality and latency.

That being said, we encourage you to give it a try and report any issues, especially if you have 128GB+ of vRAM and can run bigger models.
## Usage
Assuming you have the `gemma3:12b` model pulled via `ollama pull gemma3:12b`, you can run it with:

```bash
opsmate run --context cli-lite -m gemma3:12b "how many cores on the machine"
```
We strongly recommend using the `cli-lite` context when running small 7B-12B parameter models. You can find the prompt of the `cli-lite` context in `cli_lite.py`.
To list all the available Ollama models, run:

```bash
opsmate list-models --provider ollama
```
Behind the scenes it fetches the list of models from `http://localhost:11434/v1/models`.
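
If you want to see exactly what Opsmate sees, you can query that endpoint directly. This is a minimal check, assuming Ollama is serving its OpenAI-compatible API on the default port:

```bash
# List the models exposed by Ollama's OpenAI-compatible API
curl -s http://localhost:11434/v1/models
```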
If you have a remote Ollama server, you can point Opsmate at it with:

```bash
# by default it's http://localhost:11434/v1
export OLLAMA_BASE_URL=http://$YOUR_REMOTE_SERVER:11434/v1
```
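
After setting the variable, a quick sanity check helps before running anything. This sketch assumes the remote server exposes the same API on its default port and reuses the `OLLAMA_BASE_URL` set above:

```bash
# Confirm the remote server is reachable and lists the expected models
curl -s "$OLLAMA_BASE_URL/models"

# Then run against it as usual
opsmate run --context cli-lite -m gemma3:12b "how many cores on the machine"
```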
## Further Exploration
The `cli-lite` context is far from optimal. To test your own prompt, you can create your own context inside the `~/.opsmate/contexts` directory. The contexts in this directory are loaded automatically by Opsmate on startup.
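
One way to get started is to copy the built-in `cli-lite` prompt and edit it. This is only a sketch: `PATH_TO_CLI_LITE` and the file name `my_context.py` are placeholders, not part of Opsmate itself.

```bash
# Create the directory Opsmate scans for custom contexts on startup
mkdir -p ~/.opsmate/contexts

# Start from the built-in cli-lite prompt and tweak it
# (PATH_TO_CLI_LITE is a placeholder; point it at cli_lite.py in your Opsmate installation)
cp "$PATH_TO_CLI_LITE" ~/.opsmate/contexts/my_context.py
```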