1) Export the model’s current Modelfile
ollama show --modelfile llama3.2 > Modelfile
(Replace llama3.2 with your model name.) (Ollama docs)
That output typically includes a comment like:
“To build a new Modelfile based on this one, replace the FROM line with: FROM llama3.2:latest” (Ollama docs)
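The rest of the exported file normally carries the model's FROM line (pointing at a local blob), its TEMPLATE, and any baked-in parameters. As a rough sketch, with the blob path and template contents elided as placeholders, it looks something like this (your exact output will differ):
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM llama3.2:latest
FROM /path/to/.ollama/models/blobs/sha256-...
TEMPLATE """..."""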
2) Edit the FROM line
Open the Modelfile and replace the FROM line with the exact tag you want, e.g.
FROM llama3.2:latest
(or whatever specific tag you’re using).
3) Add your context size override (and anything else)
Add:
PARAMETER num_ctx 8192
PARAMETER is the official way to set runtime defaults like context length inside a Modelfile. (Ollama docs)
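For context, PARAMETER only sets a default: the same option can still be passed per request and will take precedence for that call. Assuming Ollama is running locally on its default port (11434), a REST API request can supply num_ctx in the options field, for example:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "options": { "num_ctx": 8192 }
}'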
4) Create a new derived model name
ollama create llama3.2-8k -f ./Modelfile
Now you have a separate model entry with that default.
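To confirm the new entry exists and give it a quick spin (the prompt here is just an example):
ollama list
ollama run llama3.2-8k "Say hello in one sentence."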
Example Modelfile (minimal)
FROM llama3.2:latest
PARAMETER num_ctx 8192
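And if you want to bake in more than the context size, here's a slightly fuller sketch (the temperature value and system prompt are purely illustrative):
FROM llama3.2:latest
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
SYSTEM """You are a concise assistant that answers in plain English."""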
A couple of practical tips
- Don’t crank num_ctx up blindly: a larger context window increases memory usage and can slow inference, and in practice you can’t usefully go beyond what the base model supports (a quick way to check actual memory use is shown after this list).
- If you want to verify what got baked in, you can re-check:
ollama show --modelfile llama3.2-8k
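As for memory: load the derived model once and then run ollama ps, which lists loaded models with their size and whether they landed on CPU or GPU, so you can see what the 8k default actually costs on your machine (the prompt is just to trigger a load):
ollama run llama3.2-8k "hello"
ollama ps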
If you paste the output of ollama show --modelfile <your-model> (just the first ~30 lines), I can tell you exactly what to keep vs. what’s safe to delete, and suggest a sensible num_ctx for your hardware.