Why Responses Might Be Slow
Model Size
Larger models generally produce better results, but they respond more slowly. Try a smaller model for faster responses:
- Fast: llama3.2:1b, phi3:mini
- Balanced: llama3.2:3b, mistral
- Quality: llama3.1:8b, codellama:13b
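As a rough rule of thumb, a model's memory footprint scales with its parameter count and quantization level, which is the main reason the larger models above run slower on the same hardware. The sketch below is an illustrative back-of-envelope estimate, not Ollama's actual memory accounting; the 4-bit quantization and 20% overhead figures are assumptions.

```python
def approx_model_memory_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough memory estimate for a quantized model.

    Assumptions: weights dominate memory use; runtime overhead
    (buffers, activations) is approximated as a flat 20% on top.
    Actual usage varies by runtime and quantization format.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * 1.2 / 1e9

# Rough comparison of the models listed above (4-bit quantization assumed):
for name, size_b in [("llama3.2:1b", 1), ("llama3.2:3b", 3),
                     ("llama3.1:8b", 8), ("codellama:13b", 13)]:
    print(f"{name}: ~{approx_model_memory_gb(size_b):.1f} GB")
```

If an estimate like this exceeds your free RAM (or VRAM), the model will be swapped or partially offloaded, which slows responses dramatically.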
Hardware Limitations
Running models on the CPU is much slower than on a GPU. If you have an NVIDIA GPU, make sure Ollama is actually using it rather than falling back to the CPU.
Context Size
Larger context windows require more memory and processing. Try reducing the context size in settings if you don't need long conversation history.
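The memory cost of a longer context comes mainly from the attention KV cache, which grows linearly with the number of tokens kept in context. The sketch below illustrates that scaling; the layer, head, and dimension defaults loosely resemble an 8B-class model with grouped-query attention and are assumptions, not measurements of any specific model.

```python
def approx_kv_cache_mb(context_tokens: int, layers: int = 32,
                       kv_heads: int = 8, head_dim: int = 128,
                       bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in MB.

    Formula: 2 (keys + values) * layers * kv_heads * head_dim
             * tokens * bytes per value.
    Defaults are illustrative assumptions for an 8B-class model.
    """
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e6

# Doubling the context doubles the cache; 4x context means 4x memory:
for ctx in (2048, 4096, 8192):
    print(f"{ctx} tokens: ~{approx_kv_cache_mb(ctx):.0f} MB")
```

This is why shrinking the context window in settings (or starting a fresh conversation) frees memory and speeds up processing when long history isn't needed.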
Optimization Tips
- Close other memory-intensive applications
- Use a smaller model for quick questions
- Start new conversations periodically to reset context
- Consider upgrading RAM if you consistently run out of memory