Why Responses Might Be Slow

Model Size

Larger models generally produce better results, but they take longer to generate each token. Try a smaller model for faster responses:

  • Fast: llama3.2:1b, phi3:mini
  • Balanced: llama3.2:3b, mistral
  • Quality: llama3.1:8b, codellama:13b
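As a rough rule of thumb, a model's weights need about its parameter count times the bytes per parameter, and response speed drops as that footprint grows. A minimal sketch of the arithmetic (the 4-bit quantization default and the parameter counts taken from the model names are approximations, not exact figures for these models):

```python
def approx_model_memory_gb(params_billions: float, bits_per_param: float = 4.0) -> float:
    """Rough weight-memory estimate for a quantized model.

    Assumes a common 4-bit quantization; real usage also includes
    the KV cache and runtime overhead, so treat this as a floor.
    """
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Approximate parameter counts read off the model names above
for name, size_b in [("llama3.2:1b", 1), ("llama3.2:3b", 3),
                     ("llama3.1:8b", 8), ("codellama:13b", 13)]:
    print(f"{name}: ~{approx_model_memory_gb(size_b):.1f} GB of weights")
```

If the estimate for a model exceeds your free RAM (or VRAM), expect heavy slowdown from swapping or CPU offload.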

Hardware Limitations

Running on the CPU is much slower than on a GPU. If you have an NVIDIA GPU, make sure Ollama is actually using it; running nvidia-smi during a generation should show the Ollama process with GPU memory allocated.
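A quick way to check from a script is to look for the nvidia-smi tool on the PATH and see whether it runs. A sketch only; this confirms an NVIDIA driver is present, not that Ollama is using the GPU:

```python
import shutil
import subprocess

def nvidia_gpu_available() -> bool:
    """Return True if nvidia-smi is on PATH and can talk to the driver."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # nvidia-smi exits with status 0 when it reaches the driver
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    except OSError:
        return False

print("NVIDIA driver detected:", nvidia_gpu_available())
```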

Context Size

Larger context windows require more memory and more processing per token. If you don't need long conversation history, try reducing the context size in settings.
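Most of that memory goes to the KV cache, which grows linearly with context length. A back-of-the-envelope sketch (the layer and head dimensions below are illustrative values for a small transformer, not the exact architecture of any model listed above):

```python
def kv_cache_bytes(ctx_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """KV cache size: two tensors (K and V) per layer, per token.

    bytes_per_value=2 assumes fp16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * ctx_len

for ctx in (2048, 8192, 32768):
    print(f"ctx={ctx}: ~{kv_cache_bytes(ctx) / 1e9:.2f} GB")
```

Under these assumptions, quadrupling the context window quadruples the cache, which is why trimming context often helps more than any other setting.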

Optimization Tips

  • Close other memory-intensive applications
  • Use a smaller model for quick questions
  • Start new conversations periodically to reset context
  • Consider upgrading RAM if you consistently run out of memory
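Instead of starting a brand-new conversation, an app can also trim old turns so the prompt stays small. A minimal sketch, assuming the role/content message dicts used by chat APIs such as Ollama's /api/chat (the keep count of 6 is arbitrary):

```python
def trim_history(messages, keep_last=6):
    """Keep any system messages plus only the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# Build a long mock conversation, then cut it down
history = [{"role": "system", "content": "You are helpful."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
print(len(trimmed))  # system message plus the 6 most recent turns
```

Keeping the system message while dropping old turns preserves the assistant's behavior while resetting most of the context cost.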