GrepAI Embeddings with Ollama
This skill covers using Ollama as the embedding provider for GrepAI, enabling 100% private, local code search.
When to Use This Skill
- Setting up private, local embeddings
- Choosing the right Ollama model
- Optimizing Ollama performance
- Troubleshooting Ollama connection issues
Why Ollama?
| Advantage | Description |
|---|---|
| Privacy | Code never leaves your machine |
| Free | No API costs or usage limits |
| Speed | No network latency |
| Offline | Works without internet |
| Control | Choose your model |
Prerequisites
- Ollama installed and running
- An embedding model downloaded
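```bash
# Install Ollama on macOS (Homebrew)
brew install ollama

# Install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the server (runs in the foreground; use a second terminal for the next step)
ollama serve

# Download the recommended embedding model
ollama pull nomic-embed-text
```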
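The steps above can be wrapped in a small idempotent helper that pulls the model only when it is missing. This is a bash sketch. `ensure_embedding_model` is a hypothetical name, and the helper assumes `ollama list` prints a header row followed by one `NAME:TAG` entry per line.

```bash
# Sketch: make sure the Ollama CLI exists and the embedding model is pulled.
# ensure_embedding_model is a hypothetical helper, not part of Ollama or GrepAI.
# Assumes `ollama list` prints a header row, then one NAME:TAG per line.
ensure_embedding_model() {
  local model="${1:-nomic-embed-text}"

  # Fail early if the ollama command is not available.
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found; install it first" >&2
    return 1
  fi

  # Match the bare name or any tag of it (e.g. nomic-embed-text:latest).
  if ollama list | awk 'NR > 1 { print $1 }' | grep -Eq "^${model}(:|$)"; then
    echo "model ${model} already present"
  else
    ollama pull "$model"
  fi
}
```

For example, `ensure_embedding_model bge-m3` pulls `bge-m3` once, and on later runs it only reports that the model is present.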
Configuration
Basic Configuration
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
```
With Custom Endpoint
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://192.168.1.100:11434
```
With Explicit Dimensions
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768
```
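Because `dimensions` must match the model's output size, a small bash helper can print an embedder block with the matching value. This is only a sketch. `print_embedder_config` is a hypothetical helper, and its dimension values are copied from the model tables in this guide. It prints the YAML instead of writing a file because this guide does not say where GrepAI's configuration file lives.

```bash
# Sketch: print a GrepAI embedder block with dimensions filled in from the
# model tables in this guide. print_embedder_config is a hypothetical helper;
# paste its output into your GrepAI configuration yourself.
print_embedder_config() {
  local model="${1:-nomic-embed-text}"
  local endpoint="${2:-http://localhost:11434}"
  local dims

  # Vector sizes as listed under "Available Models".
  case "$model" in
    nomic-embed-text|nomic-embed-text-v2-moe) dims=768 ;;
    bge-m3|mxbai-embed-large) dims=1024 ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac

  cat <<EOF
embedder:
  provider: ollama
  model: ${model}
  endpoint: ${endpoint}
  dimensions: ${dims}
EOF
}
```

For example, `print_embedder_config bge-m3 http://192.168.1.100:11434` prints a block with `dimensions: 1024`.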
Available Models
Recommended: nomic-embed-text
```bash
ollama pull nomic-embed-text
```

| Property | Value |
|---|---|
| Dimensions | 768 |
| Size | ~274 MB |
| Speed | Fast |
| Quality | Excellent for code |
| Language | English-optimized |

Configuration:
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
```
Multilingual: nomic-embed-text-v2-moe
```bash
ollama pull nomic-embed-text-v2-moe
```

| Property | Value |
|---|---|
| Dimensions | 768 |
| Size | ~500 MB |
| Speed | Medium |
| Quality | Excellent |
| Language | Multilingual |

Best for codebases with non-English comments or documentation.

Configuration:
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe
```
High Quality: bge-m3
```bash
ollama pull bge-m3
```

| Property | Value |
|---|---|
| Dimensions | 1024 |
| Size | ~1.2 GB |
| Speed | Slower |
| Quality | Very high |
| Language | Multilingual |

Best for large, complex codebases where accuracy is critical.

Configuration:
```yaml
embedder:
  provider: ollama
  model: bge-m3
  dimensions: 1024
```
Maximum Quality: mxbai-embed-large
```bash
ollama pull mxbai-embed-large
```

| Property | Value |
|---|---|
| Dimensions | 1024 |
| Size | ~670 MB |
| Speed | Medium |
| Quality | Highest |
| Language | English |

Configuration:
```yaml
embedder:
  provider: ollama
  model: mxbai-embed-large
  dimensions: 1024
```
Model Comparison
| Model | Dims | Size | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| nomic-embed-text | 768 | ~274 MB | ⚡⚡⚡ | ⭐⭐⭐ | General use |
| nomic-embed-text-v2-moe | 768 | ~500 MB | ⚡⚡ | ⭐⭐⭐⭐ | Multilingual |
| bge-m3 | 1024 | ~1.2 GB | ⚡ | ⭐⭐⭐⭐⭐ | Large codebases |
| mxbai-embed-large | 1024 | ~670 MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy |
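The Use Case column can be read as a simple decision rule. The bash sketch below encodes one reading of it. `pick_embedding_model` and its `en|multi` and `speed|accuracy` arguments are hypothetical and are not GrepAI or Ollama options.

```bash
# Sketch: map two coarse requirements onto the comparison table.
# pick_embedding_model is hypothetical; the mapping is one reading of the
# "Use Case" column, not logic built into GrepAI or Ollama.
#   $1: "en" for mostly English code/comments, "multi" for mixed languages
#   $2: "speed" or "accuracy"
pick_embedding_model() {
  local lang="$1" priority="$2"
  case "${lang}/${priority}" in
    en/speed)       echo nomic-embed-text ;;
    en/accuracy)    echo mxbai-embed-large ;;
    multi/speed)    echo nomic-embed-text-v2-moe ;;
    multi/accuracy) echo bge-m3 ;;
    *)
      echo "usage: pick_embedding_model en|multi speed|accuracy" >&2
      return 1
      ;;
  esac
}
```

Its output can feed straight into `ollama pull "$(pick_embedding_model multi accuracy)"`.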
Performance Optimization
Memory Management
Models are loaded into memory while in use, so make sure enough RAM is free:

| Model | RAM Required (approx.) |
|---|---|
| nomic-embed-text | ~500 MB |
| nomic-embed-text-v2-moe | ~800 MB |
| bge-m3 | ~1.5 GB |
| mxbai-embed-large | ~1 GB |
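Before you pull a larger model, you can check the estimates above against free memory. This bash sketch uses hypothetical helper names. On Linux it reads `MemAvailable` from `/proc/meminfo`. On macOS it falls back to `sysctl hw.memsize`, which reports total memory rather than free memory, so treat that result as an upper bound.

```bash
# Sketch: compare free memory with the approximate RAM figures above.
# model_ram_mb, available_ram_mb and check_ram are hypothetical helpers.
model_ram_mb() {
  case "$1" in
    nomic-embed-text)        echo 500 ;;
    nomic-embed-text-v2-moe) echo 800 ;;
    bge-m3)                  echo 1536 ;;  # ~1.5 GB
    mxbai-embed-large)       echo 1024 ;;  # ~1 GB
    *) return 1 ;;
  esac
}

available_ram_mb() {
  if [ -r /proc/meminfo ]; then
    # Linux: MemAvailable is reported in kB.
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
  else
    # macOS: hw.memsize is TOTAL memory in bytes, so this overestimates.
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
  fi
}

# check_ram MODEL [AVAILABLE_MB]: succeeds if the model should fit.
check_ram() {
  local need have
  need="$(model_ram_mb "$1")" || { echo "unknown model: $1" >&2; return 2; }
  have="${2:-$(available_ram_mb)}"
  if [ "$have" -ge "$need" ]; then
    echo "ok: $1 needs ~${need} MB, ${have} MB available"
  else
    echo "low memory: $1 needs ~${need} MB, only ${have} MB available" >&2
    return 1
  fi
}
```

Run `check_ram bge-m3` to use the detected value, or pass a budget explicitly, as in `check_ram bge-m3 1024`.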
GPU Acceleration
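Ollama uses a supported GPU automatically when one is available:
- macOS: Metal (Apple Silicon)
- Linux/Windows: CUDA (NVIDIA GPUs) or ROCm (supported AMD GPUs)

Check where the loaded model is running. The PROCESSOR column shows values such as `100% GPU` or `100% CPU`:
```bash
ollama ps
```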
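To script the GPU check, you can inspect the PROCESSOR column of `ollama ps` for the model GrepAI uses. This is a bash sketch. `gpu_status` is hypothetical, and it assumes a header row followed by entries such as `100% GPU`, `100% CPU`, or a CPU/GPU split. A model only appears in the list while it is loaded, so send an embedding request first.

```bash
# Sketch: classify where a loaded model runs, based on `ollama ps` output.
# gpu_status is hypothetical; it assumes a header row and a PROCESSOR value
# like "100% GPU", "100% CPU" or a CPU/GPU split on the model's line.
gpu_status() {
  local model="${1:-nomic-embed-text}" line
  # Match the bare name or any tag of it (e.g. nomic-embed-text:latest).
  line="$(ollama ps | awk -v m="$model" 'NR > 1 && ($1 == m || index($1, m ":") == 1)')"
  if [ -z "$line" ]; then
    echo "not loaded"
    return 1
  fi
  case "$line" in
    *"100% GPU"*) echo "gpu" ;;
    *GPU*)        echo "partial" ;;  # split between CPU and GPU
    *)            echo "cpu" ;;
  esac
}
```

A result of `cpu` or `partial` usually points to a missing driver or too little GPU memory for the model.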
Keeping Model Loaded
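By default, Ollama unloads a model after 5 minutes of inactivity, so the next request has to load it again. To keep the embedding model in memory indefinitely, send an embedding request with `keep_alive` set to `-1`:
```bash
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "warm up",
  "keep_alive": -1
}'
```
Alternatively, set `OLLAMA_KEEP_ALIVE=-1` in the server's environment to change the default for all models.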
Verifying Connection
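Check That Ollama Is Running
```bash
# Returns a JSON list of local models if the server is up
curl http://localhost:11434/api/tags
```
List Available Models
```bash
ollama list
```
Test Embedding
```bash
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "function authenticate(user, password)"
}'
```
The response contains an `embeddings` array. Each vector should have as many values as the model has dimensions (768 for `nomic-embed-text`). Older Ollama versions provide only the legacy `/api/embeddings` endpoint, which takes `prompt` instead of `input`.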
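You can combine the three checks above into one function that also counts the vector length, which should match `dimensions` in your configuration. This is a bash sketch. `verify_ollama` is hypothetical and calls `/api/embed`. Its `sed`/`awk` parsing assumes Ollama's compact single-line JSON response and does not use a JSON parser such as `jq`.

```bash
# Sketch: health check for the endpoint and model GrepAI will use.
# verify_ollama is hypothetical; parsing assumes compact single-line JSON
# of the form {"model":...,"embeddings":[[v1,v2,...]],...}.
verify_ollama() {
  local endpoint="${1:-http://localhost:11434}"
  local model="${2:-nomic-embed-text}"
  local resp dims

  # 1. Is the server reachable?
  if ! curl -sf "${endpoint}/api/tags" >/dev/null; then
    echo "cannot reach Ollama at ${endpoint}" >&2
    return 1
  fi

  # 2. Does the model answer an embedding request? (-f fails on HTTP errors)
  resp="$(curl -sf "${endpoint}/api/embed" \
    -d "{\"model\": \"${model}\", \"input\": \"function authenticate(user, password)\"}")" || {
    echo "embedding request failed (is ${model} pulled?)" >&2
    return 1
  }

  # 3. Extract the first vector and count its comma-separated values.
  dims="$(printf '%s' "$resp" \
    | sed -n 's/.*"embeddings":\[\[\([^]]*\)]].*/\1/p' \
    | awk -F, '{ print NF }')"
  [ -n "$dims" ] || { echo "unexpected response from ${endpoint}" >&2; return 1; }
  echo "${model}: ${dims} dimensions"
}
```

For example, `verify_ollama http://localhost:11434 nomic-embed-text` should report 768 dimensions, according to the model table above.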
Running Ollama as a Service
macOS (launchd)
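The Ollama desktop app starts the server automatically at login.
Linux (systemd)
```bash
sudo systemctl enable ollama   # start on boot
sudo systemctl start ollama    # start now
sudo systemctl status ollama   # confirm it is active
```
Manual Background
```bash
# Run the server detached from the terminal, discarding its output
nohup ollama serve > /dev/null 2>&1 &
```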
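For scripts that need the server running before they call `grepai`, you can combine the background approach with a readiness check. This is a bash sketch. `start_ollama_if_needed` is hypothetical and assumes the default local endpoint. It writes the log to `$TMPDIR/ollama.log` (or `/tmp/ollama.log`) instead of `/dev/null`, so start-up errors stay visible.

```bash
# Sketch: start a local Ollama server only if nothing answers yet, then wait.
# start_ollama_if_needed and the log location are illustrative choices.
start_ollama_if_needed() {
  local timeout="${1:-30}" waited=0
  # `ollama serve` listens on 127.0.0.1:11434 unless OLLAMA_HOST overrides it.
  local endpoint="http://localhost:11434"

  if curl -sf "${endpoint}/api/tags" >/dev/null; then
    echo "already running"
    return 0
  fi

  # Like the nohup line above, but keep the log instead of discarding it.
  nohup ollama serve >"${TMPDIR:-/tmp}/ollama.log" 2>&1 &

  # Poll once per second until the API answers or the timeout expires.
  until curl -sf "${endpoint}/api/tags" >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Ollama did not respond within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "started"
}
```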
Remote Ollama Server
Run Ollama on a powerful server and connect remotely:
On the Server
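```bash
# Listen on all network interfaces instead of only 127.0.0.1
OLLAMA_HOST=0.0.0.0 ollama serve
```
Ollama's API has no built-in authentication, so expose it only on a trusted network or restrict port 11434 with a firewall.
On the Client
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://server-ip:11434   # replace server-ip with the server's address
```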
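The model must be pulled on the server, not on the client. A client-side check should therefore confirm two things: that the server is reachable, and that the configured model exists there. This is a bash sketch. `remote_has_model` is hypothetical, and its `grep` on the `/api/tags` response assumes compact JSON with `"name":"model:tag"` entries.

```bash
# Sketch: from the client, check that the remote server answers and already
# has the model. remote_has_model is hypothetical; the grep assumes compact
# JSON such as {"models":[{"name":"nomic-embed-text:latest",...}]}.
remote_has_model() {
  local endpoint="$1" model="${2:-nomic-embed-text}" tags
  tags="$(curl -sf --max-time 5 "${endpoint}/api/tags")" || {
    echo "cannot reach ${endpoint} (is OLLAMA_HOST=0.0.0.0 set on the server and port 11434 open?)" >&2
    return 1
  }
  # Require the name to end at a tag separator or closing quote, so that
  # nomic-embed-text does not match nomic-embed-text-v2-moe.
  if printf '%s' "$tags" | grep -q "\"name\":\"${model}[\":]"; then
    echo "${model} available on ${endpoint}"
  else
    echo "${model} missing on ${endpoint}; run 'ollama pull ${model}' on the server" >&2
    return 1
  fi
}
```

For example: `remote_has_model http://server-ip:11434 nomic-embed-text`.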
Common Issues
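Problem: Connection refused
Solution: Start the Ollama server, and check that `endpoint` in the configuration points to the right host and port:
```bash
ollama serve
```
Problem: Model not found
Solution: Pull the configured model on the machine that runs Ollama:
```bash
ollama pull nomic-embed-text
```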
Problem: Slow embedding generation
Solutions:
- Use a smaller model (`nomic-embed-text`)
- Make sure the GPU is being used (`ollama ps`)
- Close memory-intensive applications
- Consider a remote server with better hardware
Problem: Out of memory
Solutions:
- Use a smaller model
- Close other applications
- Upgrade RAM
- Use a remote Ollama server
Problem: Embeddings differ after a model update
Solution: Re-index after model updates:
```bash
rm .grepai/index.gob
grepai watch
```
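To avoid mixing vectors from different models, you can run the re-index step only when the model changes. This is a bash sketch with stated assumptions. `reindex_if_model_changed` is hypothetical. The stamp file `.grepai/embedder-model` is this script's own convention; GrepAI does not create it. The model version comes from the ID column of `ollama list`, which changes when a re-pull fetches new weights.

```bash
# Sketch: re-index only when the embedding model (or its version) changed.
# reindex_if_model_changed is hypothetical, and the stamp file
# <dir>/embedder-model is this script's own convention, not GrepAI's.
reindex_if_model_changed() {
  local model="${1:-nomic-embed-text}" dir="${2:-.grepai}"
  local stamp="${dir}/embedder-model" id current

  # The ID column of `ollama list` changes when new weights are pulled.
  id="$(ollama list | awk -v m="$model" \
    'NR > 1 && ($1 == m || index($1, m ":") == 1) { print $2; exit }')"
  [ -n "$id" ] || { echo "model ${model} not pulled" >&2; return 1; }
  current="${model}@${id}"

  if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
    echo "index already built with ${model} (${id})"
    return 0
  fi

  # Vectors from different models or model versions must not be mixed.
  rm -f "${dir}/index.gob"
  mkdir -p "$dir"
  # Record the model before starting the long-running watcher.
  printf '%s\n' "$current" >"$stamp"
  grepai watch
}
```

The stamp is written before `grepai watch` starts, because the watcher keeps running. If indexing fails, delete the stamp file so that the next run rebuilds the index.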
Best Practices
- Start with `nomic-embed-text`: it has the best balance of speed and quality
- Keep Ollama running: run it as a background service
- Match dimensions: don't mix models with different dimensions in one index
- Re-index on model change: delete `.grepai/index.gob` and re-run `grepai watch`
- Monitor memory: embedding models use significant RAM
Output Format
Successful Ollama configuration:
```text
Ollama Embedding Provider Configured

Provider: Ollama
Model: nomic-embed-text
Endpoint: http://localhost:11434
Dimensions: 768 (auto-detected)
Status: Connected

Model Info:
- Size: 274 MB
- Loaded: Yes
- GPU: Apple Metal
```