MiniLM · Apache 2.0
Ultra-lightweight embedding model at 23M params and the fastest option for local embeddings. Lower quality than nomic-embed, but practically free in VRAM, which makes it ideal alongside large coding models when every MB matters. The tiny model for builders who need RAG without the memory cost.
all-MiniLM-L6-v2 (23M) requires 256 MB VRAM at recommended quality (FP16). On NVIDIA GeForce RTX 4070 Ti 12GB, expect approximately 12000 tok/s at FP16. For the best experience, Starter AI Desktop ($582) is recommended.
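As a rough sanity check on the numbers above, raw weight memory is just parameter count times bytes per parameter; the 256 MB figure in the table is larger because it also covers activations and framework overhead at runtime. A minimal sketch of that arithmetic (the function name and the 1 MB = 1024² convention are illustrative choices, not from the source):

```python
# Back-of-envelope estimate of raw weight memory for an embedding model.
# bytes_per_param: 2 for FP16, 4 for FP32.
def weight_size_mb(params: int, bytes_per_param: int = 2) -> float:
    """Raw weight memory in MB (1 MB = 1024**2 bytes)."""
    return params * bytes_per_param / 1024**2

minilm_params = 23_000_000  # all-MiniLM-L6-v2

fp16 = weight_size_mb(minilm_params, 2)
fp32 = weight_size_mb(minilm_params, 4)
print(f"FP16 weights: {fp16:.1f} MB")  # ~44 MB of weights alone
print(f"FP32 weights: {fp32:.1f} MB")
```

The gap between ~44 MB of FP16 weights and the 256 MB recommendation is the working memory the runtime needs on top of the weights themselves.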
— OwnRig methodology, data updated 2026-03-01
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | FP16 | 256 MB | 0.09 GB |
Performance data for all-MiniLM-L6-v2 across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 3060 12GB | FP16 | — | Excellent | ✓ |
| NVIDIA GeForce RTX 4090 | FP16 | — | Excellent | ✓ |
| NVIDIA GeForce RTX 4060 8GB | FP16 | 8500 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 4070 Ti 12GB | FP16 | 12000 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 3080 10GB | FP16 | 5000 tok/s | Excellent | ✓ |
| Apple M3 Pro (18GB Unified) | FP16 | 1200 tok/s | Good | ✓ |
all-MiniLM-L6-v2 is commonly used with AnythingLLM and Open WebUI.
Data confidence: verified. Last updated: 2026-03-01.