Phi · MIT
Punches above its weight — a 3.8B model that rivals many 7B models on reasoning benchmarks. MIT license. Excellent for resource-constrained setups or as a fast secondary model.
Phi-3 Mini 3.8B Instruct (3.82B parameters) requires 3 GB of VRAM at the recommended quality (Q5_K_M). At efficient quality (Q4_K_M) it needs only 2.6 GB, fitting comfortably on cards such as the NVIDIA GeForce RTX 3060 12GB with headroom to spare. On an NVIDIA GeForce RTX 4090, expect approximately 130 tok/s at Q8_0. For the best experience, the Starter AI Desktop build ($582) is recommended.
— OwnRig methodology, data updated 2026-03-01
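The quantization labels above (Q8_0, Q5_K_M, Q4_K_M) are GGUF formats from the llama.cpp ecosystem, which LM Studio also builds on. As a minimal sketch of running the recommended build, here is a Q5_K_M file loaded through the llama-cpp-python bindings; the model path is a placeholder for wherever your GGUF download lives:

```python
from llama_cpp import Llama

# Placeholder path: point this at your downloaded Phi-3 Mini GGUF file.
llm = Llama(
    model_path="./models/phi-3-mini-instruct.Q5_K_M.gguf",
    n_ctx=8192,       # context window; see the KV cache table below for the VRAM cost
    n_gpu_layers=-1,  # offload every layer to the GPU
)

result = llm("Summarize KV caching in two sentences.", max_tokens=128)
print(result["choices"][0]["text"])
```

LM Studio exposes the same knobs (context length, GPU offload) through its UI, so the tables below apply either way.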
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | Q8_0 | 4.5 GB | 3.8 GB |
| recommended | Q5_K_M | 3 GB | 2.3 GB |
| efficient | Q4_K_M | 2.6 GB | 1.9 GB |
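The file-size column follows roughly from bits per weight: Q8_0 stores about 8.5 bits per parameter, Q5_K_M about 5.5, and Q4_K_M about 4.85. A back-of-the-envelope estimator; the bits-per-weight constants are approximate averages, and exact GGUF sizes vary with how individual tensors are quantized:

```python
# Approximate bits-per-weight for common GGUF quantizations (assumed
# averages; actual per-file values vary by model and tensor layout).
BPW = {"Q8_0": 8.5, "Q5_K_M": 5.5, "Q4_K_M": 4.85}

def approx_file_size_gb(params_billions: float, quant: str) -> float:
    """Estimate GGUF file size: parameters x bits-per-weight / 8 bits-per-byte."""
    return params_billions * BPW[quant] / 8

for quant in BPW:
    print(f"{quant}: ~{approx_file_size_gb(3.82, quant):.1f} GB")
```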
KV cache VRAM at Q5_K_M quality; longer contexts need more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 102 MB | 3.1 GB |
| 4K | 102 MB | 3.1 GB |
| 8K | 307 MB | 3.3 GB |
| 16K | 512 MB | 3.5 GB |
| 32K | 1 GB | 4 GB |
| 64K | 2 GB | 5 GB |
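For reference, the textbook FP16 KV cache cost is 2 (K and V) × layers × heads × head dim × 2 bytes per token. Phi-3 Mini's published architecture (32 layers, 32 heads, 96-dim heads) gives about 384 KB per token, far above the table's figures, so the methodology here presumably assumes a quantized or otherwise compressed cache. A generic estimator with FP16 defaults, for comparison:

```python
def kv_cache_bytes(n_tokens: int, n_layers: int = 32, n_kv_heads: int = 32,
                   head_dim: int = 96, bytes_per_value: float = 2.0) -> float:
    """Generic KV cache size: K and V stored per layer, per token.

    Defaults match Phi-3 Mini's published architecture with FP16 values;
    pass bytes_per_value < 2 to model a quantized cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens

for ctx in (2048, 4096, 8192):
    print(f"{ctx} tokens: {kv_cache_bytes(ctx) / 2**20:.0f} MiB at FP16")
```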
Performance data for Phi-3 Mini 3.8B Instruct across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 3060 12GB | Q8_0 | 60 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 4090 | Q8_0 | 130 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 4070 Super | Q8_0 | 95 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q5_K_M | 52 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q8_0 | 78 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 3080 10GB | Q8_0 | 130 tok/s | Excellent | ✓ |
| Apple M3 Pro (18GB Unified) | Q8_0 | 32 tok/s | Good | ✓ |
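A sanity check on these speeds: single-stream decoding is memory-bandwidth-bound, because each generated token streams every weight out of VRAM once, so tok/s is roughly bandwidth divided by weight bytes, times an efficiency factor. A sketch under that assumption; the bandwidth figures are public GPU specs, and the 0.5 efficiency factor is an assumed fudge, not part of OwnRig's methodology:

```python
# Published peak memory bandwidths (GB/s).
BANDWIDTH_GBPS = {"RTX 4090": 1008, "RTX 3060 12GB": 360, "RTX 4060 8GB": 272}
EFFICIENCY = 0.5  # assumption: roughly half of peak bandwidth usable in practice

def est_decode_tok_s(gpu: str, weights_gb: float) -> float:
    """Rough decode speed: every generated token reads all weights from VRAM once."""
    return BANDWIDTH_GBPS[gpu] * EFFICIENCY / weights_gb

print(f"RTX 4090 @ Q8_0 (~3.8 GB): ~{est_decode_tok_s('RTX 4090', 3.8):.0f} tok/s")
```

The RTX 4090 estimate (~133 tok/s) lands close to the table's 130 tok/s; smaller cards deviate more, as expected from so coarse a model.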
Phi-3 Mini 3.8B Instruct is commonly used with Continue and LM Studio. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG, as sketched below.
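A minimal sketch of that pairing, assuming GGUF builds of both models, llama-cpp-python for inference and embeddings, and brute-force cosine similarity in place of a real vector store; all paths are placeholders:

```python
import numpy as np
from llama_cpp import Llama

# Placeholder paths: point at your downloaded GGUF files.
embedder = Llama(model_path="./models/nomic-embed-text.gguf",
                 embedding=True, verbose=False)
chat = Llama(model_path="./models/phi-3-mini-instruct.Q5_K_M.gguf",
             n_ctx=8192, n_gpu_layers=-1, verbose=False)

docs = [
    "Phi-3 Mini is a 3.8B-parameter MIT-licensed instruct model.",
    "Q5_K_M is the recommended quantization, needing about 3 GB of VRAM.",
]

def embed(text: str) -> np.ndarray:
    """Return a unit-length embedding vector for one string."""
    vec = np.array(embedder.create_embedding(text)["data"][0]["embedding"])
    return vec / np.linalg.norm(vec)

doc_vecs = [embed(d) for d in docs]

def answer(question: str) -> str:
    """Retrieve the most similar doc by cosine similarity, then generate."""
    q = embed(question)
    context = docs[int(np.argmax([float(q @ d) for d in doc_vecs]))]
    prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
    return chat(prompt, max_tokens=128)["choices"][0]["text"]

print(answer("How much VRAM does the recommended quantization need?"))
```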
Complete PC builds that can run Phi-3 Mini 3.8B Instruct, such as the Starter AI Desktop mentioned above, are also available.
Data confidence: verified. Last updated: 2026-03-01.