DeepSeek · DeepSeek License
Massive mixture-of-experts (MoE) model in the GPT-4 class. Only about 37B of its 671B total parameters are active per token. Running it requires multiple GPUs or very large unified memory (128GB+ Apple Silicon at Q2/Q3). Not for casual home use; included for completeness and to show what the high end looks like.
DeepSeek V3 (671B) requires 360 GB VRAM at recommended quality (FP16). On Apple M4 Max (128GB Unified), expect approximately 3 tok/s at Q2_K.
— OwnRig methodology, data updated 2026-03-15
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | FP16 | 360 GB | 335 GB |
| efficient | Q4_K_M | 180 GB | 168 GB |
| compressed | Q3_K_M | 145 GB | 135 GB |
| compressed | Q2_K | 115 GB | 108 GB |
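As a rough illustration of how the table above can be read, the following sketch picks the highest-quality quantization whose VRAM requirement fits a given memory budget. The function name `best_quantization` is hypothetical, and the sketch assumes the entire stated memory is available to the model, which is a simplification.

```python
# VRAM requirements in GB, copied from the table above, best quality first.
VRAM_GB = [
    ("FP16", 360),
    ("Q4_K_M", 180),
    ("Q3_K_M", 145),
    ("Q2_K", 115),
]


def best_quantization(memory_gb):
    """Return the highest-quality quantization that fits, or None if none do.

    Assumption: all of memory_gb is usable for the model (no OS or
    KV cache overhead is subtracted).
    """
    for name, needed_gb in VRAM_GB:
        if needed_gb <= memory_gb:
            return name
    return None


print(best_quantization(128))  # 128 GB unified memory -> Q2_K
print(best_quantization(24))   # 24 GB GPU -> None
```

Under these assumptions, a 128 GB machine only fits Q2_K, and a 24 GB card fits none of the listed quantizations.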
KV cache VRAM at FP16 quality, added on top of the 360 GB of FP16 model weights. Longer context means more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 1 GB | 361 GB (exceeds 24 GB) |
| 4K | 2 GB | 362 GB (exceeds 24 GB) |
| 8K | 4.1 GB | 364.1 GB (exceeds 24 GB) |
| 16K | 8.2 GB | 368.2 GB (exceeds 24 GB) |
| 32K | 16.4 GB | 376.4 GB (exceeds 24 GB) |
| 64K | 32.8 GB | 392.8 GB (exceeds 24 GB) |
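The totals in this table appear to be the FP16 weights (360 GB) plus a KV cache that grows linearly with context length. A minimal sketch of that arithmetic follows. The per-1K-token rate is derived here from the 8K row and is an assumption, not a figure published by the source, and the function names are hypothetical.

```python
WEIGHTS_GB = 360.0  # FP16 model weights, from the quantization table

# Assumption: KV cache grows linearly with context length.
# Rate derived from the 8K row (4.1 GB for 8K tokens of context).
KV_GB_PER_1K_TOKENS = 4.1 / 8


def kv_cache_gb(context_k):
    """Estimated KV cache size in GB for a context of context_k thousand tokens."""
    return context_k * KV_GB_PER_1K_TOKENS


def total_vram_gb(context_k):
    """Estimated total VRAM in GB: model weights plus KV cache."""
    return WEIGHTS_GB + kv_cache_gb(context_k)


for context_k in (2, 4, 8, 16, 32, 64):
    print(f"{context_k}K: KV cache {kv_cache_gb(context_k):.1f} GB, "
          f"total {total_vram_gb(context_k):.1f} GB")
```

With this rate, the computed values agree with the table rows to within rounding.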
Performance data for DeepSeek V3 across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 3060 12GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4060 Ti 16GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti Super | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Super | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4080 Super | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4090 | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3090 | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 5080 | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 5090 | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M4 Pro (24GB Unified) | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M4 Pro (48GB Unified) | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M4 Max (36GB Unified) | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M4 Max (64GB Unified) | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M4 Max (128GB Unified) | Q2_K | 3 tok/s | Marginal | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3080 10GB | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M3 Pro (18GB Unified) | Q2_K | — | Not Viable | ✗ (offload) |
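To put the 3 tok/s figure in practical terms, this small sketch converts a generation speed into wall-clock time for a reply. The 500-token reply length is a hypothetical example, the function name is illustrative, and prompt processing time is ignored.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time in seconds to generate num_tokens at a steady tokens_per_second."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


reply_tokens = 500  # hypothetical reply length
speed = 3.0         # tok/s on Apple M4 Max (128GB Unified) at Q2_K, from the table

print(round(generation_seconds(reply_tokens, speed)))  # → 167
```

At that speed a 500-token reply takes close to three minutes, which is consistent with the "Marginal" rating.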
Data confidence: estimated. Last updated: 2026-03-15.