DeepSeek · MIT
Distilled reasoning model with strong coding and chat capabilities.
DeepSeek R1 Distill Qwen 32B (32.5B) requires 28 GB VRAM at recommended quality (Q6_K). At efficient quality (Q4_K_M), it fits in 19 GB VRAM, making it compatible with the NVIDIA GeForce RTX 4070 Ti Super. On NVIDIA GeForce RTX 5090, expect approximately 42 tok/s at Q5_K_M. For the best experience, High-End Home AI Server ($3,842) is recommended.
— OwnRig methodology, data updated 2026-03-15
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | Q8_0 | 36 GB | 32 GB |
| recommended | Q6_K | 28 GB | 24 GB |
| recommended | Q5_K_M | 24 GB | 20 GB |
| efficient | Q4_K_M | 19 GB | 16 GB |
| compressed | Q3_K_M | 15.5 GB | 12.5 GB |
KV cache VRAM at Q6_K quality. Longer context = more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 512 MB | 28.5 GBexceeds 24 GB |
| 4K | 1 GB | 29 GBexceeds 24 GB |
| 8K | 2 GB | 30 GBexceeds 24 GB |
| 16K | 4.1 GB | 32.1 GBexceeds 24 GB |
| 32K | 8.2 GB | 36.2 GBexceeds 24 GB |
Performance data for DeepSeek R1 Distill Qwen 32B across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 | Q4_K_M | 24 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4070 Ti Super | Q3_K_M | 15 tok/s | Acceptable | ✓ |
| Apple M4 Max (64GB Unified) | Q4_K_M | 17 tok/s | Good | ✓ |
| Apple M4 Max (128GB Unified) | Q5_K_M | 16 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 5090 | Q5_K_M | 42 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3080 10GB | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M3 Pro (18GB Unified) | Q3_K_M | — | Not Viable | ✗ (offload) |
DeepSeek R1 Distill Qwen 32B is commonly used with Cursor, Continue, Aider, Open WebUI, LM Studio. For an AI coding workflow, pair it with an embedding model like nomic-embed-text for local RAG.
Complete PC builds that can run DeepSeek R1 Distill Qwen 32B.

2x NVIDIA GeForce RTX 3090 24GB (Used) + NVLink Bridge · 128GB DDR5-5600 (4x32GB)

NVIDIA GeForce RTX 3090 24GB (Used) · 64GB DDR5-5600 (2x32GB)

NVIDIA GeForce RTX 5090 32GB · 64GB DDR5-6000 (2x32GB)
Data confidence: estimated. Last updated: 2026-03-15. Source