Qwen · Apache 2.0
Reasoning-focused model — uses chain-of-thought natively. Builders pair this with a coding model for complex architecture decisions. Approaches o1-mini on reasoning benchmarks. Same size as Qwen 2.5 Coder 32B, so they compete for VRAM — run one at a time, not concurrently.
QwQ 32B Preview (32.5B parameters) requires 21.9 GB of VRAM at recommended quality (Q5_K_M). At efficient quality (Q4_K_M) it needs 18.4 GB, which fits within the 24 GB of the NVIDIA GeForce RTX 4090; expect approximately 24 tok/s on that card at Q4_K_M. For the best experience, AMD AI Powerhouse ($1,818) is recommended.
— OwnRig methodology, data updated 2026-03-01
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| recommended | Q5_K_M | 21.9 GB | 19.5 GB |
| efficient | Q4_K_M | 18.4 GB | 16.3 GB |
| compressed | Q3_K_M | 14.8 GB | 12.7 GB |
| compressed | Q2_K | 11.6 GB | 9.8 GB |
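The table can be read as a lookup: pick the highest-quality quantization whose VRAM figure fits the card. Below is a minimal Python sketch using only the figures from the table above. The function name `pick_quantization` and the `headroom_gb` parameter (a reserve for KV cache and runtime overhead) are illustrative assumptions, not part of the OwnRig methodology.

```python
# VRAM requirements (GB) from the quantization table,
# ordered from highest to lowest quality.
QUANT_VRAM_GB = [
    ("Q5_K_M", 21.9),
    ("Q4_K_M", 18.4),
    ("Q3_K_M", 14.8),
    ("Q2_K", 11.6),
]


def pick_quantization(vram_gb, headroom_gb=0.0):
    """Return the highest-quality quantization that fits, or None.

    headroom_gb is an assumed reserve for KV cache and runtime
    overhead; the right value depends on context length and runtime.
    """
    budget = vram_gb - headroom_gb
    for name, required_gb in QUANT_VRAM_GB:
        if required_gb <= budget:
            return name
    return None


if __name__ == "__main__":
    for vram in (24, 16, 12, 8):
        print(vram, "GB ->", pick_quantization(vram, headroom_gb=1.0))
```

With no headroom a 24 GB card selects Q5_K_M; reserving about 4 GB for context drops the choice to Q4_K_M.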
KV cache VRAM at Q5_K_M quality. Total VRAM is the 21.9 GB model footprint plus the KV cache, so longer contexts need more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 410 MB | 22.3 GB |
| 4K | 819 MB | 22.7 GB |
| 8K | 1.5 GB | 23.4 GB |
| 16K | 3.1 GB | 25 GB (exceeds 24 GB) |
| 32K | 6.1 GB | 28 GB (exceeds 24 GB) |
| 64K | 12.3 GB | 34.2 GB (exceeds 24 GB) |
| 128K | 24.6 GB | 46.5 GB (exceeds 24 GB) |
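The totals in this table appear to be the Q5_K_M model footprint plus the KV cache for each context length. Here is a small Python sketch of that arithmetic, checked against a 24 GB card. It assumes decimal units (410 MB treated as 0.410 GB); the helper names are hypothetical.

```python
WEIGHTS_GB = 21.9   # Q5_K_M VRAM from the quantization table
GPU_VRAM_GB = 24.0  # e.g. NVIDIA GeForce RTX 4090

# KV cache size in GB per context length, copied from the table above.
# Assumption: MB values are converted with 1 GB = 1000 MB.
KV_CACHE_GB = {
    "2K": 0.410,
    "4K": 0.819,
    "8K": 1.5,
    "16K": 3.1,
    "32K": 6.1,
    "64K": 12.3,
    "128K": 24.6,
}


def total_vram_gb(context, weights_gb=WEIGHTS_GB):
    """Model weights plus KV cache, rounded to one decimal like the table."""
    return round(weights_gb + KV_CACHE_GB[context], 1)


def fits(context, gpu_vram_gb=GPU_VRAM_GB):
    """True if weights plus KV cache stay within the GPU's VRAM."""
    return total_vram_gb(context) <= gpu_vram_gb


if __name__ == "__main__":
    for ctx in KV_CACHE_GB:
        status = "fits" if fits(ctx) else "exceeds"
        print(f"{ctx:>4}: {total_vram_gb(ctx)} GB ({status} {GPU_VRAM_GB:g} GB)")
```

By this arithmetic, 8K is the longest listed context that stays within 24 GB at Q5_K_M; a smaller quantization leaves more room for the cache.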
Performance data for QwQ 32B Preview across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 | Q4_K_M | 24 tok/s | Good | ✓ |
| Apple M4 Max (64GB Unified) | Q5_K_M | 17 tok/s | Good | ✓ |
| Apple M4 Max (128GB Unified) | Q8_0 | 14 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3080 10GB | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M3 Pro (18GB Unified) | Q3_K_M | — | Not Viable | ✗ (offload) |
QwQ 32B Preview is commonly used with Open WebUI and LM Studio. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG.
Complete PC builds that can run QwQ 32B Preview.

NVIDIA GeForce RTX 4090 · 64GB DDR5-5600 (2x32GB)

2x NVIDIA GeForce RTX 3090 (Used) · 128GB DDR5-5600 (4x32GB)

NVIDIA GeForce RTX 4090 · 64GB DDR5-6000 (2x32GB)

2x NVIDIA GeForce RTX 3090 24GB (Used) + NVLink Bridge · 128GB DDR5-5600 (4x32GB)

Apple M4 Max 128GB (Mac Studio)

NVIDIA GeForce RTX 3090 24GB (Used) · 64GB DDR5-5600 (2x32GB)
Data confidence: estimated. Last updated: 2026-03-01.