Qwen · Apache 2.0
The coding model at the heart of the builder workflow, with HumanEval results comparable to GPT-4. It is what Cursor and Continue.dev users run locally when they want to eliminate API dependency. Apache 2.0 licensed, and the cornerstone of the 'Full AI Builder' profile.
Qwen 2.5 Coder 32B Instruct (32.5B parameters) requires 21.9 GB of VRAM at recommended quality (Q5_K_M). At efficient quality (Q4_K_M) it needs 18.4 GB, which fits 24 GB cards such as the NVIDIA GeForce RTX 3090 and RTX 4090; the NVIDIA GeForce RTX 4060 Ti 16GB must drop to compressed quality (Q3_K_M, 14.8 GB) to fit. On an NVIDIA GeForce RTX 5090, expect approximately 45 tok/s at Q5_K_M. For the best experience, the AMD AI Powerhouse build ($1,818) is recommended.
— OwnRig methodology, data updated 2026-03-01
VRAM and file-size requirements for Qwen 2.5 Coder 32B Instruct by quantization level.
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| recommended | Q5_K_M | 21.9 GB | 19.5 GB |
| efficient | Q4_K_M | 18.4 GB | 16.3 GB |
| compressed | Q3_K_M | 14.8 GB | 12.7 GB |
| compressed | Q2_K | 11.6 GB | 9.8 GB |
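The figures above follow roughly from parameter count times bits per weight. A back-of-the-envelope estimator is sketched below; the bits-per-weight averages and the flat 2 GB overhead allowance are assumptions rather than exact llama.cpp values, so results will not match the table precisely (real GGUF files mix quant types across tensors):

```python
# Approximate average bits per weight for common llama.cpp K-quants
# (assumed values, not exact spec figures).
BPW = {"Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.85, "Q3_K_M": 3.9, "Q2_K": 3.35}

def file_size_gb(params_b: float, quant: str) -> float:
    """Approximate GGUF file size in GB for params_b billion parameters."""
    return params_b * BPW[quant] / 8  # bits -> bytes; billions -> GB

def vram_gb(params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """File size plus a flat allowance for compute buffers and runtime
    overhead (the 2 GB default is an assumption, not a measurement)."""
    return file_size_gb(params_b, quant) + overhead_gb

# Example: 32.5B parameters at Q4_K_M under these assumptions.
print(round(vram_gb(32.5, "Q4_K_M"), 1))
```

Treat this as a sanity check for "will it fit at all", not a substitute for the measured table above.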
KV cache VRAM overhead at Q5_K_M quality. Longer context windows require more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 410 MB | 22.3 GB |
| 4K | 819 MB | 22.7 GB |
| 8K | 1.5 GB | 23.4 GB |
| 16K | 3.1 GB | 25 GB (exceeds 24 GB) |
| 32K | 6.1 GB | 28 GB (exceeds 24 GB) |
| 64K | 12.3 GB | 34.2 GB (exceeds 24 GB) |
| 128K | 24.6 GB | 46.5 GB (exceeds 24 GB) |
Performance data for Qwen 2.5 Coder 32B Instruct across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4060 Ti 16GB | Q3_K_M | 10 tok/s | Acceptable | ✓ |
| NVIDIA GeForce RTX 4070 Ti Super | Q3_K_M | 16 tok/s | Acceptable | ✓ |
| NVIDIA GeForce RTX 4090 | Q4_K_M | 25 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 3090 | Q4_K_M | 18 tok/s | Good | ✓ |
| Apple M4 Pro (24GB Unified) | Q4_K_M | 10 tok/s | Acceptable | ✓ |
| Apple M4 Max (36GB Unified) | Q5_K_M | 18 tok/s | Good | ✓ |
| Apple M4 Max (64GB Unified) | Q5_K_M | 18 tok/s | Good | ✓ |
| Apple M4 Max (128GB Unified) | Q8_0 | 15 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 5090 | Q5_K_M | 45 tok/s | Excellent | ✓ |
| Apple M4 Pro (48GB) | Q4_K_M | 10 tok/s | Acceptable | ✓ |
| NVIDIA GeForce RTX 4080 Super | Q3_K_M | 18 tok/s | Acceptable | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3080 10GB | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M3 Pro (18GB Unified) | Q3_K_M | — | Not Viable | ✗ (offload) |
Qwen 2.5 Coder 32B Instruct is commonly used with Cursor, Continue, Aider, Windsurf, Codex CLI. For an AI coding workflow, pair it with an embedding model like nomic-embed-text for local RAG.
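One common way to wire this pairing up is through Continue's `config.json`. The fragment below is a sketch assuming an Ollama backend serving both models; field names follow Continue's JSON config format and may differ in newer releases:

```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 32B (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:32b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:32b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```

With `embeddingsProvider` set, Continue indexes the codebase locally and retrieves context via nomic-embed-text, keeping the entire RAG loop off external APIs.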
Complete PC builds that can run Qwen 2.5 Coder 32B Instruct:

- NVIDIA GeForce RTX 4090 · 64GB DDR5-5600 (2x32GB)
- AMD Radeon RX 7900 XTX 24GB · 32GB DDR5-5600 (2x16GB)
- 2x NVIDIA GeForce RTX 3090 (Used) · 128GB DDR5-5600 (4x32GB)
- NVIDIA GeForce RTX 4090 · 64GB DDR5-6000 (2x32GB)
- 2x NVIDIA GeForce RTX 3090 24GB (Used) + NVLink Bridge · 128GB DDR5-5600 (4x32GB)
- Apple M4 Max 128GB (Mac Studio)
- NVIDIA GeForce RTX 4060 Ti 16GB · 32GB DDR5-5600 (2x16GB)
- NVIDIA GeForce RTX 3090 24GB (Used) · 64GB DDR5-5600 (2x32GB)
- NVIDIA GeForce RTX 5090 32GB · 64GB DDR5-6000 (2x32GB)
Data confidence: verified. Last updated: 2026-03-01.