Llama · Llama 2 Community License
Meta's dedicated 34B coding model. It remains competitive for code generation but is being surpassed by newer models such as Qwen 2.5 Coder 32B. Its relatively short 16K context window is a limitation for large codebases.
Code Llama 34B Instruct (33.7B) requires 22.7 GB VRAM at recommended quality (Q5_K_M). At efficient quality (Q4_K_M), it fits in 19 GB VRAM, making it compatible with the NVIDIA GeForce RTX 4090. On NVIDIA GeForce RTX 4090, expect approximately 22 tok/s at Q4_K_M. For the best experience, AMD AI Powerhouse ($1,818) is recommended.
— OwnRig methodology, data updated 2026-03-01
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| recommended | Q5_K_M | 22.7 GB | 20.2 GB |
| efficient | Q4_K_M | 19 GB | 16.9 GB |
| compressed | Q3_K_M | 15.3 GB | 13.1 GB |
| compressed | Q2_K | 12 GB | 10.1 GB |
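As a sanity check, the file sizes above imply an effective bits-per-weight for each quantization. A rough sketch, assuming decimal GB and the 33.7B parameter count from this page:

```python
# Implied bits-per-weight for each quantization, derived from the
# file sizes in the table above (assumes decimal GB).
PARAMS = 33.7e9  # parameter count from this page

table = {  # quant: file size in GB, from the table above
    "Q5_K_M": 20.2,
    "Q4_K_M": 16.9,
    "Q3_K_M": 13.1,
    "Q2_K": 10.1,
}

for quant, gb in table.items():
    bpw = gb * 1e9 * 8 / PARAMS
    print(f"{quant}: ~{bpw:.1f} bits/weight")
```

Note the VRAM column runs 2–3 GB above file size: the runtime needs room for compute buffers and activations on top of the weights themselves.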
KV cache VRAM at Q5_K_M quality; longer contexts require more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 410 MB | 23.1 GB |
| 4K | 819 MB | 23.5 GB |
| 8K | 1.5 GB | 24.2 GB (exceeds 24 GB) |
| 16K | 3.1 GB | 25.8 GB (exceeds 24 GB) |
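The KV-cache column can be roughly reproduced from the model's architecture. A sketch assuming Code Llama 34B's published config (48 layers, grouped-query attention with 8 KV heads of head dimension 128) and fp16 cache entries:

```python
# Rough KV-cache size: 2 (keys + values) x layers x KV heads x
# head dim x bytes per value x context length.
# Architecture values are assumed from Code Llama 34B's config.
def kv_cache_bytes(context_len: int,
                   n_layers: int = 48,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_val: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * context_len

for ctx in (2048, 4096, 8192, 16384):
    print(f"{ctx}: {kv_cache_bytes(ctx) / 1e6:.0f} MB")
```

The results land a few percent below the table's figures, which presumably include some runtime overhead on top of the raw cache.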
Performance data for Code Llama 34B Instruct across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 | Q4_K_M | 22 tok/s | Good | ✓ |
| Apple M4 Max (36GB Unified) | Q4_K_M | 14 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3080 10GB | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M3 Pro (18GB Unified) | Q3_K_M | — | Not Viable | ✗ (offload) |
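The "Fits in VRAM" column reduces to a simple budget check: quantized-weight VRAM plus KV cache plus some headroom must not exceed the card's memory. A minimal sketch using the VRAM figures from the tables above (the 0.5 GB headroom value is an assumption):

```python
# Budget check behind the "Fits in VRAM" column: weights + KV cache
# + headroom must fit in the card's memory, else layers spill to
# system RAM (offload) and speed collapses.
def fits(vram_gb: float, weights_gb: float, kv_gb: float,
         headroom_gb: float = 0.5) -> bool:
    return weights_gb + kv_gb + headroom_gb <= vram_gb

# RTX 4090 (24 GB), Q4_K_M weights (19 GB), ~0.8 GB KV at 4K context:
print(fits(24, 19, 0.8))   # True
# RTX 4070 Ti (12 GB) even at Q2_K (12 GB VRAM figure): does not fit
print(fits(12, 12, 0.8))   # False
```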
Code Llama 34B Instruct is commonly used with Continue and Aider. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG.
Complete PC builds that can run Code Llama 34B Instruct.
Data confidence: verified. Last updated: 2026-03-01.