Cohere · CC-BY-NC-4.0
Cohere's RAG-optimized model with strong reasoning and long-context support.
Command R 35B (35B) requires 30 GB of VRAM at the recommended quality (Q6_K). At efficient quality (Q4_K_M), it fits in 20 GB of VRAM. For the best experience, the High-End Home AI Server build ($3,842) is recommended.
— OwnRig methodology, data updated 2026-03-15
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | Q8_0 | 38.5 GB | 35 GB |
| recommended | Q6_K | 30 GB | 26 GB |
| recommended | Q5_K_M | 25 GB | 21.5 GB |
| efficient | Q4_K_M | 20 GB | 17.5 GB |
| compressed | Q3_K_M | 16 GB | 13.5 GB |
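The VRAM column tracks the GGUF file size plus a margin for runtime buffers. A minimal sketch of that relationship, assuming a flat ~15% overhead factor eyeballed from the table above (a hypothetical helper, not an official OwnRig formula):

```python
# File sizes (GB) per quantization, copied from the table above.
QUANT_FILE_GB = {
    "Q8_0": 35.0,
    "Q6_K": 26.0,
    "Q5_K_M": 21.5,
    "Q4_K_M": 17.5,
    "Q3_K_M": 13.5,
}

def estimated_vram_gb(file_gb: float, overhead: float = 0.15) -> float:
    """Weights on the GPU plus a rough flat margin for buffers/activations.

    The 15% overhead is an assumption fitted to this page's numbers;
    real overhead varies by runtime and batch settings.
    """
    return round(file_gb * (1 + overhead), 1)

for quant, size_gb in QUANT_FILE_GB.items():
    print(f"{quant}: ~{estimated_vram_gb(size_gb)} GB VRAM")
```

With this factor the Q6_K and Q4_K_M rows land within a few hundred MB of the table; the exact margin the page uses is not stated.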
KV cache VRAM at Q6_K quality. Longer context = more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 614 MB | 30.6 GB (exceeds 24 GB) |
| 4K | 1.1 GB | 31.1 GB (exceeds 24 GB) |
| 8K | 2.2 GB | 32.2 GB (exceeds 24 GB) |
| 16K | 4.5 GB | 34.5 GB (exceeds 24 GB) |
| 32K | 9 GB | 39 GB (exceeds 24 GB) |
| 64K | 17.9 GB | 47.9 GB (exceeds 24 GB) |
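KV cache grows roughly linearly with context length. A sketch of that scaling, with the per-token cost estimated from the 64K row above (the true per-token cost depends on the model's attention configuration, which this page does not state):

```python
# ~0.28 MB of KV cache per token, estimated from the 64K row
# (17.9 GB / 65536 tokens). This is a fit to the table, not a
# figure from the model's published config.
MB_PER_TOKEN = 17.9 * 1024 / 65536

def kv_cache_gb(context_tokens: int) -> float:
    """Approximate KV-cache footprint for a given context length."""
    return context_tokens * MB_PER_TOKEN / 1024

def total_vram_gb(context_tokens: int, weights_gb: float = 30.0) -> float:
    """Q6_K weights (30 GB, from the quantization table) plus KV cache."""
    return weights_gb + kv_cache_gb(context_tokens)

print(f"32K context: ~{total_vram_gb(32 * 1024):.1f} GB total")
```

The linear fit lands within a few hundred MB of each row; the small deviations at short contexts suggest the page's own estimator includes a fixed term it does not document.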
Performance data for Command R 35B across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4060 8GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q2_K | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3080 10GB | Q2_K | — | Not Viable | ✗ (offload) |
| Apple M3 Pro (18GB Unified) | Q3_K_M | — | Not Viable | ✗ (offload) |
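The "Fits in VRAM" column reduces to a simple check: a quantization is viable on a card only if the estimated weights plus KV cache fit in its VRAM; otherwise layers must be offloaded to system RAM, which is what the "✗ (offload)" rating marks. A minimal sketch, assuming the 2K-context KV figure from the table above (the exact threshold logic OwnRig uses is not stated):

```python
def fits(card_vram_gb: float, model_vram_gb: float, kv_gb: float = 0.6) -> bool:
    """True if weights + KV cache fit entirely on the GPU.

    kv_gb defaults to the ~614 MB 2K-context KV cache from the
    context table (an assumption for this sketch).
    """
    return model_vram_gb + kv_gb <= card_vram_gb

# Even the smallest quantization in the table (Q3_K_M, 16 GB) exceeds
# every consumer card listed above, hence "Not Viable" across the board.
for card_gb in (8, 10, 12):
    print(f"{card_gb} GB card:", "fits" if fits(card_gb, 16.0) else "offload")
```

Partial offload still runs, but token throughput drops sharply once layers leave VRAM, which is why these rows carry no speed figure.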
Data confidence: estimated. Last updated: 2026-03-15.