Gemma 2 27B Instruct · Gemma License
Google's 27B-parameter open model. Strong reasoning and coding capabilities at a size that fits on a single 24GB GPU at Q4 quantization. Context is limited to 8K tokens.
Gemma 2 27B Instruct (27.23B) requires 18.5 GB VRAM at recommended quality (Q5_K_M). At efficient quality (Q4_K_M), it fits in 15.5 GB VRAM, making it compatible with the NVIDIA GeForce RTX 4060 Ti 16GB. On NVIDIA GeForce RTX 4090, expect approximately 22 tok/s at Q4_K_M. For the best experience, AMD AI Powerhouse ($1,818) is recommended.
— OwnRig methodology, data updated 2026-03-01
VRAM requirements for Gemma 2 27B Instruct by quantization level.
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| recommended | Q5_K_M | 18.5 GB | 16.3 GB |
| efficient | Q4_K_M | 15.5 GB | 13.6 GB |
| compressed | Q3_K_M | 12.5 GB | 10.6 GB |
| compressed | Q2_K | 9.8 GB | 8.2 GB |
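The file sizes above can be approximated from the parameter count and an average bits-per-weight figure for each quantization level. A minimal sketch follows; the bits-per-weight values are assumptions back-solved from the table's own file sizes for a 27.23B-parameter model, and real llama.cpp quants vary slightly with the tensor mix.

```python
# Sketch: estimate a quantized file size from parameter count and an
# ASSUMED average bits-per-weight (bpw). The bpw values below are
# back-solved from the table above, not official llama.cpp figures.

PARAMS = 27.23e9  # Gemma 2 27B Instruct parameter count (from the table)

# Assumed average bits per weight per quantization level.
BPW = {
    "Q5_K_M": 4.79,
    "Q4_K_M": 4.00,
    "Q3_K_M": 3.11,
    "Q2_K": 2.41,
}

def file_size_gb(params: float, bpw: float) -> float:
    """Approximate quantized file size in decimal GB (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9

for quant, bpw in BPW.items():
    print(f"{quant}: ~{file_size_gb(PARAMS, bpw):.1f} GB")
```

The VRAM column sits a few GB above the file size because the runtime also allocates activation buffers and the KV cache on top of the weights.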
KV cache VRAM at Q5_K_M quality. Longer context = more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 307 MB | 18.8 GB |
| 4K | 614 MB | 19.1 GB |
| 8K | 1.3 GB | 19.8 GB |
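The KV-cache column scales roughly linearly with context length. A small sketch, extrapolating from the table's 2K row under the assumption of a constant per-token cache cost (~150 KB/token at Q5_K_M); the table's 8K row (1.3 GB) sits slightly above this linear estimate, presumably due to rounding in the source data.

```python
# Sketch: extrapolate KV-cache VRAM linearly from the table's 2K figure.
# Assumes per-token cache cost is constant; constants are taken from the
# tables above, not measured here.

KV_MB_AT_2K = 307    # MB of KV cache at 2048 tokens (from the table)
BASE_VRAM_GB = 18.5  # Q5_K_M weights + runtime overhead (from the table)

def kv_cache_mb(context_tokens: int) -> float:
    """Linear KV-cache estimate in MB for a given context length."""
    return KV_MB_AT_2K * context_tokens / 2048

def total_vram_gb(context_tokens: int) -> float:
    """Model VRAM plus estimated KV cache."""
    return BASE_VRAM_GB + kv_cache_mb(context_tokens) / 1000

print(f"4K: {kv_cache_mb(4096):.0f} MB KV, {total_vram_gb(4096):.1f} GB total")
```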
Performance data for Gemma 2 27B Instruct across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 | Q4_K_M | 22 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4060 Ti 16GB | Q4_K_M | 12 tok/s | Acceptable | ✓ |
| Apple M4 Max (36GB Unified) | Q5_K_M | 15 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q3_K_M | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q3_K_M | — | Not Viable | ✗ (offload) |
| NVIDIA GeForce RTX 3080 10GB | Q3_K_M | — | Not Viable | ✗ (offload) |
| Apple M3 Pro (18GB Unified) | Q4_K_M | — | Not Viable | ✗ (offload) |
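The "Fits in VRAM" column reduces to a simple capacity check: a card is viable at a given quantization when the model's total VRAM requirement fits within its memory; otherwise layers must be offloaded to system RAM, which is why those rows are marked not viable. A sketch using the requirement figures from the quantization table above:

```python
# Sketch: the "Fits in VRAM" check as a capacity comparison. VRAM
# requirements per quantization level are taken from the table above;
# GPU memory sizes are the cards' nominal capacities.

VRAM_REQUIRED_GB = {
    "Q5_K_M": 18.5,
    "Q4_K_M": 15.5,
    "Q3_K_M": 12.5,
    "Q2_K": 9.8,
}

def fits_in_vram(quant: str, gpu_vram_gb: float) -> bool:
    """True if the model at this quant fits entirely in GPU memory."""
    return VRAM_REQUIRED_GB[quant] <= gpu_vram_gb

print(fits_in_vram("Q4_K_M", 24))  # RTX 4090: fits
print(fits_in_vram("Q4_K_M", 16))  # RTX 4060 Ti 16GB: fits
print(fits_in_vram("Q3_K_M", 10))  # RTX 3080 10GB: offload required
```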
Complete PC builds that can run Gemma 2 27B Instruct.

- NVIDIA GeForce RTX 4090 · 64GB DDR5-5600 (2x32GB)
- NVIDIA GeForce RTX 4090 · 64GB DDR5-6000 (2x32GB)
- 2x NVIDIA GeForce RTX 3090 24GB (Used) + NVLink Bridge · 128GB DDR5-5600 (4x32GB)
- NVIDIA GeForce RTX 3090 24GB (Used) · 64GB DDR5-5600 (2x32GB)
Data confidence: verified. Last updated: 2026-03-01.