Gemma 4 31B
Gemma · Apache 2.0
Google's flagship open-weight model: a dense 30.7B-parameter model with a 256K-token context window. Benchmarks: 89.2% AIME 2026, 85.2% MMLU Pro, 84.3% GPQA Diamond, 80.0% LiveCodeBench v6, 86.4% agentic tool use. Supports text, image, and video input. Fits on a single RTX 4090 at Q4, or across two 16 GB GPUs. Direct successor to Gemma 3 27B with substantially better reasoning. Apache 2.0 licensed.
- Parameters: 30.7B
- Architecture: Dense
- Context: 256,000 tokens
- Released: 2026-04-02
- Engines: llama.cpp, Ollama, vLLM
- Builder Tools: Claude Code, Codex CLI, Continue, Cursor, LM Studio, Ollama, Open WebUI, Windsurf
30.7B parameters · 28 GB VRAM · 256K context · 5 formats · 33 GPUs
Gemma 4 31B (30.7B) requires 28 GB VRAM at recommended quality (Q6_K). At efficient quality (Q4_K_M), it fits in 21 GB VRAM, making it compatible with 24 GB cards such as the AMD Radeon RX 7900 XTX. On the NVIDIA Grace Blackwell Ultra GB300, expect approximately 183 tok/s at Q8_0. For the best experience, the High-End Home AI Server build ($3,842) is recommended.
Source: OwnRig methodology
Q6_K · 28 GB VRAM · 26.73 GB file size · 256K-token context · Chat
VRAM Requirements
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | Q8_0 | 34 GB | 32.64 GB |
| recommended | Q6_K | 28 GB | 26.73 GB |
| recommended | Q5_K_M | 24 GB | 22.61 GB |
| efficient | Q4_K_M | 21 GB | 19.6 GB |
| compressed | Q3_K_M | 16 GB | 14.59 GB |
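The table can be read as a simple lookup: given a VRAM budget, take the highest-quality quantization that still fits. Below is a minimal Python sketch of that lookup, assuming the figures in the table above. The name `pick_quant` is hypothetical, and the check covers model weights only, so it ignores the extra memory the KV cache needs at longer contexts.

```python
from typing import Optional

# (quantization, VRAM in GB, file size in GB), copied from the table above
# and ordered from highest to lowest quality.
QUANTS = [
    ("Q8_0", 34.0, 32.64),
    ("Q6_K", 28.0, 26.73),
    ("Q5_K_M", 24.0, 22.61),
    ("Q4_K_M", 21.0, 19.6),
    ("Q3_K_M", 16.0, 14.59),
]


def pick_quant(vram_gb: float) -> Optional[str]:
    """Return the highest-quality quantization whose listed VRAM fits the budget."""
    for name, need_gb, _file_gb in QUANTS:
        if need_gb <= vram_gb:
            return name
    return None  # nothing in the table fits this budget


if __name__ == "__main__":
    for budget in (12, 16, 24, 32, 48):
        print(f"{budget} GB -> {pick_quant(budget)}")
```

A budget that exactly equals a listed requirement (for example 16 GB for Q3_K_M) is treated as a fit here, which in practice leaves no headroom for context.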
Context Length Impact
KV cache VRAM at Q6_K quality (28 GB for the model itself). Longer context requires more memory: total VRAM is the model's 28 GB plus the KV cache.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 307 MB | 28.3 GB (exceeds 24 GB) |
| 4K | 614 MB | 28.6 GB (exceeds 24 GB) |
| 8K | 1.1 GB | 29.1 GB (exceeds 24 GB) |
| 16K | 2.2 GB | 30.2 GB (exceeds 24 GB) |
| 32K | 4.5 GB | 32.5 GB (exceeds 24 GB) |
| 64K | 9 GB | 37 GB (exceeds 24 GB) |
| 128K | 17.9 GB | 45.9 GB (exceeds 24 GB) |
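The totals in this table are the 28 GB Q6_K base plus the KV cache for each context length. The Python sketch below reproduces that arithmetic and finds the longest listed context that fits a given VRAM budget. It assumes the table values as given; `total_vram_gb` and `max_context` are hypothetical helper names, and real usage will vary by inference engine and batch size.

```python
from typing import Optional

BASE_VRAM_GB = 28.0  # Q6_K requirement from the VRAM table

# Context label -> KV cache size in GB, copied from the table above
# (307 MB and 614 MB written as 0.307 and 0.614 GB).
KV_CACHE_GB = {
    "2K": 0.307,
    "4K": 0.614,
    "8K": 1.1,
    "16K": 2.2,
    "32K": 4.5,
    "64K": 9.0,
    "128K": 17.9,
}


def total_vram_gb(context: str) -> float:
    """Base model VRAM plus KV cache, rounded to one decimal like the table."""
    return round(BASE_VRAM_GB + KV_CACHE_GB[context], 1)


def max_context(vram_gb: float) -> Optional[str]:
    """Longest listed context whose total VRAM fits the budget, or None."""
    best = None
    for context in KV_CACHE_GB:  # dict preserves order: shortest context first
        if total_vram_gb(context) <= vram_gb:
            best = context
    return best


if __name__ == "__main__":
    for budget in (24, 32, 48):
        print(f"{budget} GB -> {max_context(budget)}")
```

With these figures no listed context fits in 24 GB at Q6_K, which is why every row is flagged as exceeding 24 GB.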
Compatible GPUs
33 devices
Builder Context
Gemma 4 31B is commonly used with Claude Code, Codex CLI, Continue, Cursor, LM Studio, Ollama, Open WebUI, and Windsurf. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG.
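The retrieval half of a local RAG setup could look roughly like the Python sketch below: embed the query and each document, then rank documents by cosine similarity. It assumes a local Ollama server at `http://localhost:11434` with nomic-embed-text already pulled, and uses Ollama's `/api/embeddings` endpoint. The helper names (`ollama_embed`, `cosine`, `top_k`) are hypothetical, and the embedder is passed in as a parameter so another backend can be swapped in.

```python
import json
import math
import urllib.request
from typing import Callable, List, Sequence

Vector = List[float]


def ollama_embed(text: str, model: str = "nomic-embed-text",
                 host: str = "http://localhost:11434") -> Vector:
    """Embed text via a local Ollama server (assumed to be running)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: Sequence[str],
          embed: Callable[[str], Vector], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]
```

Calling `top_k(question, snippets, ollama_embed)` would return the snippets to prepend to the prompt sent to the chat model. Re-embedding every document on each query is wasteful, so a real setup would cache the vectors.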
Frequently Asked Questions
- How much VRAM does Gemma 4 31B need?
- Gemma 4 31B requires 28 GB VRAM at recommended quality (Q6_K). At lower quality settings, it can fit in as little as 16 GB.
- What is the best GPU for Gemma 4 31B?
- The NVIDIA Grace Blackwell Ultra GB300 delivers the best performance for Gemma 4 31B, achieving 183 tok/s at Q8_0 with an excellent rating.
- Can I run Gemma 4 31B on an RTX 4060 Ti?
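- Not at recommended quality. Gemma 4 31B at Q6_K requires 28 GB VRAM, which exceeds the RTX 4060 Ti's 16 GB. The compressed Q3_K_M quantization is listed at 16 GB, which matches the card's capacity but leaves no headroom for the KV cache. Consider a GPU with more VRAM.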
- What quantization should I use for Gemma 4 31B?
- For recommended quality, use Q6_K (28 GB VRAM). If your GPU has limited VRAM, Q4_K_M (21 GB) is the efficient option, and Q3_K_M (16 GB) is the smallest format listed, with acceptable quality.
- Is Gemma 4 31B good for coding?
- Yes. Gemma 4 31B is used with Claude Code, Codex CLI, Continue, Cursor, LM Studio, Ollama, Open WebUI, and Windsurf for local AI coding. For the best coding experience, pair it with an embedding model for local RAG.
Data confidence: estimated.
VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.
Gemma is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.