Chat · Coding · Reasoning · Multi-purpose · 30.7B

Chat

Gemma 4 31B

Gemma · Apache 2.0

Google's flagship open-weight model. Dense 30.7B parameters with 256K context. Benchmarks: 89.2% AIME 2026, 85.2% MMLU Pro, 84.3% GPQA Diamond, 80.0% LiveCodeBench v6, 86.4% agentic tool use. Supports text, image, and video input. Fits on a single RTX 4090 at Q4 or dual 16 GB GPUs. Direct successor to Gemma 3 27B with substantially better reasoning. Apache 2.0 licensed.

Parameters: 30.7B
Architecture: Dense
Context: 256,000 tokens
Released: 2026-04-02
Engines: llama.cpp, ollama, vLLM
Builder Tools: Claude Code, Codex CLI, Continue, Cursor, LM Studio, Ollama, Open WebUI, Windsurf

Parameters

30.7B

VRAM

28 GB

Context

256K

Formats

GPUs

Gemma 4 31B (30.7B parameters) requires 28 GB of VRAM at recommended quality (Q6_K). At efficient quality (Q4_K_M), it fits in 21 GB, which makes it compatible with 24 GB cards such as the AMD Radeon RX 7900 XTX. On the NVIDIA Grace Blackwell Ultra GB300, expect approximately 183 tok/s at Q8_0. For the best experience, the High-End Home AI Server ($3,842) is recommended.

Source: OwnRig methodology

VRAM (Recommended)

28 GB

Quantization

Q6_K

File Size

26.73 GB

Max Context

256K tokens

Primary Use

Chat

Memory

VRAM Requirements

Quality	Quantization	VRAM	File Size
full	Q8_0	34 GB	32.64 GB
recommended	Q6_K	28 GB	26.73 GB
recommended	Q5_K_M	24 GB	22.61 GB
efficient	Q4_K_M	21 GB	19.6 GB
compressed	Q3_K_M	16 GB	14.59 GB
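
As a rough illustration of how the table above can be used, the following Python sketch picks the highest-quality quantization whose VRAM figure fits a given card. The figures are copied from the table; the function names and the `headroom_gb` parameter are assumptions made for this example and are not part of any OwnRig tool. The GPU list further down pairs 24 GB cards with Q4_K_M rather than Q5_K_M, which suggests some headroom is being reserved for context, so the parameter is included to model that.

```python
# Rows copied from the VRAM Requirements table, ordered from largest to smallest.
# Tuple layout: (quality, quantization, vram_gb, file_size_gb)
QUANTS = [
    ("full", "Q8_0", 34.0, 32.64),
    ("recommended", "Q6_K", 28.0, 26.73),
    ("recommended", "Q5_K_M", 24.0, 22.61),
    ("efficient", "Q4_K_M", 21.0, 19.6),
    ("compressed", "Q3_K_M", 16.0, 14.59),
]


def best_quant_for(vram_gb, headroom_gb=0.0):
    """Return the first (highest-quality) row that fits, or None if nothing fits.

    headroom_gb is a hypothetical safety margin reserved for KV cache and
    other runtime allocations; it is not a value taken from the source.
    """
    budget = vram_gb - headroom_gb
    for row in QUANTS:
        if row[2] <= budget:
            return row
    return None


def overhead_gb(row):
    """Difference between the listed VRAM figure and the file size."""
    return round(row[2] - row[3], 2)


if __name__ == "__main__":
    for vram in (16, 24, 32, 48):
        row = best_quant_for(vram)
        print(vram, "GB ->", row[1] if row else "does not fit")
```

With no headroom, a 24 GB card selects Q5_K_M; reserving 3 GB drops the choice to Q4_K_M, which matches the pairing shown for 24 GB cards in the GPU list.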

Scaling

Context Length Impact

KV cache VRAM at Q6_K quality. Total VRAM is the 28 GB Q6_K figure plus the KV cache, so longer context means more memory.

Context	KV Cache	Total VRAM	Note
2K	307 MB	28.3 GB	exceeds 24 GB
4K	614 MB	28.6 GB	exceeds 24 GB
8K	1.1 GB	29.1 GB	exceeds 24 GB
16K	2.2 GB	30.2 GB	exceeds 24 GB
32K	4.5 GB	32.5 GB	exceeds 24 GB
64K	9 GB	37 GB	exceeds 24 GB
128K	17.9 GB	45.9 GB	exceeds 24 GB
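
The totals in this table are consistent with adding the KV cache to the 28 GB Q6_K figure. The sketch below reproduces that arithmetic and looks up the largest listed context that fits a given memory budget. The values are copied from the table (MB rows converted at 1000 MB per GB, which is an assumption about the units); the function names are illustrative only.

```python
BASE_VRAM_GB = 28.0  # Q6_K figure from the VRAM Requirements table

# Context length (in K tokens) -> KV cache size in GB, copied from the table.
# The 307 MB and 614 MB rows are converted assuming 1000 MB = 1 GB.
KV_CACHE_GB = {2: 0.307, 4: 0.614, 8: 1.1, 16: 2.2, 32: 4.5, 64: 9.0, 128: 17.9}


def total_vram_gb(context_k):
    """Base model memory plus KV cache for one of the listed context lengths."""
    return round(BASE_VRAM_GB + KV_CACHE_GB[context_k], 1)


def max_context_k(vram_gb):
    """Largest listed context (in K tokens) whose total fits, or None."""
    fitting = [k for k in sorted(KV_CACHE_GB)
               if BASE_VRAM_GB + KV_CACHE_GB[k] <= vram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for k in sorted(KV_CACHE_GB):
        print(f"{k}K context: {total_vram_gb(k)} GB total")
    print("Largest context on a 32 GB card:", max_context_k(32), "K")
```

Under these figures, a 32 GB card running Q6_K tops out at the 16K row, since the 32K row needs 32.5 GB.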

Compatible GPUs

33 devices


Device	Quantization	Speed	Rating
NVIDIA Grace Blackwell Ultra GB300	Q8_0	183 tok/s	Excellent
NVIDIA GeForce RTX 3090	Q4_K_M	35 tok/s	Good
NVIDIA GeForce RTX 4090	Q4_K_M	38 tok/s	Good
NVIDIA GeForce RTX 5080	Q3_K_M	22 tok/s	Good
NVIDIA GeForce RTX 5090	Q6_K	50 tok/s	Good
NVIDIA RTX PRO 6000 Blackwell	Q8_0	41 tok/s	Good
NVIDIA RTX PRO 6000 Blackwell Max-Q	Q8_0	41 tok/s	Good
AMD Radeon RX 7900 XTX	Q4_K_M	36 tok/s	Good
Apple M4 Max (36GB Unified)	Q6_K	15 tok/s	Acceptable
Apple M4 Ultra (192GB)	Q8_0	18 tok/s	Acceptable
AMD Radeon Pro W7900	Q8_0	19 tok/s	Acceptable
NVIDIA GeForce RTX 4070 Ti Super	Q3_K_M	15 tok/s	Acceptable
NVIDIA GeForce RTX 4080 Super	Q3_K_M	16 tok/s	Acceptable
NVIDIA RTX 4090 Laptop (150-175W)	Q3_K_M	11 tok/s	Acceptable
AMD Radeon RX 9070	Q3_K_M	14 tok/s	Acceptable
Apple M3 Pro (18GB Unified)	Q3_K_M	7 tok/s	Marginal
Apple M4 Max (128GB Unified)	Q8_0	12 tok/s	Marginal
Apple M4 Max (64GB Unified)	Q8_0	12 tok/s	Marginal
Apple M4 Pro (24GB Unified)	Q4_K_M	10 tok/s	Marginal
Apple M4 Pro (48GB)	Q8_0	6 tok/s	Marginal
NVIDIA GeForce RTX 4060 Ti 16GB	Q3_K_M	6 tok/s	Marginal
AMD Radeon RX 9060 XT 16GB	Q3_K_M	7 tok/s	Marginal
NVIDIA GeForce RTX 5060 Ti 16GB	Q3_K_M	7 tok/s	Marginal
Apple M4 (16GB Unified)	Q3_K_M	1 tok/s	Not viable
Apple M1 (8GB Unified)	Q3_K_M	–	Not viable
Apple M1 (16GB Unified)	Q3_K_M	–	Not viable
Apple M1 Pro (16GB Unified)	Q3_K_M	–	Not viable
Apple M2 (8GB Unified)	Q3_K_M	–	Not viable
Apple M2 (16GB Unified)	Q3_K_M	–	Not viable
Apple M2 Pro (16GB Unified)	Q3_K_M	–	Not viable
Apple M3 (8GB Unified)	Q3_K_M	–	Not viable
Apple M3 (16GB Unified)	Q3_K_M	–	Not viable
AMD Radeon RX 9060 XT 8GB	Q3_K_M	–	Not viable
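
To put the tok/s figures in practical terms, this small sketch converts a generation speed into the wait for a response of a given length. The three speeds are copied from the list above; the 500-token response length is an arbitrary assumption, and the estimate ignores prompt processing time, which these figures do not cover.

```python
# Estimated generation speeds (tok/s) copied from the Compatible GPUs list.
TOK_PER_S = {
    "NVIDIA GeForce RTX 5090": 50,
    "NVIDIA GeForce RTX 4090": 38,
    "NVIDIA GeForce RTX 4060 Ti 16GB": 6,
}


def seconds_for(tokens, tok_per_s):
    """Time to generate `tokens` output tokens at a steady rate."""
    return tokens / tok_per_s


if __name__ == "__main__":
    response_tokens = 500  # hypothetical response length
    for device, speed in TOK_PER_S.items():
        print(f"{device}: {seconds_for(response_tokens, speed):.1f} s")
```

At 6 tok/s a 500-token answer takes well over a minute, which is one way to read the Marginal rating.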


Builder Context

Gemma 4 31B is commonly used with Claude Code, Codex CLI, Continue, Cursor, LM Studio, Ollama, Open WebUI, and Windsurf. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG.
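
A minimal sketch of the retrieval half of such a local RAG setup follows. It assumes an Ollama server on its default local port with nomic-embed-text already pulled, and uses Ollama's `/api/embeddings` endpoint; the helper names (`ollama_embed`, `cosine`, `top_k`) are hypothetical. The retrieved chunks would then be placed in the prompt sent to the chat model, a step not shown here because the source does not give a model tag for Gemma 4 31B.

```python
import json
import math
import urllib.request

# Default address of a locally running Ollama server (assumed setup).
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def ollama_embed(text, model="nomic-embed-text"):
    """Request one embedding vector from the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, embed=ollama_embed, k=3):
    """Return the k chunks most similar to the query.

    `embed` is injectable so the ranking logic can be exercised without a server.
    """
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

For anything beyond a handful of chunks, the chunk embeddings would normally be computed once and cached rather than recomputed per query as this sketch does.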

FAQ

Frequently Asked Questions

How much VRAM does Gemma 4 31B need?

Gemma 4 31B requires 28 GB of VRAM at recommended quality (Q6_K). At the most compressed setting (Q3_K_M), it fits in 16 GB.

What is the best GPU for Gemma 4 31B?

The NVIDIA Grace Blackwell Ultra GB300 delivers the best performance for Gemma 4 31B, achieving 183 tok/s at Q8_0 with an Excellent rating.

Can I run Gemma 4 31B on an RTX 4060 Ti?

Only on the 16 GB variant, and only marginally. At recommended quality (Q6_K), Gemma 4 31B requires 28 GB of VRAM, which exceeds the RTX 4060 Ti's 16 GB. At Q3_K_M (16 GB) it fits, but the estimated speed is about 6 tok/s, rated Marginal. For better quality or speed, choose a GPU with more VRAM.

What quantization should I use for Gemma 4 31B?

Q6_K (28 GB VRAM) is the recommended quality level, and Q8_0 (34 GB) is full quality. If your GPU has limited VRAM, Q4_K_M (21 GB) is the efficient option, and Q3_K_M (16 GB) is the smallest listed option with acceptable quality.

Is Gemma 4 31B good for coding?

Yes. Gemma 4 31B is used with Claude Code, Codex CLI, Continue, Cursor, LM Studio, Ollama, Open WebUI, and Windsurf for local AI coding. For the best coding experience, pair it with an embedding model for local RAG.

Related Guides

Tutorial

Running Gemma 4 locally: which GPU you actually need

Gemma 4 VRAM requirements for every variant: E2B, E4B, 26B-A4B, and 31B. Which GPUs can run each, what quantization to use, and the honest call on RTX 4060 vs RTX 4090.


Data confidence: estimated.

VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.

Gemma is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.