What AI models can NVIDIA RTX PRO 6000 Blackwell Max-Q run?

The compatibility table below lists 63 AI models for the NVIDIA RTX PRO 6000 Blackwell Max-Q, 60 of which are rated Acceptable or better. The fastest entries are all-MiniLM-L6-v2, nomic-embed-text v1.5, and Llama 3.2 1B Instruct. See the table for speeds and quality ratings.

Is NVIDIA RTX PRO 6000 Blackwell Max-Q good for AI coding?

Yes. With 96 GB of VRAM, the NVIDIA RTX PRO 6000 Blackwell Max-Q supports the Full AI Builder tier: coding, reasoning, and embedding models running concurrently.

How much VRAM does NVIDIA RTX PRO 6000 Blackwell Max-Q have?

The NVIDIA RTX PRO 6000 Blackwell Max-Q has 96 GB of GDDR7 VRAM with 1800 GB/s bandwidth.

Can NVIDIA RTX PRO 6000 Blackwell Max-Q run 70B models?

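Yes. The NVIDIA RTX PRO 6000 Blackwell Max-Q can hold 70B-parameter models entirely in VRAM at quantized precision. The compatibility table lists Llama 3.1 70B Instruct at Q5_K_M (21 tok/s) and Llama 3.3 70B Instruct at Q8_0 (13 tok/s), both rated Acceptable.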

Is NVIDIA RTX PRO 6000 Blackwell Max-Q worth it for AI?

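At an estimated street price of $7,000, the NVIDIA RTX PRO 6000 Blackwell Max-Q offers 96 GB of GDDR7 VRAM and 1800 GB/s of bandwidth. Of the 63 models listed, 60 are rated Acceptable or better, including 70B-class models at quantized precision.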

Desktop GPU · Upcoming

NVIDIA RTX PRO 6000 Blackwell Max-Q

96 GB GDDR7 · 1800 GB/s

From $7,000 (estimated street price)

VRAM: 96 GB
Bandwidth: 1800 GB/s
TDP: 300W
Models: 63
Tier: Full

The NVIDIA RTX PRO 6000 Blackwell Max-Q, with 96 GB of GDDR7 VRAM, is listed against 63 AI models across the embedding, AI-building, and coding categories. The fastest entry is all-MiniLM-L6-v2 at 2760 tok/s (Excellent). For AI coding workflows, it supports the Full AI Builder tier: coding, reasoning, and embedding models running concurrently. Estimated street price: approximately $7,000.

Source: OwnRig methodology

VRAM: 96 GB
Bandwidth: 1800 GB/s
Memory Type: GDDR7
TDP: 300W
Form Factor: 2-slot, 280mm

Builder Capability: Full AI Builder

Supports concurrent coding + reasoning + embeddings. Can run 70B models at quantized precision.
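The 70B claim can be sanity-checked with a rough weight-size estimate: parameters times bits per weight, divided by eight. The sketch below is illustrative only. The Q8_0 figure of 8.5 bits per weight follows from the llama.cpp block layout, while the K-quant figures and the `overhead_gb` budget for KV cache and runtime buffers are assumptions, not values from this page.

```python
# Approximate bits per weight for common llama.cpp quantization formats.
# Q8_0 is 8.5 by construction (32 int8 weights plus one fp16 scale per block).
# The K-quant figures are rough averages that vary by model (assumption).
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
}

VRAM_GB = 96  # from the spec sheet


def weight_size_gb(params_billions: float, quant: str) -> float:
    """Estimated size of the weights alone, in GB (10^9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fits(params_billions: float, quant: str, overhead_gb: float = 8.0) -> bool:
    """True if weights plus a hypothetical overhead budget fit in VRAM."""
    return weight_size_gb(params_billions, quant) + overhead_gb <= VRAM_GB


for quant in ("FP16", "Q8_0", "Q5_K_M", "Q4_K_M"):
    size = weight_size_gb(70, quant)
    print(f"70B @ {quant}: ~{size:.1f} GB, fits: {fits(70, quant)}")
```

Under these assumptions a 70B model fits at Q8_0 and below but not at FP16, which is consistent with the "quantized precision" wording. Long contexts need a larger overhead budget than the placeholder used here.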

Software

Inference Backends

The software stacks that matter most for real-world inference on this device.

CUDA (production): Primary workstation inference backend. Optimized for multi-GPU tensor parallelism.

Vulkan (stable): Fallback backend for llama.cpp and related local runtimes.

What it can run

63 models


Model	Quantization	Speed	Rating
all-MiniLM-L6-v2	FP16	2760 tok/s	Excellent
Arcee Trinity Mini 26B	Q8_0	75 tok/s	Excellent
Arcee Trinity Nano 6B	Q8_0	318 tok/s	Excellent
Code Llama 34B Instruct	Q5_K_M	44 tok/s	Good
Codestral 22B	Q5_K_M	67 tok/s	Good
Command R 35B	Q8_0	26 tok/s	Acceptable
DeepSeek Coder V2 Lite 16B	Q8_0	58 tok/s	Good
DeepSeek R1	Q2_K	–	Not viable
DeepSeek R1 Distill Qwen 32B	Q8_0	28 tok/s	Acceptable
DeepSeek R1 Distill Qwen 7B	Q8_0	119 tok/s	Excellent
DeepSeek V3	Q2_K	–	Not viable
FLUX.1 Dev	FP16	–	Excellent
Gemma 2 27B Instruct	Q5_K_M	54 tok/s	Good
Gemma 2 9B Instruct	Q8_0	98 tok/s	Excellent
Gemma 3 12B	Q8_0	74 tok/s	Good
Gemma 3 27B	Q8_0	33 tok/s	Good
Gemma 3 4B	Q8_0	210 tok/s	Excellent
Gemma 4 26B-A4B	Q8_0	279 tok/s	Excellent
Gemma 4 31B	Q8_0	41 tok/s	Good
Gemma 4 E2B	Q8_0	271 tok/s	Excellent
Gemma 4 E4B	Q8_0	168 tok/s	Excellent
GigaChat Lightning 10B	Q8_0	299 tok/s	Excellent
InternLM 2.5 7B Chat	Q8_0	117 tok/s	Excellent
Llama 3.1 70B Instruct	Q5_K_M	21 tok/s	Acceptable
Llama 3.1 8B Instruct	Q8_0	112 tok/s	Excellent
Llama 3.2 11B Vision	Q8_0	82 tok/s	Excellent
Llama 3.2 1B Instruct	Q8_0	433 tok/s	Excellent
Llama 3.2 3B Instruct	Q8_0	239 tok/s	Excellent
Llama 3.3 70B Instruct	Q8_0	13 tok/s	Acceptable
Llama 4 Scout	Q5_K_M	87 tok/s	Excellent
LLaVA 1.6 13B	Q5_K_M	114 tok/s	Excellent
Mistral 7B Instruct v0.3	Q8_0	125 tok/s	Excellent
Mistral Large 2 123B	Q5_K_M	12 tok/s	Acceptable
Mistral Small 24B Instruct	Q8_0	38 tok/s	Good
Mixtral 8x7B Instruct	Q5_K_M	115 tok/s	Excellent
nomic-embed-text v1.5	FP16	1840 tok/s	Excellent
NVIDIA Nemotron-3-super-120B-A12B	Q4_K_M	145 tok/s	Excellent
Phi-3 Medium 14B Instruct	Q8_0	64 tok/s	Good
Phi-3 Mini 3.8B Instruct	Q8_0	236 tok/s	Excellent
Phi-4 14B	Q8_0	62 tok/s	Good
Phi-4 Mini	Q8_0	236 tok/s	Excellent
Qwen 2.5 14B Instruct	Q8_0	61 tok/s	Good
Qwen 2.5 72B Instruct	Q4_K_M	24 tok/s	Acceptable
Qwen 2.5 7B Instruct	Q8_0	119 tok/s	Excellent
Qwen 2.5 Coder 32B Instruct	Q5_K_M	46 tok/s	Good
Qwen 2.5 Coder 7B Instruct	Q8_0	119 tok/s	Excellent
Qwen3-14B Instruct	Q8_0	64 tok/s	Good
Qwen3-30B-A3B	Q8_0	256 tok/s	Excellent
Qwen3-32B Instruct	Q8_0	29 tok/s	Acceptable
Qwen3-8B Instruct	Q8_0	110 tok/s	Excellent
Qwen3.5-122B-A10B	Q8_0	90 tok/s	Excellent
Qwen3.5-27B	Q8_0	33 tok/s	Good
Qwen3.5-397B (MoE)	Q2_K	–	Not viable
Qwen3.6-27B	Q8_0	33 tok/s	Good
Qwen3.6-35B-A3B	Q5_K_M	256 tok/s	Excellent
QwQ 32B Preview	Q5_K_M	46 tok/s	Good
Stable Diffusion 3 Medium	FP16	–	Excellent
Stable Diffusion 3.5 Large	FP16	–	Excellent
Stable Diffusion XL 1.0	FP16	–	Excellent
StarCoder 2 15B	Q8_0	58 tok/s	Good
Whisper Large V3	FP16	–	Excellent
Whisper Large V3 Turbo	FP16	–	Excellent
Yi 1.5 34B Chat	Q8_0	27 tok/s	Acceptable
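
One way to read the speeds in the table above is against a memory-bandwidth ceiling. During single-stream decoding, each generated token requires reading roughly all active weights once, so bandwidth divided by weight size gives an upper bound on tok/s. The sketch below is a simplification under that assumption, not the methodology behind the table. The function name is hypothetical, parameter counts are the nominal sizes from the model names, and 8.5 bits per weight is the Q8_0 figure. For mixture-of-experts models, only the active parameters per token would count, which is why the argument is named `active_params_billions`.

```python
BANDWIDTH_GB_S = 1800  # from the spec sheet


def decode_ceiling_tok_s(active_params_billions: float, bits_per_weight: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return BANDWIDTH_GB_S / gb_read_per_token


# (model, nominal params in billions, bits per weight at Q8_0, listed tok/s)
rows = [
    ("Llama 3.1 8B Instruct", 8, 8.5, 112),
    ("Qwen 2.5 14B Instruct", 14, 8.5, 61),
    ("Llama 3.3 70B Instruct", 70, 8.5, 13),
]

for name, params, bpw, listed in rows:
    ceiling = decode_ceiling_tok_s(params, bpw)
    print(f"{name}: listed {listed} tok/s, ceiling ~{ceiling:.0f} tok/s "
          f"({listed / ceiling:.0%} of ceiling)")
```

For these three dense models the listed figures sit below the computed ceiling, as expected for an upper bound; real throughput also depends on compute, kernel efficiency, and KV-cache reads that this estimate ignores.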


Ready to Buy

Available in these Machines

Desktop

HP Z8 Fury G6i (2× RTX PRO 6000 Max-Q, 192 GB)

$35,000

Desktop

HP Z8 Fury G6i (4× RTX PRO 6000 Max-Q, 384 GB)

$90,000

