NVIDIA RTX PRO 6000 Blackwell Max-Q
96 GB GDDR7 · 1800 GB/s
From
$7,000
Estimated street price
VRAM
96 GB
Bandwidth
1800 GB/s
TDP
300W
Models
63
Tier
Full
The NVIDIA RTX PRO 6000 Blackwell Max-Q, with 96 GB of GDDR7 VRAM, is rated against 63 AI models across embedding, AI building, and coding workloads; 60 of them are rated Acceptable or better. Best performance: all-MiniLM-L6-v2 at 2760 tok/s (Excellent). For AI coding workflows, it qualifies for the Full AI Builder tier, meaning it can run coding, reasoning, and embedding models concurrently. Current price: approximately $7,000.
Source: OwnRig methodology
96 GB · 1800 GB/s · GDDR7 · 300W · 2-slot, 280mm
Builder Capability: Full AI Builder
Supports concurrent coding + reasoning + embeddings. Can run 70B models at quantized precision.
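The 70B claim can be sanity-checked with simple arithmetic: weight size is roughly parameter count times bits per weight. The sketch below is an estimate under stated assumptions, not OwnRig's methodology. The Q8_0 figure (8.5 bits per weight) follows from the llama.cpp block layout; the K-quant figures are approximate averages, and the 8 GB overhead budget for KV cache and runtime buffers is a hypothetical placeholder.

```python
# Approximate bits per weight for common llama.cpp quantization formats.
# Q8_0 is 8.5 (32 int8 weights plus one fp16 scale per block); the K-quant
# values are rough averages and vary with model architecture.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.85}

VRAM_GB = 96  # from the spec sheet above


def weight_footprint_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the model weights in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8  # billions of params * bytes per param


def fits(params_billion: float, quant: str, overhead_gb: float = 8.0) -> bool:
    """True if weights plus an assumed overhead budget fit in VRAM.

    overhead_gb is a hypothetical allowance for KV cache and buffers.
    """
    return weight_footprint_gb(params_billion, quant) + overhead_gb <= VRAM_GB


if __name__ == "__main__":
    for quant in ("FP16", "Q8_0", "Q5_K_M"):
        size = weight_footprint_gb(70, quant)
        print(f"70B @ {quant}: ~{size:.1f} GB weights, fits={fits(70, quant)}")
```

Under these assumptions a 70B model fits at Q8_0 and Q5_K_M but not at FP16, which is why the card is described as running 70B models at quantized precision.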
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA
Production. Primary workstation inference backend. Optimized for multi-GPU tensor parallelism.
Vulkan
Stable. Fallback backend for llama.cpp and related local runtimes.
What it can run
63 models

| Model | Quantization | Speed | Rating |
|---|---|---|---|
| all-MiniLM-L6-v2 | FP16 | 2760 tok/s | Excellent |
| Arcee Trinity Mini 26B | Q8_0 | 75 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 318 tok/s | Excellent |
| Code Llama 34B Instruct | Q5_K_M | 44 tok/s | Good |
| Codestral 22B | Q5_K_M | 67 tok/s | Good |
| Command R 35B | Q8_0 | 26 tok/s | Acceptable |
| DeepSeek Coder V2 Lite 16B | Q8_0 | 58 tok/s | Good |
| DeepSeek R1 | Q2_K | n/a | Not viable |
| DeepSeek R1 Distill Qwen 32B | Q8_0 | 28 tok/s | Acceptable |
| DeepSeek R1 Distill Qwen 7B | Q8_0 | 119 tok/s | Excellent |
| DeepSeek V3 | Q2_K | n/a | Not viable |
| FLUX.1 Dev | FP16 | n/a | Excellent |
| Gemma 2 27B Instruct | Q5_K_M | 54 tok/s | Good |
| Gemma 2 9B Instruct | Q8_0 | 98 tok/s | Excellent |
| Gemma 3 12B | Q8_0 | 74 tok/s | Good |
| Gemma 3 27B | Q8_0 | 33 tok/s | Good |
| Gemma 3 4B | Q8_0 | 210 tok/s | Excellent |
| Gemma 4 26B-A4B | Q8_0 | 279 tok/s | Excellent |
| Gemma 4 31B | Q8_0 | 41 tok/s | Good |
| Gemma 4 E2B | Q8_0 | 271 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 168 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 299 tok/s | Excellent |
| InternLM 2.5 7B Chat | Q8_0 | 117 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q5_K_M | 21 tok/s | Acceptable |
| Llama 3.1 8B Instruct | Q8_0 | 112 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 82 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 433 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 239 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q8_0 | 13 tok/s | Acceptable |
| Llama 4 Scout | Q5_K_M | 87 tok/s | Excellent |
| LLaVA 1.6 13B | Q5_K_M | 114 tok/s | Excellent |
| Mistral 7B Instruct v0.3 | Q8_0 | 125 tok/s | Excellent |
| Mistral Large 2 123B | Q5_K_M | 12 tok/s | Acceptable |
| Mistral Small 24B Instruct | Q8_0 | 38 tok/s | Good |
| Mixtral 8x7B Instruct | Q5_K_M | 115 tok/s | Excellent |
| nomic-embed-text v1.5 | FP16 | 1840 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q4_K_M | 145 tok/s | Excellent |
| Phi-3 Medium 14B Instruct | Q8_0 | 64 tok/s | Good |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 236 tok/s | Excellent |
| Phi-4 14B | Q8_0 | 62 tok/s | Good |
| Phi-4 Mini | Q8_0 | 236 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q8_0 | 61 tok/s | Good |
| Qwen 2.5 72B Instruct | Q4_K_M | 24 tok/s | Acceptable |
| Qwen 2.5 7B Instruct | Q8_0 | 119 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 46 tok/s | Good |
| Qwen 2.5 Coder 7B Instruct | Q8_0 | 119 tok/s | Excellent |
| Qwen3-14B Instruct | Q8_0 | 64 tok/s | Good |
| Qwen3-30B-A3B | Q8_0 | 256 tok/s | Excellent |
| Qwen3-32B Instruct | Q8_0 | 29 tok/s | Acceptable |
| Qwen3-8B Instruct | Q8_0 | 110 tok/s | Excellent |
| Qwen3.5-122B-A10B | Q8_0 | 90 tok/s | Excellent |
| Qwen3.5-27B | Q8_0 | 33 tok/s | Good |
| Qwen3.5-397B (MoE) | Q2_K | n/a | Not viable |
| Qwen3.6-27B | Q8_0 | 33 tok/s | Good |
| Qwen3.6-35B-A3B | Q5_K_M | 256 tok/s | Excellent |
| QwQ 32B Preview | Q5_K_M | 46 tok/s | Good |
| Stable Diffusion 3 Medium | FP16 | n/a | Excellent |
| Stable Diffusion 3.5 Large | FP16 | n/a | Excellent |
| Stable Diffusion XL 1.0 | FP16 | n/a | Excellent |
| StarCoder 2 15B | Q8_0 | 58 tok/s | Good |
| Whisper Large V3 | FP16 | n/a | Excellent |
| Whisper Large V3 Turbo | FP16 | n/a | Excellent |
| Yi 1.5 34B Chat | Q8_0 | 27 tok/s | Acceptable |
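The dense-model speeds in the table track memory bandwidth closely. A common rule of thumb is that single-stream decoding reads every weight once per token, so bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below applies that rule to three Q8_0 rows. It is an approximation, not the method behind the table: the parameter counts are nominal, and the rule does not apply to MoE rows such as Qwen3-30B-A3B, where only the active parameters are read per token.

```python
BANDWIDTH_GB_S = 1800  # memory bandwidth from the spec sheet
Q8_0_BITS = 8.5        # llama.cpp Q8_0: 32 int8 weights + one fp16 scale


def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every weight is read once per token."""
    weights_gb = params_billion * bits_per_weight / 8
    return BANDWIDTH_GB_S / weights_gb


def efficiency(listed_tok_s: float, params_billion: float,
               bits_per_weight: float) -> float:
    """Fraction of the bandwidth ceiling that a listed speed reaches."""
    return listed_tok_s / decode_ceiling_tok_s(params_billion, bits_per_weight)


# (model, nominal params in billions, listed tok/s from the table above)
DENSE_Q8_ROWS = [
    ("Llama 3.1 8B Instruct", 8.0, 112),
    ("Qwen3-32B Instruct", 32.0, 29),
    ("Llama 3.3 70B Instruct", 70.0, 13),
]

if __name__ == "__main__":
    for name, params, listed in DENSE_Q8_ROWS:
        ceiling = decode_ceiling_tok_s(params, Q8_0_BITS)
        frac = efficiency(listed, params, Q8_0_BITS)
        print(f"{name}: ceiling ~{ceiling:.0f} tok/s, "
              f"listed {listed} tok/s ({frac:.0%} of ceiling)")
```

All three dense rows land at a little over half of the ceiling, which suggests the listed figures assume a fairly constant fraction of peak bandwidth.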
Frequently Asked Questions
- What AI models can NVIDIA RTX PRO 6000 Blackwell Max-Q run?
- The table above lists 63 AI models for the NVIDIA RTX PRO 6000 Blackwell Max-Q, 60 of which are rated Acceptable or better. The fastest are all-MiniLM-L6-v2, nomic-embed-text v1.5, and Llama 3.2 1B Instruct. See the full compatibility table above for speeds and quality ratings.
- Is NVIDIA RTX PRO 6000 Blackwell Max-Q good for AI coding?
- Yes. With 96 GB of VRAM, the NVIDIA RTX PRO 6000 Blackwell Max-Q supports the Full AI Builder tier: concurrent coding, reasoning, and embedding workloads.
- How much VRAM does NVIDIA RTX PRO 6000 Blackwell Max-Q have?
- The NVIDIA RTX PRO 6000 Blackwell Max-Q has 96 GB of GDDR7 VRAM with 1800 GB/s bandwidth.
- Can NVIDIA RTX PRO 6000 Blackwell Max-Q run 70B models?
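- Yes. The NVIDIA RTX PRO 6000 Blackwell Max-Q can hold 70B-parameter models entirely in VRAM at quantized precision. The table above lists Llama 3.1 70B Instruct at Q5_K_M (21 tok/s) and Llama 3.3 70B Instruct at Q8_0 (13 tok/s), both rated Acceptable.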
- Is NVIDIA RTX PRO 6000 Blackwell Max-Q worth it for AI?
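- At an estimated $7,000, the NVIDIA RTX PRO 6000 Blackwell Max-Q offers 96 GB of GDDR7 VRAM and is rated against 63 AI models. It handles local AI inference well for models up to roughly 120B parameters at quantized precision; the largest models in the table (DeepSeek R1, DeepSeek V3, Qwen3.5-397B) are not viable.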