NVIDIA RTX PRO 6000 Blackwell
96 GB GDDR7 · 1800 GB/s
From $7,500 (estimated street price)
VRAM: 96 GB
Bandwidth: 1800 GB/s
TDP: 600W
Models: 63
Tier: Full
The NVIDIA RTX PRO 6000 Blackwell has 96 GB of GDDR7 VRAM. Of the 63 AI models listed across the embedding, AI building, and coding categories, 60 are rated Acceptable or better and 3 are rated Not viable. Best performance: all-MiniLM-L6-v2 at 3000 tok/s (Excellent). For AI coding workflows, it qualifies for the Full AI Builder tier, which means it can run coding, reasoning, and embedding models concurrently. Current price: approximately $7,500.
Source: OwnRig methodology
Form factor: 3-slot, 330mm
Builder Capability: Full AI Builder
Supports concurrent coding + reasoning + embeddings. Can run 70B models at quantized precision.
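A rough way to sanity-check the "70B at quantized precision" claim is to estimate the weight footprint as parameters times bits per weight. The sketch below is illustrative only: the bits-per-weight figures are approximate averages for GGUF quantization types, the 8 GB overhead for KV cache and runtime buffers is an assumption, and `weights_gb` and `fits` are hypothetical helper names, not part of any tool referenced on this page.

```python
# Approximate bits per weight for common GGUF quantization types.
# Assumed averages for illustration; real model files vary by a few percent.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
    "Q2_K": 2.6,
}

VRAM_GB = 96.0  # from the spec sheet above


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimate the size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8.0


def fits(params_billion: float, quant: str, overhead_gb: float = 8.0) -> bool:
    """True if the weights plus an assumed fixed overhead fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= VRAM_GB


# Example sizes: a 70B dense model, a 123B dense model, a 671B model.
for params, quant in [(70, "Q8_0"), (123, "Q5_K_M"), (671, "Q2_K")]:
    size = weights_gb(params, quant)
    print(f"{params}B at {quant}: ~{size:.0f} GB of weights, fits={fits(params, quant)}")
```

Under these assumptions a 70B model at Q8_0 needs roughly 74 GB and fits, while a model in the 600B+ range does not fit even at Q2_K, which matches the Not viable ratings in the compatibility table.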
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA (production): Primary high-performance backend for NVIDIA workstation inference.
Vulkan (stable): Fallback backend for llama.cpp and related local runtimes.
What it can run
63 models

| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | FP16 | 3000 tok/s | Excellent |
| Arcee Trinity Mini 26B | Q8_0 | 75 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 318 tok/s | Excellent |
| Code Llama 34B Instruct | Q5_K_M | 48 tok/s | Good |
| Codestral 22B | Q5_K_M | 73 tok/s | Good |
| Command R 35B | Q8_0 | 28 tok/s | Acceptable |
| DeepSeek Coder V2 Lite 16B | Q8_0 | 63 tok/s | Good |
| DeepSeek R1 | Q2_K | n/a | Not viable |
| DeepSeek R1 Distill Qwen 32B | Q8_0 | 30 tok/s | Acceptable |
| DeepSeek R1 Distill Qwen 7B | Q8_0 | 129 tok/s | Excellent |
| DeepSeek V3 | Q2_K | n/a | Not viable |
| FLUX.1 Dev | FP16 | n/a | Excellent |
| Gemma 2 27B Instruct | Q5_K_M | 59 tok/s | Good |
| Gemma 2 9B Instruct | Q8_0 | 106 tok/s | Excellent |
| Gemma 3 12B | Q8_0 | 80 tok/s | Good |
| Gemma 3 27B | Q8_0 | 36 tok/s | Good |
| Gemma 3 4B | Q8_0 | 228 tok/s | Excellent |
| Gemma 4 26B-A4B | Q8_0 | 279 tok/s | Excellent |
| Gemma 4 31B | Q8_0 | 41 tok/s | Good |
| Gemma 4 E2B | Q8_0 | 271 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 168 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 325 tok/s | Excellent |
| InternLM 2.5 7B Chat | Q8_0 | 127 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q5_K_M | 23 tok/s | Acceptable |
| Llama 3.1 8B Instruct | Q8_0 | 122 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 89 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 471 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 260 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q8_0 | 14 tok/s | Acceptable |
| Llama 4 Scout | Q5_K_M | 95 tok/s | Excellent |
| LLaVA 1.6 13B | Q5_K_M | 124 tok/s | Excellent |
| Mistral 7B Instruct v0.3 | Q8_0 | 136 tok/s | Excellent |
| Mistral Large 2 123B | Q5_K_M | 13 tok/s | Acceptable |
| Mistral Small 24B Instruct | Q8_0 | 41 tok/s | Good |
| Mixtral 8x7B Instruct | Q5_K_M | 125 tok/s | Excellent |
| nomic-embed-text v1.5 | FP16 | 2000 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q4_K_M | 158 tok/s | Excellent |
| Phi-3 Medium 14B Instruct | Q8_0 | 70 tok/s | Good |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 257 tok/s | Excellent |
| Phi-4 14B | Q8_0 | 67 tok/s | Good |
| Phi-4 Mini | Q8_0 | 257 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q8_0 | 66 tok/s | Good |
| Qwen 2.5 72B Instruct | Q4_K_M | 26 tok/s | Acceptable |
| Qwen 2.5 7B Instruct | Q8_0 | 129 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 50 tok/s | Good |
| Qwen 2.5 Coder 7B Instruct | Q8_0 | 129 tok/s | Excellent |
| Qwen3-14B Instruct | Q8_0 | 70 tok/s | Good |
| Qwen3-30B-A3B | Q8_0 | 278 tok/s | Excellent |
| Qwen3-32B Instruct | Q8_0 | 31 tok/s | Good |
| Qwen3-8B Instruct | Q8_0 | 120 tok/s | Excellent |
| Qwen3.5-122B-A10B | Q8_0 | 98 tok/s | Excellent |
| Qwen3.5-27B | Q8_0 | 36 tok/s | Good |
| Qwen3.5-397B (MoE) | Q2_K | n/a | Not viable |
| Qwen3.6-27B | Q8_0 | 36 tok/s | Good |
| Qwen3.6-35B-A3B | Q5_K_M | 278 tok/s | Excellent |
| QwQ 32B Preview | Q5_K_M | 50 tok/s | Good |
| Stable Diffusion 3 Medium | FP16 | n/a | Excellent |
| Stable Diffusion 3.5 Large | FP16 | n/a | Excellent |
| Stable Diffusion XL 1.0 | FP16 | n/a | Excellent |
| StarCoder 2 15B | Q8_0 | 63 tok/s | Good |
| Whisper Large V3 | FP16 | n/a | Excellent |
| Whisper Large V3 Turbo | FP16 | n/a | Excellent |
| Yi 1.5 34B Chat | Q8_0 | 29 tok/s | Acceptable |
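To filter or tally the table programmatically, the pipe-delimited rows can be parsed with a few lines of Python. This is a minimal sketch: `parse_rows` is a hypothetical helper, the sample holds only four rows copied from the table, and any speed cell that does not end in `tok/s` is treated as having no throughput figure.

```python
from collections import Counter

# A few rows copied from the table above; paste the full table to tally all 63.
TABLE = """\
| all-MiniLM-L6-v2 | FP16 | 3000 tok/s | Excellent |
| Codestral 22B | Q5_K_M | 73 tok/s | Good |
| Llama 3.3 70B Instruct | Q8_0 | 14 tok/s | Acceptable |
| DeepSeek R1 | Q2_K | n/a | Not viable |
"""

KNOWN_RATINGS = {"Excellent", "Good", "Acceptable", "Not viable"}


def parse_rows(text):
    """Turn pipe-delimited table rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip header, separator, and malformed lines.
        if len(cells) != 4 or cells[3] not in KNOWN_RATINGS:
            continue
        model, quant, speed, rating = cells
        # Rows without a tok/s figure (image, audio, non-viable) yield None.
        tok_s = float(speed.split()[0]) if speed.endswith("tok/s") else None
        rows.append({"model": model, "quant": quant, "tok_s": tok_s, "rating": rating})
    return rows


rows = parse_rows(TABLE)
ratings = Counter(r["rating"] for r in rows)
usable = [r["model"] for r in rows if r["rating"] != "Not viable"]
print(ratings)
print(usable)
```

The same `rows` list can be sorted by `tok_s` or filtered by quantization to shortlist models for a given workload.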
Frequently Asked Questions
- What AI models can the NVIDIA RTX PRO 6000 Blackwell run?
- The compatibility table lists 63 AI models for the NVIDIA RTX PRO 6000 Blackwell, 60 of which are rated Acceptable or better. The fastest are all-MiniLM-L6-v2, nomic-embed-text v1.5, and Llama 3.2 1B Instruct. See the full compatibility table above for speeds and quality ratings.
- Is the NVIDIA RTX PRO 6000 Blackwell good for AI coding?
- Yes. With 96 GB of VRAM, the NVIDIA RTX PRO 6000 Blackwell supports the Full AI Builder tier: concurrent coding + reasoning + embeddings.
- How much VRAM does the NVIDIA RTX PRO 6000 Blackwell have?
- The NVIDIA RTX PRO 6000 Blackwell has 96 GB of GDDR7 VRAM with 1800 GB/s of memory bandwidth.
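- Can the NVIDIA RTX PRO 6000 Blackwell run 70B models?
- Yes. The NVIDIA RTX PRO 6000 Blackwell can hold 70B-parameter models entirely in VRAM at quantized precision. In the table above, Llama 3.3 70B Instruct runs at Q8_0 at 14 tok/s and Llama 3.1 70B Instruct at Q5_K_M at 23 tok/s, both rated Acceptable.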
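- Is the NVIDIA RTX PRO 6000 Blackwell worth it for AI?
- At approximately $7,500, the NVIDIA RTX PRO 6000 Blackwell offers 96 GB of GDDR7 VRAM. Of the 63 models listed, 60 are rated Acceptable or better for local inference; only the largest models (DeepSeek R1, DeepSeek V3, Qwen3.5-397B) are rated Not viable.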
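The bandwidth figure in the FAQ above also gives a rough ceiling on single-stream generation speed: if each generated token requires reading all active weights from VRAM once, throughput cannot exceed bandwidth divided by the bytes read per token. The sketch below assumes exactly that simplified model (it ignores KV cache reads, compute limits, and batching), and `decode_ceiling_tok_s` is a hypothetical helper name. Q8_0 stores 8.5 bits per weight.

```python
BANDWIDTH_GB_S = 1800.0  # memory bandwidth from the spec sheet above


def decode_ceiling_tok_s(active_params_billion: float,
                         bits_per_weight: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s if every token reads all active weights once."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8.0
    return bandwidth_gb_s / gb_read_per_token


# Dense 70B at Q8_0, dense 8B at Q8_0, and an MoE with ~3B active parameters.
for label, active_b in [("70B dense", 70), ("8B dense", 8), ("3B active (MoE)", 3)]:
    ceiling = decode_ceiling_tok_s(active_b, 8.5)
    print(f"{label} at Q8_0: ceiling ~{ceiling:.0f} tok/s")
```

The measured figures in the compatibility table sit below these ceilings, which is expected for a bound that ignores every cost except weight reads. The bound is mainly useful for explaining why mixture-of-experts models with few active parameters post much higher tok/s than dense models of similar total size.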