NVIDIA Grace Blackwell Ultra GB300
288 GB HBM3e · 8000 GB/s
From $30,000 (estimated street price)
VRAM: 288 GB
Bandwidth: 8000 GB/s
TDP: 1200W
Models: 64
Tier: Datacenter-Class
The NVIDIA Grace Blackwell Ultra GB300, with 288 GB of HBM3e VRAM, can run 64 AI models across the embedding, AI building, and coding categories. Its highest listed throughput is all-MiniLM-L6-v2 at 3000 tok/s (rated Excellent); this is an embedding model, so the figure counts tokens processed rather than tokens generated. For AI coding workflows, it qualifies for the Full AI Builder tier, which means running coding, reasoning, and embedding models concurrently. Current price: approximately $30,000.
Source: OwnRig methodology
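The bandwidth figure matters because single-stream text generation is usually memory-bandwidth-bound: each new token reads roughly every active weight once, so bandwidth divided by the bytes of active weights gives a rough speed ceiling. The sketch below applies that rule of thumb to the 8000 GB/s figure. The bits-per-weight values are assumed approximations, and the sketch ignores KV-cache reads, compute limits, and batching, so treat it as a sanity check, not as OwnRig's methodology. It does not apply to embedding models such as all-MiniLM-L6-v2.

```python
# Rough upper bound on single-stream decode speed for a memory-bandwidth-bound GPU.
# Assumption: every generated token reads all *active* weights once; KV-cache
# traffic, compute limits and kernel overheads are ignored.

BANDWIDTH_GB_S = 8000  # bandwidth quoted on this page

# Approximate average bits per weight (assumed values; real files vary by model).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.8}


def decode_ceiling_tok_s(active_params_billion: float, quant: str,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Tokens/s ceiling = bandwidth / GB of active weights read per token."""
    # 1e9 parameters * (bits / 8) bytes = decimal GB
    weight_gb = active_params_billion * BITS_PER_WEIGHT[quant] / 8
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # Dense 70B model at Q8_0: about 74 GB of weights read per token.
    print(f"70B dense, Q8_0:    {decode_ceiling_tok_s(70, 'Q8_0'):.0f} tok/s ceiling")
    # Mixture-of-experts model with ~12B active parameters at Q4_K_M.
    print(f"12B active, Q4_K_M: {decode_ceiling_tok_s(12, 'Q4_K_M'):.0f} tok/s ceiling")
```

For a dense 70B model at Q8_0 this gives roughly 108 tok/s, and the table's 55 tok/s for Llama 3.3 70B Instruct sits below that ceiling as expected. For mixture-of-experts models, pass the active parameter count (the "A12B" suffix in a name such as Nemotron-3-super-120B-A12B conventionally denotes about 12B active parameters), which is why MoE rows can run much faster than dense models of similar total size.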
Form factor: Integrated module
Builder Capability: Datacenter-Class AI Workstation
Runs very large models at high precision with room for long context windows. Best suited to Linux-first, DGX-style professional deployments rather than a typical consumer PC build.
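Whether a model fits "with room for long context windows" comes down to two terms: the quantized weight footprint and the KV cache, which grows linearly with context length. The sketch below assumes a Llama-3-70B-style shape (80 layers, 8 KV heads of dimension 128), an FP16 cache, a single sequence, roughly 8.5 bits per weight for Q8_0, and a 10% safety margin. `ModelShape`, `fits`, and the other names are illustrative, not from any library.

```python
from dataclasses import dataclass

VRAM_GB = 288  # HBM3e capacity quoted on this page


@dataclass
class ModelShape:
    """Hypothetical container for the architecture numbers that drive memory use."""
    params_billion: float
    layers: int
    kv_heads: int
    head_dim: int


def weights_gb(shape: ModelShape, bits_per_weight: float) -> float:
    # 1e9 parameters * (bits / 8) bytes = decimal GB
    return shape.params_billion * bits_per_weight / 8


def kv_cache_gb(shape: ModelShape, context_tokens: int, bytes_per_elem: int = 2) -> float:
    # Per token and per layer, K and V each store kv_heads * head_dim values.
    per_token_bytes = 2 * shape.layers * shape.kv_heads * shape.head_dim * bytes_per_elem
    return per_token_bytes * context_tokens / 1e9


def total_gb(shape: ModelShape, bits_per_weight: float, context_tokens: int) -> float:
    return weights_gb(shape, bits_per_weight) + kv_cache_gb(shape, context_tokens)


def fits(shape: ModelShape, bits_per_weight: float, context_tokens: int,
         vram_gb: float = VRAM_GB, headroom: float = 0.9) -> bool:
    # headroom is an assumed margin for activations, runtime buffers and fragmentation
    return total_gb(shape, bits_per_weight, context_tokens) <= vram_gb * headroom


if __name__ == "__main__":
    llama70 = ModelShape(params_billion=70, layers=80, kv_heads=8, head_dim=128)  # assumed shape
    for ctx in (8_192, 131_072):
        print(f"{ctx:>7} tokens: {total_gb(llama70, 8.5, ctx):.1f} GB, fits={fits(llama70, 8.5, ctx)}")
```

Under these assumptions, a 70B model at Q8_0 needs about 77 GB at 8K tokens and about 117 GB at 128K tokens, well within 288 GB. Serving several sequences at once multiplies the KV-cache term.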
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA (production): Primary datacenter inference backend for NVIDIA's GB300 platform.
What it can run
64 models

| Model | Quantization | Speed | Rating |
|---|---|---|---|
| all-MiniLM-L6-v2 | FP16 | 3000 tok/s | Excellent |
| Arcee Trinity Large Thinking 400B | Q4_K_M | 41 tok/s | Excellent |
| Arcee Trinity Mini 26B | Q8_0 | 332 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 1411 tok/s | Excellent |
| Code Llama 34B Instruct | Q5_K_M | 135 tok/s | Excellent |
| Codestral 22B | Q5_K_M | 180 tok/s | Excellent |
| Command R 35B | Q8_0 | 110 tok/s | Excellent |
| DeepSeek Coder V2 Lite 16B | Q8_0 | 210 tok/s | Excellent |
| DeepSeek R1 | Q4_K_M | 20 tok/s | Good |
| DeepSeek R1 Distill Qwen 32B | Q8_0 | 120 tok/s | Excellent |
| DeepSeek R1 Distill Qwen 7B | Q8_0 | 360 tok/s | Excellent |
| DeepSeek V3 | Q4_K_M | 22 tok/s | Good |
| FLUX.1 Dev | FP16 | 15 tok/s | Excellent |
| Gemma 2 27B Instruct | Q5_K_M | 145 tok/s | Excellent |
| Gemma 2 9B Instruct | Q8_0 | 320 tok/s | Excellent |
| Gemma 3 12B | Q8_0 | 250 tok/s | Excellent |
| Gemma 3 27B | Q8_0 | 130 tok/s | Excellent |
| Gemma 3 4B | Q8_0 | 500 tok/s | Excellent |
| Gemma 4 26B-A4B | Q8_0 | 500 tok/s | Excellent |
| Gemma 4 31B | Q8_0 | 183 tok/s | Excellent |
| Gemma 4 E2B | Q8_0 | 500 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 500 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 320 tok/s | Excellent |
| InternLM 2.5 7B Chat | Q8_0 | 350 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q5_K_M | 65 tok/s | Excellent |
| Llama 3.1 8B Instruct | Q8_0 | 350 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 260 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 800 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 650 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q8_0 | 55 tok/s | Excellent |
| Llama 4 Scout | Q8_0 | 40 tok/s | Excellent |
| LLaVA 1.6 13B | Q5_K_M | 270 tok/s | Excellent |
| Mistral 7B Instruct v0.3 | Q8_0 | 380 tok/s | Excellent |
| Mistral Large 2 123B | Q8_0 | 30 tok/s | Good |
| Mistral Small 24B Instruct | Q8_0 | 150 tok/s | Excellent |
| Mixtral 8x7B Instruct | Q5_K_M | 100 tok/s | Excellent |
| nomic-embed-text v1.5 | FP16 | 2000 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q4_K_M | 180 tok/s | Excellent |
| Phi-3 Medium 14B Instruct | Q8_0 | 230 tok/s | Excellent |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 550 tok/s | Excellent |
| Phi-4 14B | Q8_0 | 220 tok/s | Excellent |
| Phi-4 Mini | Q8_0 | 580 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q8_0 | 220 tok/s | Excellent |
| Qwen 2.5 72B Instruct | Q4_K_M | 60 tok/s | Excellent |
| Qwen 2.5 7B Instruct | Q8_0 | 360 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 140 tok/s | Excellent |
| Qwen 2.5 Coder 7B Instruct | Q8_0 | 360 tok/s | Excellent |
| Qwen3-14B Instruct | Q8_0 | 230 tok/s | Excellent |
| Qwen3-30B-A3B | Q8_0 | 145 tok/s | Excellent |
| Qwen3-32B Instruct | Q8_0 | 120 tok/s | Excellent |
| Qwen3-8B Instruct | Q8_0 | 340 tok/s | Excellent |
| Qwen3.5-122B-A10B | Q8_0 | 200 tok/s | Excellent |
| Qwen3.5-27B | Q8_0 | 150 tok/s | Excellent |
| Qwen3.5-397B (MoE) | Q4_K_M | 120 tok/s | Excellent |
| Qwen3.6-27B | Q8_0 | 150 tok/s | Excellent |
| Qwen3.6-35B-A3B | Q5_K_M | 145 tok/s | Excellent |
| QwQ 32B Preview | Q5_K_M | 140 tok/s | Excellent |
| Stable Diffusion 3 Medium | FP16 | 20 tok/s | Excellent |
| Stable Diffusion 3.5 Large | FP16 | 12 tok/s | Excellent |
| Stable Diffusion XL 1.0 | FP16 | 18 tok/s | Excellent |
| StarCoder 2 15B | Q8_0 | 210 tok/s | Excellent |
| Whisper Large V3 | FP16 | 450 tok/s | Excellent |
| Whisper Large V3 Turbo | FP16 | 600 tok/s | Excellent |
| Yi 1.5 34B Chat | Q8_0 | 110 tok/s | Excellent |
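To shortlist models from the table above, its rows can be parsed and filtered by speed and rating. This is a minimal sketch that assumes the table is available as Markdown pipe rows; only a few rows are copied here, and `parse_rows`, `shortlist`, and the 100 tok/s threshold are illustrative choices, not part of the OwnRig methodology.

```python
# A few rows copied from the compatibility table (subset for brevity).
TABLE = """\
| Model | Quantization | Speed | Rating |
|---|---|---|---|
| Code Llama 34B Instruct | Q5_K_M | 135 tok/s | Excellent |
| Codestral 22B | Q5_K_M | 180 tok/s | Excellent |
| DeepSeek R1 | Q4_K_M | 20 tok/s | Good |
| Llama 3.3 70B Instruct | Q8_0 | 55 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 140 tok/s | Excellent |
"""


def parse_rows(text: str) -> list[dict]:
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4 or not cells[2].endswith("tok/s"):
            continue  # skip the header, the separator, and non-table lines
        name, quant, speed, rating = cells
        rows.append({"model": name, "quant": quant,
                     "tok_s": float(speed.split()[0]), "rating": rating})
    return rows


def shortlist(rows: list[dict], min_tok_s: float, rating: str = "Excellent") -> list[dict]:
    # Keep rows at or above the speed threshold with the requested rating, fastest first.
    hits = [r for r in rows if r["tok_s"] >= min_tok_s and r["rating"] == rating]
    return sorted(hits, key=lambda r: r["tok_s"], reverse=True)


if __name__ == "__main__":
    for r in shortlist(parse_rows(TABLE), min_tok_s=100):
        print(f'{r["model"]} ({r["quant"]}): {r["tok_s"]:.0f} tok/s')
```

Filtering on speed alone favors smaller models, so pairing the threshold with a quantization or size filter usually gives a more useful shortlist.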
Frequently Asked Questions
- What AI models can the NVIDIA Grace Blackwell Ultra GB300 run?
- The NVIDIA Grace Blackwell Ultra GB300 can run 64 AI models. The fastest by raw throughput are all-MiniLM-L6-v2, nomic-embed-text v1.5, and Arcee Trinity Nano 6B. See the full compatibility table above for speeds and quality ratings.
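- Is the NVIDIA Grace Blackwell Ultra GB300 good for AI coding?
- Yes. With 288 GB of VRAM, the NVIDIA Grace Blackwell Ultra GB300 qualifies for the Full AI Builder tier, so it can run coding, reasoning, and embedding models concurrently. Coding models in the table include Codestral 22B (180 tok/s) and Qwen 2.5 Coder 32B Instruct (140 tok/s).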
- How much VRAM does the NVIDIA Grace Blackwell Ultra GB300 have?
- The NVIDIA Grace Blackwell Ultra GB300 has 288 GB of HBM3e VRAM with 8000 GB/s bandwidth.
- Can the NVIDIA Grace Blackwell Ultra GB300 run 70B models?
- Yes. 70B-parameter models fit entirely in its 288 GB of memory, even at 8-bit quantization: the table lists Llama 3.3 70B Instruct at Q8_0 (55 tok/s) and Llama 3.1 70B Instruct at Q5_K_M (65 tok/s).
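- Is the NVIDIA Grace Blackwell Ultra GB300 worth it for AI?
- At an estimated street price of $30,000, the NVIDIA Grace Blackwell Ultra GB300 offers 288 GB of HBM3e VRAM and runs all 64 models in the table, including 400B-class models such as Arcee Trinity Large Thinking 400B at Q4_K_M (41 tok/s). It is best suited to Linux-first, DGX-style professional deployments rather than a typical consumer PC build.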