What AI models can NVIDIA Grace Blackwell Ultra GB300 run?

The NVIDIA Grace Blackwell Ultra GB300 can run 64 AI models. Top performers include all-MiniLM-L6-v2, nomic-embed-text v1.5, Arcee Trinity Nano 6B. See the full compatibility table above for speeds and quality ratings.

Is NVIDIA Grace Blackwell Ultra GB300 good for AI coding?

Yes. With 288 GB, the NVIDIA Grace Blackwell Ultra GB300 supports the Full AI Builder tier: concurrent coding + reasoning + embeddings.

How much VRAM does NVIDIA Grace Blackwell Ultra GB300 have?

The NVIDIA Grace Blackwell Ultra GB300 has 288 GB of HBM3e VRAM with 8000 GB/s bandwidth.

Can NVIDIA Grace Blackwell Ultra GB300 run 70B models?

Yes. The NVIDIA Grace Blackwell Ultra GB300 can run 70B parameter models in memory at quantized quality.

Is NVIDIA Grace Blackwell Ultra GB300 worth it for AI?

At $30,000, the NVIDIA Grace Blackwell Ultra GB300 offers 288 GB HBM3e VRAM and runs 64 AI models. It handles local AI inference well.

Desktop GPU

NVIDIA Grace Blackwell Ultra GB300

288 GB HBM3e · 8000 GB/s

From

$30,000

Estimated street price

VRAM

288 GB

Bandwidth

8000 GB/s

TDP

1200W

Models

Tier

Datacenter-Class

The NVIDIA Grace Blackwell Ultra GB300 with 288 GB HBM3e VRAM can handle 64 AI models across embedding, ai_building, coding. Best performance: all-MiniLM-L6-v2 at 3000 tok/s (excellent). For AI coding workflows, it supports the Full AI Builder tier, supporting concurrent coding + reasoning + embeddings. Current price: approximately $30,000.

Source: OwnRig methodology

VRAM

288 GB

Bandwidth

8000 GB/s

Memory Type

HBM3e

TDP

1200W

Form Factor

Integrated module

Builder Capability: Datacenter-Class AI Workstation

Runs very large models at high precision with room for long context windows. Best suited to Linux-first, DGX-style professional deployments rather than a typical consumer PC build.

Software

Inference Backends

The software stacks that matter most for real-world inference on this device.

CUDA

production

Primary datacenter inference backend for NVIDIA's GB300 platform.

What it can run

64 models


all-MiniLM-L6-v2	FP16	3000 tok/s	Excellent
Arcee Trinity Large Thinking 400B	Q4_K_M	41 tok/s	Excellent
Arcee Trinity Mini 26B	Q8_0	332 tok/s	Excellent
Arcee Trinity Nano 6B	Q8_0	1411 tok/s	Excellent
Code Llama 34B Instruct	Q5_K_M	135 tok/s	Excellent
Codestral 22B	Q5_K_M	180 tok/s	Excellent
Command R 35B	Q8_0	110 tok/s	Excellent
DeepSeek Coder V2 Lite 16B	Q8_0	210 tok/s	Excellent
DeepSeek R1	Q4_K_M	20 tok/s	Good
DeepSeek R1 Distill Qwen 32B	Q8_0	120 tok/s	Excellent
DeepSeek R1 Distill Qwen 7B	Q8_0	360 tok/s	Excellent
DeepSeek V3	Q4_K_M	22 tok/s	Good
FLUX.1 Dev	FP16	15 tok/s	Excellent
Gemma 2 27B Instruct	Q5_K_M	145 tok/s	Excellent
Gemma 2 9B Instruct	Q8_0	320 tok/s	Excellent
Gemma 3 12B	Q8_0	250 tok/s	Excellent
Gemma 3 27B	Q8_0	130 tok/s	Excellent
Gemma 3 4B	Q8_0	500 tok/s	Excellent
Gemma 4 26B-A4B	Q8_0	500 tok/s	Excellent
Gemma 4 31B	Q8_0	183 tok/s	Excellent
Gemma 4 E2B	Q8_0	500 tok/s	Excellent
Gemma 4 E4B	Q8_0	500 tok/s	Excellent
GigaChat Lightning 10B	Q8_0	320 tok/s	Excellent
InternLM 2.5 7B Chat	Q8_0	350 tok/s	Excellent
Llama 3.1 70B Instruct	Q5_K_M	65 tok/s	Excellent
Llama 3.1 8B Instruct	Q8_0	350 tok/s	Excellent
Llama 3.2 11B Vision	Q8_0	260 tok/s	Excellent
Llama 3.2 1B Instruct	Q8_0	800 tok/s	Excellent
Llama 3.2 3B Instruct	Q8_0	650 tok/s	Excellent
Llama 3.3 70B Instruct	Q8_0	55 tok/s	Excellent
Llama 4 Scout	Q8_0	40 tok/s	Excellent
LLaVA 1.6 13B	Q5_K_M	270 tok/s	Excellent
Mistral 7B Instruct v0.3	Q8_0	380 tok/s	Excellent
Mistral Large 2 123B	Q8_0	30 tok/s	Good
Mistral Small 24B Instruct	Q8_0	150 tok/s	Excellent
Mixtral 8x7B Instruct	Q5_K_M	100 tok/s	Excellent
nomic-embed-text v1.5	FP16	2000 tok/s	Excellent
NVIDIA Nemotron-3-super-120B-A12B	Q4_K_M	180 tok/s	Excellent
Phi-3 Medium 14B Instruct	Q8_0	230 tok/s	Excellent
Phi-3 Mini 3.8B Instruct	Q8_0	550 tok/s	Excellent
Phi-4 14B	Q8_0	220 tok/s	Excellent
Phi-4 Mini	Q8_0	580 tok/s	Excellent
Qwen 2.5 14B Instruct	Q8_0	220 tok/s	Excellent
Qwen 2.5 72B Instruct	Q4_K_M	60 tok/s	Excellent
Qwen 2.5 7B Instruct	Q8_0	360 tok/s	Excellent
Qwen 2.5 Coder 32B Instruct	Q5_K_M	140 tok/s	Excellent
Qwen 2.5 Coder 7B Instruct	Q8_0	360 tok/s	Excellent
Qwen3-14B Instruct	Q8_0	230 tok/s	Excellent
Qwen3-30B-A3B	Q8_0	145 tok/s	Excellent
Qwen3-32B Instruct	Q8_0	120 tok/s	Excellent
Qwen3-8B Instruct	Q8_0	340 tok/s	Excellent
Qwen3.5-122B-A10B	Q8_0	200 tok/s	Excellent
Qwen3.5-27B	Q8_0	150 tok/s	Excellent
Qwen3.5-397B (MoE)	Q4_K_M	120 tok/s	Excellent
Qwen3.6-27B	Q8_0	150 tok/s	Excellent
Qwen3.6-35B-A3B	Q5_K_M	145 tok/s	Excellent
QwQ 32B Preview	Q5_K_M	140 tok/s	Excellent
Stable Diffusion 3 Medium	FP16	20 tok/s	Excellent
Stable Diffusion 3.5 Large	FP16	12 tok/s	Excellent
Stable Diffusion XL 1.0	FP16	18 tok/s	Excellent
StarCoder 2 15B	Q8_0	210 tok/s	Excellent
Whisper Large V3	FP16	450 tok/s	Excellent
Whisper Large V3 Turbo	FP16	600 tok/s	Excellent
Yi 1.5 34B Chat	Q8_0	110 tok/s	Excellent

Showing 64 of 64 entries

Ready to Buy

Available in these Machines

Desktop

Dell Pro Max Desktop (NVIDIA GB300)

$30,000

Buy Used

Prices and availability vary. Inspect hardware before purchasing. Some links may be affiliate links.

eBay Marketplace r/hardwareswap

FAQ

Frequently Asked Questions

What AI models can NVIDIA Grace Blackwell Ultra GB300 run?: The NVIDIA Grace Blackwell Ultra GB300 can run 64 AI models. Top performers include all-MiniLM-L6-v2, nomic-embed-text v1.5, Arcee Trinity Nano 6B. See the full compatibility table above for speeds and quality ratings.
Is NVIDIA Grace Blackwell Ultra GB300 good for AI coding?: Yes. With 288 GB, the NVIDIA Grace Blackwell Ultra GB300 supports the Full AI Builder tier: concurrent coding + reasoning + embeddings.
How much VRAM does NVIDIA Grace Blackwell Ultra GB300 have?: The NVIDIA Grace Blackwell Ultra GB300 has 288 GB of HBM3e VRAM with 8000 GB/s bandwidth.
Can NVIDIA Grace Blackwell Ultra GB300 run 70B models?: Yes. The NVIDIA Grace Blackwell Ultra GB300 can run 70B parameter models in memory at quantized quality.
Is NVIDIA Grace Blackwell Ultra GB300 worth it for AI?: At $30,000, the NVIDIA Grace Blackwell Ultra GB300 offers 288 GB HBM3e VRAM and runs 64 AI models. It handles local AI inference well.

Own this GPU?

See every AI model it supports, expected performance, and how to build around it.

Check my rig

All GPUs