NVIDIA
Desktop GPU

NVIDIA GeForce RTX 5090

32 GB GDDR7 · 1792 GB/s

From

$2,199

Estimated street price

VRAM

32 GB

Bandwidth

1792 GB/s

TDP

575W

Models

31

Tier

Power

The NVIDIA GeForce RTX 5090, with 32 GB of GDDR7 VRAM, is listed against 31 AI models across reasoning, coding, and AI coding workloads; 28 of them are rated Marginal or better. Best performance: Arcee Trinity Nano 6B at 316 tok/s (Excellent). For AI coding workflows, it supports the Power AI Coding tier, running 32B coding models at good quality. Current price: approximately $2,199.

Source: OwnRig methodology

VRAM

32 GB

Bandwidth

1792 GB/s

Memory Type

GDDR7

TDP

575W

Form Factor

3-slot, 340mm
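
The bandwidth figure above also gives a rough ceiling on generation speed. The sketch below uses a common rule of thumb, not OwnRig's methodology: for a dense model, every generated token reads all the weights once, so decode speed cannot exceed bandwidth divided by weight size. The function names are hypothetical, and the 8B parameter count is an assumed round number.

```python
# Rule-of-thumb ceiling on decode speed for a dense model:
# tok/s <= memory bandwidth / bytes of weights read per token.

BANDWIDTH_GB_S = 1792  # RTX 5090 memory bandwidth from the spec list


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens per second if decoding is purely bandwidth-bound."""
    return bandwidth_gb_s / weight_size_gb(params_billion, bits_per_weight)


# Assumption: an 8B-parameter model at Q8_0, which stores 8.5 bits per weight
# (32 int8 values plus one fp16 scale per block).
ceiling = decode_ceiling_tok_s(8, 8.5)
print(f"{ceiling:.0f} tok/s ceiling")
```

Under these assumptions the ceiling comes out a little above 200 tok/s, which is consistent with the 170 tok/s listed for Llama 3.1 8B Instruct at Q8_0. Measured speeds sit below the ceiling because of compute, KV-cache reads, and runtime overhead.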

Builder Capability: Power AI Coding

Runs 32B coding models at good quality. Can handle coding model + embeddings concurrently.

Software

Inference Backends

The software stacks that matter most for real-world inference on this device.

CUDA

production

Primary high-performance backend for NVIDIA inference workloads.

Vulkan

stable

Fallback backend for llama.cpp and related local runtimes.

What it can run

31 models
Model                               Quant     Speed       Rating
Arcee Trinity Mini 26B              Q8_0      74 tok/s    Excellent
Arcee Trinity Nano 6B               Q8_0      316 tok/s   Excellent
DeepSeek R1                         Q2_K      1 tok/s     Not viable
DeepSeek R1 Distill Qwen 32B        Q5_K_M    42 tok/s    Excellent
DeepSeek V3                         Q2_K      –           Not viable
Gemma 3 27B                         Q5_K_M    35 tok/s    Excellent
Gemma 4 26B-A4B                     Q8_0      278 tok/s   Excellent
Gemma 4 31B                         Q6_K      50 tok/s    Good
Gemma 4 E2B                         Q8_0      270 tok/s   Excellent
Gemma 4 E4B                         Q8_0      167 tok/s   Excellent
GigaChat Lightning 10B              Q8_0      143 tok/s   Good
Llama 3.1 70B Instruct              Q4_K_M    9 tok/s     Marginal
Llama 3.1 8B Instruct               Q8_0      170 tok/s   Excellent
Llama 3.2 11B Vision                Q8_0      130 tok/s   Excellent
Llama 3.2 1B Instruct               Q8_0      300 tok/s   Excellent
Llama 3.2 3B Instruct               Q8_0      200 tok/s   Excellent
Llama 3.3 70B Instruct              Q4_K_M    8 tok/s     Marginal
Mistral Large 2 123B                Q3_K_M    4 tok/s     Marginal
Mistral Small 24B Instruct          Q5_K_M    55 tok/s    Excellent
NVIDIA Nemotron-3-super-120B-A12B   Q2_K      23 tok/s    Marginal
Phi-4 Mini                          Q8_0      185 tok/s   Excellent
Qwen 2.5 Coder 32B Instruct         Q5_K_M    45 tok/s    Excellent
Qwen3-32B Instruct                  Q4_K_M    44 tok/s    Excellent
Qwen3.5-122B-A10B                   Q3_K_M    98 tok/s    Good
Qwen3.5-27B                         Q8_0      39 tok/s    Good
Qwen3.5-397B (MoE)                  Q2_K      –           Not viable
Qwen3.6-27B                         Q8_0      39 tok/s    Good
Stable Diffusion 3 Medium           FP16      –           Excellent
Stable Diffusion 3.5 Large          FP16      –           Excellent
Stable Diffusion XL 1.0             FP16      –           Excellent
Whisper Large V3 Turbo              FP16      –           Excellent



Frequently Asked Questions

What AI models can the NVIDIA GeForce RTX 5090 run?
The NVIDIA GeForce RTX 5090 is listed against 31 AI models, 28 of which are rated Marginal or better. Top performers include Arcee Trinity Nano 6B (316 tok/s), Llama 3.2 1B Instruct (300 tok/s), and Gemma 4 26B-A4B (278 tok/s). See the full compatibility table above for speeds and quality ratings.
Is the NVIDIA GeForce RTX 5090 good for AI coding?
Yes. With 32 GB of VRAM, the NVIDIA GeForce RTX 5090 supports the Power AI Coding tier: 32B coding models at good quality.
How much VRAM does the NVIDIA GeForce RTX 5090 have?
The NVIDIA GeForce RTX 5090 has 32 GB of GDDR7 VRAM with 1792 GB/s of memory bandwidth.
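Can the NVIDIA GeForce RTX 5090 run 70B models?
Only marginally. Llama 3.1 70B Instruct and Llama 3.3 70B Instruct at Q4_K_M are rated Marginal, at 9 tok/s and 8 tok/s. A 70B model at Q4_K_M needs roughly 40 GB for weights alone, which exceeds the card's 32 GB of VRAM, so it is unlikely to run entirely in VRAM at that quantization.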
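Is the NVIDIA GeForce RTX 5090 worth it for AI?
At approximately $2,199, the NVIDIA GeForce RTX 5090 offers 32 GB of VRAM and is rated Good or Excellent on 24 of the 31 listed models. The weak spots are the largest models: the 70B Llama variants, Mistral Large 2 123B, and Nemotron-3-super-120B-A12B are rated Marginal, and DeepSeek R1, DeepSeek V3, and Qwen3.5-397B are rated Not viable.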
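
Whether a model fits in 32 GB comes down to simple arithmetic on parameter count and quantization. The following is a minimal sketch, not OwnRig's methodology: the bits-per-weight values are approximate figures for common GGUF quantizations, the 2 GB overhead allowance for KV cache and runtime buffers is an assumed placeholder, and the function names are hypothetical.

```python
# Rough check of whether a quantized model's weights fit in VRAM.

VRAM_GB = 32  # RTX 5090

# Approximate bits per weight (assumed, illustrative values; real file
# sizes vary by architecture and quantization mix).
APPROX_BPW = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5}


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * APPROX_BPW[quant] / 8


def fits_in_vram(params_billion: float, quant: str,
                 overhead_gb: float = 2.0, vram_gb: float = VRAM_GB) -> bool:
    """True if weights plus an assumed overhead allowance fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


for name, params, quant in [("70B", 70, "Q4_K_M"), ("32B", 32, "Q5_K_M")]:
    size = weights_gb(params, quant)
    verdict = "fits" if fits_in_vram(params, quant) else "does not fit"
    print(f"{name} at {quant}: ~{size:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

With these assumed figures, a 70B model at Q4_K_M lands above 40 GB while a 32B model at Q5_K_M stays under 25 GB, which matches the Marginal and Excellent ratings in the compatibility table.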
