NVIDIA GeForce RTX 5090
32 GB GDDR7 · 1792 GB/s
From
$2,199
Estimated street price
VRAM
32 GB
Bandwidth
1792 GB/s
TDP
575W
Models
31
Tier
Power
The NVIDIA GeForce RTX 5090 with 32 GB of GDDR7 VRAM has compatibility data for 31 AI models across the reasoning, coding, and AI coding categories. Best performance: Arcee Trinity Nano 6B at 316 tok/s (Excellent). For AI coding workflows, it supports the Power AI Coding tier, running 32B coding models at good quality. Current price: approximately $2,199.
Source: OwnRig methodology
Memory: 32 GB GDDR7
Bandwidth: 1792 GB/s
TDP: 575W
Size: 3-slot, 340mm
Builder Capability: Power AI Coding
Runs 32B coding models at good quality. Can handle coding model + embeddings concurrently.
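As a rough illustration of why a 32B coding model fits on this card while a 70B model does not, the sketch below estimates quantized weight size from parameter count. The bits-per-weight values and the 3 GB allowance for KV cache and runtime buffers are assumptions chosen for illustration, not OwnRig's figures, and `weights_gb` and `fits_in_vram` are hypothetical helper names.

```python
# Approximate bits per weight for common llama.cpp GGUF quantizations.
# These are assumed rough averages, not exact values for any specific model file.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(params_billion: float, quant: str,
                 vram_gb: float = 32.0, overhead_gb: float = 3.0) -> bool:
    """True if the weights plus an assumed overhead allowance fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for name, params, quant in [
        ("32B coding model", 32, "Q5_K_M"),
        ("70B model", 70, "Q4_K_M"),
    ]:
        size = weights_gb(params, quant)
        verdict = fits_in_vram(params, quant)
        print(f"{name} at {quant}: ~{size:.1f} GB of weights, fits in 32 GB: {verdict}")
```

Under these assumptions the 32B model leaves several gigabytes free, which is consistent with running a small embeddings model alongside it.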
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA
Status: production. Primary high-performance backend for NVIDIA inference workloads.
Vulkan
Status: stable. Fallback backend for llama.cpp and related local runtimes.
What it can run
31 models

| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| Arcee Trinity Mini 26B | Q8_0 | 74 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 316 tok/s | Excellent |
| DeepSeek R1 | Q2_K | 1 tok/s | Not viable |
| DeepSeek R1 Distill Qwen 32B | Q5_K_M | 42 tok/s | Excellent |
| DeepSeek V3 | Q2_K | n/a | Not viable |
| Gemma 3 27B | Q5_K_M | 35 tok/s | Excellent |
| Gemma 4 26B-A4B | Q8_0 | 278 tok/s | Excellent |
| Gemma 4 31B | Q6_K | 50 tok/s | Good |
| Gemma 4 E2B | Q8_0 | 270 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 167 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 143 tok/s | Good |
| Llama 3.1 70B Instruct | Q4_K_M | 9 tok/s | Marginal |
| Llama 3.1 8B Instruct | Q8_0 | 170 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 130 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 300 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 200 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q4_K_M | 8 tok/s | Marginal |
| Mistral Large 2 123B | Q3_K_M | 4 tok/s | Marginal |
| Mistral Small 24B Instruct | Q5_K_M | 55 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | 23 tok/s | Marginal |
| Phi-4 Mini | Q8_0 | 185 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 45 tok/s | Excellent |
| Qwen3-32B Instruct | Q4_K_M | 44 tok/s | Excellent |
| Qwen3.5-122B-A10B | Q3_K_M | 98 tok/s | Good |
| Qwen3.5-27B | Q8_0 | 39 tok/s | Good |
| Qwen3.5-397B (MoE) | Q2_K | n/a | Not viable |
| Qwen3.6-27B | Q8_0 | 39 tok/s | Good |
| Stable Diffusion 3 Medium | FP16 | n/a | Excellent |
| Stable Diffusion 3.5 Large | FP16 | n/a | Excellent |
| Stable Diffusion XL 1.0 | FP16 | n/a | Excellent |
| Whisper Large V3 Turbo | FP16 | n/a | Excellent |
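The speeds in the table can be sanity-checked against a common rule of thumb: single-stream token generation tends to be limited by memory bandwidth, because roughly every active weight is read once per generated token. The sketch below computes that ceiling. It is a back-of-the-envelope estimate, not OwnRig's methodology; the function name is hypothetical and the 8.5 bits per weight for Q8_0 is an assumption.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billion: float,
                         bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    rtx_5090_bandwidth = 1792.0  # GB/s, from the spec list on this page
    # Dense 8B model at Q8_0, assuming ~8.5 bits per weight.
    ceiling = decode_ceiling_tok_s(rtx_5090_bandwidth, 8, 8.5)
    print(f"8B at Q8_0: ceiling ~{ceiling:.0f} tok/s (table lists 170 tok/s)")
```

Measured figures should land below this ceiling, since real runs also read the KV cache and pay kernel and scheduling overhead. For mixture-of-experts models, only the active parameters per token count toward the estimate.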
Buy Used
Prices and availability vary. Inspect hardware before purchasing. Some links may be affiliate links.
Frequently Asked Questions
- What AI models can NVIDIA GeForce RTX 5090 run?
- The NVIDIA GeForce RTX 5090 can run 31 AI models. Top performers by speed include Arcee Trinity Nano 6B, Llama 3.2 1B Instruct, and Gemma 4 26B-A4B. See the full compatibility table above for speeds and quality ratings.
- Is NVIDIA GeForce RTX 5090 good for AI coding?
- Yes. With 32 GB, the NVIDIA GeForce RTX 5090 supports the Power AI Coding tier: large coding models at good quality.
- How much VRAM does NVIDIA GeForce RTX 5090 have?
- The NVIDIA GeForce RTX 5090 has 32 GB of GDDR7 VRAM with 1792 GB/s bandwidth.
- Can NVIDIA GeForce RTX 5090 run 70B models?
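- Only marginally. At Q4_K_M, Llama 3.1 70B Instruct runs at 9 tok/s and Llama 3.3 70B Instruct at 8 tok/s, both rated Marginal in the table above. A 70B model at that quantization is larger than 32 GB, so it cannot sit entirely in VRAM.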
- Is NVIDIA GeForce RTX 5090 worth it for AI?
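- At $2,199, the NVIDIA GeForce RTX 5090 offers 32 GB VRAM. Of the 31 models in the table above, 19 are rated Excellent and 5 Good, so it handles most local AI inference well; 70B-class and larger models are rated Marginal or Not viable.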
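The "top performers" answer above can be reproduced from the compatibility table. The following is a minimal sketch, assuming the rows are available as pipe-delimited text; `parse_rows` and `top_by_speed` are hypothetical helpers, and the sample contains only a few rows copied from the table.

```python
ROWS = """\
| Arcee Trinity Nano 6B | Q8_0 | 316 tok/s | Excellent |
| Gemma 4 26B-A4B | Q8_0 | 278 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 300 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q4_K_M | 9 tok/s | Marginal |
| Stable Diffusion XL 1.0 | FP16 | - | Excellent |
"""


def parse_rows(text):
    """Yield (model, quant, tok_s or None, rating) for each pipe-delimited row."""
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # skip anything that is not a four-column row
        model, quant, speed, rating = cells
        parts = speed.split()
        # Rows without a numeric speed (for example image models) get None.
        tok_s = int(parts[0]) if parts and parts[0].isdigit() else None
        yield model, quant, tok_s, rating


def top_by_speed(text, n=3):
    """Return the n fastest rows that report a tokens-per-second figure."""
    rows = [r for r in parse_rows(text) if r[2] is not None]
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    for model, quant, tok_s, rating in top_by_speed(ROWS):
        print(f"{model} ({quant}): {tok_s} tok/s, {rating}")
```

Rows without a speed figure are excluded from the ranking rather than treated as zero, so image and speech models do not distort the ordering.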
Own this GPU?
See every AI model it supports, expected performance, and how to build around it.
Related Guides
Buying Guide
How to Choose Your First AI GPU
A data-backed buying guide to choosing the right GPU for running AI models locally. VRAM explained, budget tiers compared, and specific GPU recommendations with compatible models.
Tutorial
The Complete Guide to Running LLMs Locally
Run large language models locally: hardware needs, Ollama and llama.cpp, model picks by use case, and quantization.
Explainer
VRAM: The Only Spec That Matters for AI
VRAM for local AI: what it is, why models need it, how quantization cuts requirements, and a VRAM table for major models.
Roundup
Best AI Hardware for Developers in 2026
Best AI GPUs in 2026: RTX 4060 Ti to RTX 5090, Apple Silicon M4 Max. Picks by budget, use case, and dev workflow. Complete build specs included.
Explainer
How we test: OwnRig's benchmark methodology
How OwnRig measures tokens per second, rates model compatibility, and keeps hardware data current. Our methodology, tools, and known limitations.
Buying Guide
Best GPUs for Stable Diffusion, Flux, and SD3 in 2026
GPU requirements for SDXL, Stable Diffusion 3 Medium, SD 3.5 Large, and FLUX.1 Dev. Per-GPU performance verdicts for RTX 4060 Ti, RTX 4070, RTX 4090, and Apple Silicon.