NVIDIA GeForce RTX 4090
24 GB GDDR6X · 1008 GB/s
From $1,799 (estimated street price)
VRAM: 24 GB
Bandwidth: 1008 GB/s
TDP: 450W
Models: 59
Tier: Power
The NVIDIA GeForce RTX 4090 with 24 GB of GDDR6X VRAM can run 58 AI models across categories such as embedding, AI building, and coding. Its fastest result is Llama 3.2 1B Instruct at 250 tok/s (rated Excellent). For AI coding workflows, it supports the Power AI Coding tier, running 32B coding models at good quality. Current price: approximately $1,799.
Source: OwnRig methodology
Memory: 24 GB GDDR6X · Bandwidth: 1008 GB/s · TDP: 450W · Size: 3-slot, 336mm
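The 1008 GB/s bandwidth figure gives a rough upper bound on text-generation speed. The sketch below assumes a common rule of thumb, not OwnRig's published method: each generated token reads all model weights from VRAM once, so tokens per second cannot exceed bandwidth divided by weight size. The function names and the 8B example model are illustrative. Q8_0 stores about 8.5 bits per weight.

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound GPU.
# Assumption: every generated token reads all model weights once from VRAM,
# so tokens/s <= bandwidth / weight_bytes. Measured throughput is lower.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed in tokens per second."""
    return bandwidth_gb_s / model_gb


RTX_4090_BANDWIDTH_GB_S = 1008  # from the spec list on this page

# Illustrative input: an 8B-parameter dense model at Q8_0 (8.5 bits per weight).
size = weights_gb(8.0, 8.5)
ceiling = decode_ceiling_tok_s(RTX_4090_BANDWIDTH_GB_S, size)
print(f"{size:.1f} GB weights -> at most ~{ceiling:.0f} tok/s")
```

For comparison, this page lists Llama 3.1 8B Instruct at Q8_0 at 95 tok/s, which sits below that ceiling as expected. For mixture-of-experts models, the same bound would use the parameters active per token rather than the total size.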
Builder Capability: Power AI Coding
Runs 32B coding models at good quality and can serve a coding model and an embedding model concurrently.
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA (production): Primary high-performance backend for NVIDIA inference workloads.
Vulkan (stable): Fallback backend for llama.cpp and related local runtimes.
What it can run
58 models
| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | FP16 | n/a | Excellent |
| Arcee Trinity Mini 26B | Q5_K_M | 62 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 178 tok/s | Excellent |
| Code Llama 34B Instruct | Q4_K_M | 22 tok/s | Good |
| Codestral 22B | Q5_K_M | 35 tok/s | Excellent |
| DeepSeek Coder V2 Lite 16B | Q5_K_M | 55 tok/s | Excellent |
| DeepSeek R1 | Q2_K | 1 tok/s | Not viable |
| DeepSeek R1 Distill Qwen 32B | Q4_K_M | 24 tok/s | Good |
| DeepSeek R1 Distill Qwen 7B | Q4_K_M | 92 tok/s | Excellent |
| DeepSeek V3 | Q2_K | n/a | Not viable |
| FLUX.1 Dev | FP16 | n/a | Excellent |
| Gemma 2 27B Instruct | Q4_K_M | 22 tok/s | Good |
| Gemma 2 9B Instruct | Q8_0 | 80 tok/s | Excellent |
| Gemma 3 12B | Q5_K_M | 75 tok/s | Excellent |
| Gemma 3 27B | Q4_K_M | 22 tok/s | Good |
| Gemma 4 26B-A4B | Q5_K_M | 229 tok/s | Excellent |
| Gemma 4 31B | Q4_K_M | 38 tok/s | Good |
| Gemma 4 E2B | Q8_0 | 152 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 94 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 110 tok/s | Good |
| InternLM 2.5 7B Chat | Q8_0 | 88 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q3_K_M | 5 tok/s | Marginal |
| Llama 3.1 8B Instruct | Q8_0 | 95 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 95 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 250 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 170 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q3_K_M | 6 tok/s | Marginal |
| LLaVA 1.6 13B | Q5_K_M | 30 tok/s | Good |
| Mistral 7B Instruct v0.3 | Q8_0 | 90 tok/s | Excellent |
| Mistral Large 2 123B | Q2_K | 3 tok/s | Marginal |
| Mistral Small 24B Instruct | Q5_K_M | 32 tok/s | Good |
| Mixtral 8x7B Instruct | Q3_K_M | 35 tok/s | Good |
| nomic-embed-text v1.5 | FP16 | n/a | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | 18 tok/s | Marginal |
| Phi-3 Medium 14B Instruct | Q8_0 | 55 tok/s | Excellent |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 130 tok/s | Excellent |
| Phi-4 14B | Q5_K_M | 58 tok/s | Excellent |
| Phi-4 Mini | Q8_0 | 160 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q5_K_M | 55 tok/s | Excellent |
| Qwen 2.5 7B Instruct | Q8_0 | 88 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q4_K_M | 25 tok/s | Good |
| Qwen 2.5 Coder 7B Instruct | Q8_0 | 90 tok/s | Excellent |
| Qwen3-14B Instruct | Q8_0 | 41 tok/s | Good |
| Qwen3-30B-A3B | Q5_K_M | 25 tok/s | Good |
| Qwen3-32B Instruct | Q5_K_M | 25 tok/s | Good |
| Qwen3-32B Instruct | Q4_K_M | 30 tok/s | Good |
| Qwen3-8B Instruct | Q8_0 | 83 tok/s | Excellent |
| Qwen3.5-122B-A10B | Q3_K_M | 19 tok/s | Marginal |
| Qwen3.5-27B | Q5_K_M | 40 tok/s | Good |
| Qwen3.5-397B (MoE) | Q2_K | n/a | Not viable |
| Qwen3.6-27B | Q5_K_M | 40 tok/s | Good |
| Qwen3.6-35B-A3B | Q4_K_M | 25 tok/s | Good |
| QwQ 32B Preview | Q4_K_M | 24 tok/s | Good |
| Stable Diffusion 3 Medium | FP16 | n/a | Excellent |
| Stable Diffusion 3.5 Large | FP16 | n/a | Excellent |
| Stable Diffusion XL 1.0 | FP16 | n/a | Excellent |
| StarCoder 2 15B | Q8_0 | 50 tok/s | Excellent |
| Whisper Large V3 | FP16 | n/a | Excellent |
| Whisper Large V3 Turbo | FP16 | n/a | Excellent |
Showing 59 of 59 entries
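To filter or count the compatibility entries yourself, the pipe-delimited rows can be parsed with a few lines of Python. This is a minimal sketch: the `parse_row` helper and the four sample rows (copied from the table) are illustrative, and any speed cell that is not a tok/s figure is treated as missing.

```python
from collections import Counter

# A few rows copied from the table on this page; paste the rest to analyse all entries.
ROWS = """
| Codestral 22B | Q5_K_M | 35 tok/s | Excellent |
| DeepSeek R1 | Q2_K | 1 tok/s | Not viable |
| Llama 3.1 70B Instruct | Q3_K_M | 5 tok/s | Marginal |
| Qwen 2.5 Coder 32B Instruct | Q4_K_M | 25 tok/s | Good |
"""


def parse_row(line):
    """Split one table row into (model, quant, tok_s or None, rating)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    model, quant, speed, rating = cells
    # Rows without a measured speed (embedding, image, speech models) get None.
    tok_s = int(speed.split()[0]) if speed.endswith("tok/s") else None
    return model, quant, tok_s, rating


entries = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Count entries per rating, then keep the ones rated Good or better with a speed.
by_rating = Counter(rating for *_, rating in entries)
usable = [(m, t) for m, _, t, r in entries
          if r in ("Excellent", "Good") and t is not None]

print(dict(by_rating))
print(usable)
```

Counting distinct model names instead of rows shows why the page reports both 58 models and 59 entries: Qwen3-32B Instruct appears twice, once per quantization.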
Frequently Asked Questions
- What AI models can NVIDIA GeForce RTX 4090 run?
- The NVIDIA GeForce RTX 4090 can run 58 AI models. Top performers include Llama 3.2 1B Instruct, Gemma 4 26B-A4B, and Arcee Trinity Nano 6B. See the full compatibility table above for speeds and quality ratings.
- Is NVIDIA GeForce RTX 4090 good for AI coding?
- Yes. With 24 GB of VRAM, the NVIDIA GeForce RTX 4090 supports the Power AI Coding tier: it runs 32B coding models at good quality and can keep an embedding model loaded alongside the coding model.
- How much VRAM does NVIDIA GeForce RTX 4090 have?
- The NVIDIA GeForce RTX 4090 has 24 GB of GDDR6X VRAM with 1008 GB/s bandwidth.
- Can NVIDIA GeForce RTX 4090 run 70B models?
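- 70B models can run on the NVIDIA GeForce RTX 4090 only with CPU offloading, and performance is reduced: the table above lists Llama 3.1 70B Instruct and Llama 3.3 70B Instruct at Q3_K_M at 5 to 6 tok/s, both rated Marginal. Consider a GPU with 48 GB or more of VRAM for full-speed 70B inference.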
- Is NVIDIA GeForce RTX 4090 worth it for AI?
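- At $1,799, the NVIDIA GeForce RTX 4090 offers 24 GB of VRAM and runs 58 AI models. In the table above, most 7B to 14B models are rated Excellent, 27B to 35B models are rated Good, and 70B or larger models are Marginal or Not viable.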
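The 70B answer above follows from simple arithmetic on weight size. The sketch below estimates how much of a 70B model fits in 24 GB and how many layers would stay on the GPU under CPU offloading. Every input is an assumption made for illustration: roughly 3.9 bits per weight for Q3_K_M, 80 equally sized layers, and a 3 GB reserve for the KV cache and runtime buffers. The helper names are hypothetical.

```python
import math


def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def gpu_layer_split(model_gb, n_layers, vram_gb, reserve_gb):
    """Number of layers that fit in VRAM after a reserve.

    Simplification: treats all layers as equal in size and ignores the
    embedding and output tensors.
    """
    per_layer_gb = model_gb / n_layers
    fit = math.floor((vram_gb - reserve_gb) / per_layer_gb)
    return max(0, min(n_layers, fit))


# Assumed inputs (not from this page): 70B parameters, ~3.9 bits/weight,
# 80 layers, 3 GB reserved for KV cache and buffers.
size = weights_gb(70, 3.9)
layers = gpu_layer_split(size, n_layers=80, vram_gb=24, reserve_gb=3.0)

print(f"{size:.1f} GB of weights; about {layers} of 80 layers fit on the GPU")
```

Under these assumptions the weights alone exceed 24 GB, so the remaining layers run from system RAM, which is why the listed speeds drop to single digits. A layer count like this is the kind of value passed to llama.cpp's `--n-gpu-layers` option, though the right number in practice depends on context length and the runtime's actual buffer sizes.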
Related Guides
Buying Guide
How to Choose Your First AI GPU
A data-backed buying guide to choosing the right GPU for running AI models locally. VRAM explained, budget tiers compared, and specific GPU recommendations with compatible models.
Tutorial
The Complete Guide to Running LLMs Locally
Run large language models locally: hardware needs, Ollama and llama.cpp, model picks by use case, and quantization.
Explainer
VRAM: The Only Spec That Matters for AI
VRAM for local AI: what it is, why models need it, how quantization cuts requirements, and a VRAM table for major models.
Roundup
Best AI Hardware for Developers in 2026
Best AI GPUs in 2026: RTX 4060 Ti to RTX 5090, Apple Silicon M4 Max. Picks by budget, use case, and dev workflow. Complete build specs included.
Explainer
Mac vs Windows for Local AI: A Beginner's Honest Take
No tribal wars: when Apple Silicon is the easy path, when a Windows desktop with an NVIDIA GPU wins, what unified memory means, and how to pick without drowning in forum fights.
Explainer
How we test: OwnRig's benchmark methodology
How OwnRig measures tokens per second, rates model compatibility, and keeps hardware data current. Our methodology, tools, and known limitations.
Tutorial
Running Gemma 4 locally: which GPU you actually need
Gemma 4 VRAM requirements for every variant: E2B, E4B, 26B-A4B, and 31B. Which GPUs can run each, what quantization to use, and the honest call on RTX 4060 vs RTX 4090.
Buying Guide
Best GPUs for Stable Diffusion, Flux, and SD3 in 2026
GPU requirements for SDXL, Stable Diffusion 3 Medium, SD 3.5 Large, and FLUX.1 Dev. Per-GPU performance verdicts for RTX 4060 Ti, RTX 4070, RTX 4090, and Apple Silicon.