NVIDIA GeForce RTX 4080 Super
16 GB GDDR6X · 736 GB/s
From
$979
Estimated street price
VRAM
16 GB
Bandwidth
736 GB/s
TDP
320W
Models
26
Tier
Capable
The NVIDIA GeForce RTX 4080 Super, with 16 GB of GDDR6X VRAM, is rated against 26 AI models across chat, coding, and AI coding workloads; 22 of them are rated Acceptable or better. Best performance: Gemma 4 26B-A4B at 251 tok/s (Excellent). For AI coding workflows, it sits in the Capable AI Coding tier and handles single-model workflows well. Current price: approximately $979.
Source: OwnRig methodology
16 GB
736 GB/s
GDDR6X
320W
3-slot, 304mm
Builder Capability: Capable AI Coding
Runs 16-22B coding models comfortably, or 32B models at reduced quality. Handles single-model workflows well.
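The tier description follows from simple weight-size arithmetic. As a rough sketch, weight memory is parameter count times bits per weight. The bits-per-weight figures below are approximate averages for llama.cpp GGUF quantizations, and the 1.5 GB overhead for KV cache and runtime buffers is an assumed placeholder, not an OwnRig figure.

```python
# Approximate bits per weight for common GGUF quantizations (assumed averages;
# real files vary with architecture and tensor mix).
APPROX_BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

VRAM_GB = 16.0      # RTX 4080 Super
OVERHEAD_GB = 1.5   # assumed allowance for KV cache, activations, runtime buffers


def weight_size_gb(params_billion: float, quant: str) -> float:
    """Estimate weight memory in GB (10^9 bytes) for a dense model."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8.0


def fits_in_vram(params_billion: float, quant: str) -> bool:
    """True if estimated weights plus the assumed overhead fit in VRAM."""
    return weight_size_gb(params_billion, quant) + OVERHEAD_GB <= VRAM_GB


if __name__ == "__main__":
    for params, quant in [(14, "Q5_K_M"), (22, "Q4_K_M"), (32, "Q3_K_M"), (32, "Q4_K_M")]:
        size = weight_size_gb(params, quant)
        print(f"{params}B {quant}: ~{size:.1f} GB weights, fits={fits_in_vram(params, quant)}")
```

Under these assumptions a 22B model at Q4_K_M leaves a little headroom, while a 32B model at Q3_K_M comes out around 15.6 GB of weights alone, right at the card's limit.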
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA
production: Primary high-performance backend for NVIDIA inference workloads.
Vulkan
stable: Fallback backend for llama.cpp and related local runtimes.
What it can run
26 models

| Model | Quantization | Speed | Rating |
|---|---|---|---|
| Arcee Trinity Mini 26B | Q3_K_M | 70 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 130 tok/s | Excellent |
| DeepSeek V3 | Q2_K | - | Not viable |
| Gemma 3 27B | Q3_K_M | 14 tok/s | Acceptable |
| Gemma 4 26B-A4B | Q3_K_M | 251 tok/s | Excellent |
| Gemma 4 31B | Q3_K_M | 16 tok/s | Acceptable |
| Gemma 4 E2B | Q8_0 | 111 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 68 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 82 tok/s | Acceptable |
| Llama 3.1 8B Instruct | Q8_0 | 82 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 68 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 200 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 140 tok/s | Excellent |
| Mistral 7B Instruct v0.3 | Q8_0 | 78 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | - | Not viable |
| Phi-4 14B | Q5_K_M | 48 tok/s | Excellent |
| Phi-4 Mini | Q8_0 | 130 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q3_K_M | 18 tok/s | Acceptable |
| Qwen3-14B Instruct | Q8_0 | 34 tok/s | Good |
| Qwen3.5-122B-A10B | Q3_K_M | - | Not viable |
| Qwen3.5-27B | Q3_K_M | 38 tok/s | Acceptable |
| Qwen3.5-397B (MoE) | Q2_K | - | Not viable |
| Qwen3.6-27B | Q3_K_M | 38 tok/s | Acceptable |
| Stable Diffusion 3.5 Large | FP16 | - | Excellent |
| Stable Diffusion XL 1.0 | FP16 | - | Excellent |
| Whisper Large V3 Turbo | FP16 | - | Excellent |
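
The token rates in the table can be sanity-checked with a common rule of thumb: single-stream decoding is usually memory-bandwidth bound, so the ceiling is roughly bandwidth divided by the bytes of weights read per token. The following is a sketch of that heuristic, not OwnRig's methodology. The function name is hypothetical, the bits-per-weight value is an approximation, and for mixture-of-experts models only the active parameters would be counted, an assumption that ignores shared tensors and cache traffic.

```python
BANDWIDTH_GB_S = 736.0  # RTX 4080 Super memory bandwidth from the spec above


def decode_ceiling_tok_s(active_params_billion: float,
                         bits_per_weight: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_per_token = active_params_billion * bits_per_weight / 8.0
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # Llama 3.1 8B at Q8_0, assuming ~8.5 bits per weight.
    ceiling = decode_ceiling_tok_s(8.0, 8.5)
    print(f"Llama 3.1 8B Q8_0 ceiling: ~{ceiling:.0f} tok/s (table lists 82 tok/s)")
```

A listed speed close to this ceiling suggests the model runs fully in VRAM; a speed far below it may indicate compute limits or partial offload to system memory.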
Frequently Asked Questions
- What AI models can the NVIDIA GeForce RTX 4080 Super run?
- The NVIDIA GeForce RTX 4080 Super is listed against 26 AI models, 22 of which are rated Acceptable or better. Top performers include Gemma 4 26B-A4B, Llama 3.2 1B Instruct, and Llama 3.2 3B Instruct. See the full compatibility table above for speeds and quality ratings.
- Is the NVIDIA GeForce RTX 4080 Super good for AI coding?
- Yes. With 16 GB of VRAM, the NVIDIA GeForce RTX 4080 Super handles single-model coding workflows well at the Capable tier.
- How much VRAM does the NVIDIA GeForce RTX 4080 Super have?
- The NVIDIA GeForce RTX 4080 Super has 16 GB of GDDR6X VRAM with 736 GB/s bandwidth.
- Can the NVIDIA GeForce RTX 4080 Super run 70B models?
- 70B models can run on the NVIDIA GeForce RTX 4080 Super with CPU offloading, but performance will be reduced because part of the model is served from system RAM. Consider a GPU with 48 GB or more of VRAM for full-speed 70B inference.
- Is the NVIDIA GeForce RTX 4080 Super worth it for AI?
- At roughly $979, the NVIDIA GeForce RTX 4080 Super offers 16 GB of VRAM and runs 22 of the 26 listed models at Acceptable quality or better. It works well for smaller models and experimentation.
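
To make the 70B answer above concrete, here is a rough sketch of how a runtime with partial offload (for example, llama.cpp's GPU layer setting) might split such a model. The 80-layer count matches Llama-family 70B models; the 4.85 bits per weight for Q4_K_M and the 2 GB reserve are assumptions, the function name is hypothetical, and layers are treated as equal in size, which real models only approximate.

```python
def gpu_layer_split(params_billion: float, bits_per_weight: float, n_layers: int,
                    vram_gb: float, reserve_gb: float) -> tuple[int, float]:
    """Return (layers that fit on the GPU, GB of weights left in system RAM)."""
    total_gb = params_billion * bits_per_weight / 8.0
    per_layer_gb = total_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    gpu_layers = min(n_layers, int(usable_gb // per_layer_gb))
    cpu_gb = total_gb - gpu_layers * per_layer_gb
    return gpu_layers, cpu_gb


if __name__ == "__main__":
    layers, cpu_gb = gpu_layer_split(
        params_billion=70.0,   # 70B dense model
        bits_per_weight=4.85,  # assumed Q4_K_M average
        n_layers=80,           # Llama-family 70B layer count
        vram_gb=16.0,          # RTX 4080 Super
        reserve_gb=2.0,        # assumed KV cache and runtime reserve
    )
    print(f"GPU layers: {layers} of 80, weights left in system RAM: ~{cpu_gb:.1f} GB")
```

With these assumptions roughly a third of the layers fit on the card, and the rest are processed from system RAM on every token, which is where the slowdown comes from.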