NVIDIA GeForce RTX 3080 10GB
10 GB GDDR6X · 760 GB/s
From
$399
Estimated street price
VRAM
10 GB
Bandwidth
760 GB/s
TDP
320W
Models
53
Tier
Limited
The NVIDIA GeForce RTX 3080 10GB, with 10 GB of GDDR6X VRAM, is rated against 53 AI models across the embedding, AI building, and coding categories; 32 of them are rated Acceptable or better. Best performance: all-MiniLM-L6-v2 at 5000 tok/s (Excellent). Current price: approximately $399.
Source: OwnRig methodology
10 GB · 760 GB/s · GDDR6X · 320W · 3-slot, 285mm
Builder Capability: Limited
Insufficient VRAM for most AI coding workflows.
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA
Production: Primary high-performance backend for NVIDIA inference workloads.
Vulkan
Stable: Fallback backend for llama.cpp and related local runtimes.
What it can run
53 models

| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | FP16 | 5000 tok/s | Excellent |
| Arcee Trinity Mini 26B | Q3_K_M | 11 tok/s | Not viable |
| Arcee Trinity Nano 6B | Q8_0 | 134 tok/s | Excellent |
| Code Llama 34B Instruct | Q2_K | n/a | Not viable |
| Codestral 22B | Q3_K_M | n/a | Not viable |
| Command R 35B | Q2_K | n/a | Not viable |
| DeepSeek Coder V2 Lite 16B | Q4_K_M | 55 tok/s | Excellent |
| DeepSeek R1 Distill Qwen 32B | Q2_K | n/a | Not viable |
| DeepSeek R1 Distill Qwen 7B | Q5_K_M | 48 tok/s | Excellent |
| DeepSeek V3 | Q2_K | n/a | Not viable |
| FLUX.1 Dev | Q4_K_M | n/a | Not viable |
| Gemma 2 27B Instruct | Q3_K_M | n/a | Not viable |
| Gemma 2 9B Instruct | Q4_K_M | 45 tok/s | Excellent |
| Gemma 3 12B | Q3_K_M | 28 tok/s | Acceptable |
| Gemma 3 27B | Q3_K_M | n/a | Not viable |
| Gemma 3 4B | Q5_K_M | 75 tok/s | Excellent |
| Gemma 4 E2B | Q8_0 | 114 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 70 tok/s | Excellent |
| GigaChat Lightning 10B | Q4_K_M | 72 tok/s | Acceptable |
| InternLM 2.5 7B Chat | Q5_K_M | 50 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q2_K | n/a | Not viable |
| Llama 3.1 8B Instruct | Q5_K_M | 50 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 180 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 140 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q2_K | n/a | Not viable |
| LLaVA 1.6 13B | Q3_K_M | 18 tok/s | Acceptable |
| Mistral 7B Instruct v0.3 | Q5_K_M | 48 tok/s | Excellent |
| Mistral Small 24B Instruct | Q2_K | n/a | Not viable |
| Mixtral 8x7B Instruct | Q2_K | n/a | Not viable |
| nomic-embed-text v1.5 | Q8_0 | 2500 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | n/a | Not viable |
| Phi-3 Medium 14B Instruct | Q3_K_M | 32 tok/s | Acceptable |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 130 tok/s | Excellent |
| Phi-4 14B | Q3_K_M | 26 tok/s | Acceptable |
| Phi-4 Mini | Q8_0 | 120 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q3_K_M | 24 tok/s | Acceptable |
| Qwen 2.5 72B Instruct | Q2_K | n/a | Not viable |
| Qwen 2.5 7B Instruct | Q5_K_M | 52 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q2_K | n/a | Not viable |
| Qwen 2.5 Coder 7B Instruct | Q5_K_M | 50 tok/s | Excellent |
| Qwen3-14B Instruct | Q4_K_M | 21 tok/s | Acceptable |
| Qwen3-8B Instruct | Q8_0 | 32 tok/s | Good |
| Qwen3.5-27B | Q3_K_M | 7 tok/s | Not viable |
| Qwen3.5-397B (MoE) | Q2_K | n/a | Not viable |
| Qwen3.6-27B | Q3_K_M | n/a | Not viable |
| QwQ 32B Preview | Q2_K | n/a | Not viable |
| Stable Diffusion 3 Medium | FP16 | n/a | Excellent |
| Stable Diffusion 3.5 Large | Q8_0 | n/a | Acceptable |
| Stable Diffusion XL 1.0 | FP16 | n/a | Excellent |
| StarCoder 2 15B | Q3_K_M | 22 tok/s | Acceptable |
| Whisper Large V3 | Q5_K_M | n/a | Excellent |
| Whisper Large V3 Turbo | FP16 | n/a | Excellent |
| Yi 1.5 34B Chat | Q2_K | n/a | Not viable |
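
For context on the tok/s figures in the table, token generation on a single GPU is often limited by memory bandwidth rather than compute. The sketch below is a rough sanity check, not OwnRig's methodology: it assumes every generated token reads all model weights once, so the ceiling is bandwidth divided by weight size. The function names and the 5.5 bits-per-weight value are illustrative assumptions; only the 760 GB/s figure comes from the spec sheet.

```python
# Rough ceiling on decode speed for a memory-bandwidth-bound model.
# Assumption: each generated token reads every weight exactly once,
# so tokens/s <= bandwidth / weight_bytes. Measured speeds are lower.

BANDWIDTH_GB_S = 760.0  # RTX 3080 10GB memory bandwidth, from the spec sheet


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Bandwidth-bound upper limit on generated tokens per second."""
    return bandwidth_gb_s / weight_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical example: an 8B model at an assumed ~5.5 bits per weight.
    ceiling = decode_ceiling_tok_s(8.0, 5.5)
    print(f"ceiling: {ceiling:.0f} tok/s")
```

A listed speed well below this ceiling is expected, since the estimate ignores KV-cache reads, kernel overhead, and sampling.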
Buy Used
Prices and availability vary. Inspect hardware before purchasing. Some links may be affiliate links.
Frequently Asked Questions
- What AI models can NVIDIA GeForce RTX 3080 10GB run?
- The compatibility table above lists 53 AI models for the NVIDIA GeForce RTX 3080 10GB, though many of the larger ones are rated Not viable. Top performers include all-MiniLM-L6-v2, nomic-embed-text v1.5, and Llama 3.2 1B Instruct. See the table for speeds and quality ratings.
- Is NVIDIA GeForce RTX 3080 10GB good for AI coding?
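- With 10 GB, the NVIDIA GeForce RTX 3080 10GB has limited VRAM for AI coding workflows. Smaller coding models such as Qwen 2.5 Coder 7B Instruct (Q5_K_M, 50 tok/s) are rated Excellent, while Qwen 2.5 Coder 32B Instruct is rated Not viable.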
- How much VRAM does NVIDIA GeForce RTX 3080 10GB have?
- The NVIDIA GeForce RTX 3080 10GB has 10 GB of GDDR6X VRAM with 760 GB/s bandwidth.
- Can NVIDIA GeForce RTX 3080 10GB run 70B models?
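- Not from VRAM alone: 70B models do not fit in 10 GB, and the table above rates them Not viable even at Q2_K. They can run on the NVIDIA GeForce RTX 3080 10GB with CPU offloading, but performance will be much lower. Consider a GPU with 48GB+ VRAM for full-speed 70B inference.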
- Is NVIDIA GeForce RTX 3080 10GB worth it for AI?
- At approximately $399, the NVIDIA GeForce RTX 3080 10GB offers 10 GB of VRAM and is best suited to smaller models and experimentation. Larger models in the table above are rated Not viable.
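
The VRAM answers above come down to simple arithmetic: weight size is roughly parameter count times bits per weight. The following is a minimal sketch under stated assumptions, not the rating method used on this page. The `overhead_gb` allowance for KV cache and runtime buffers is a hypothetical placeholder, and the bits-per-weight values are illustrative rather than exact figures for any quantization format.

```python
# Rough check of whether a quantized model's weights fit in GPU memory.
# Assumptions: weights dominate memory use, and a fixed allowance
# (overhead_gb, a hypothetical value) covers KV cache and buffers.

VRAM_GB = 10.0  # RTX 3080 10GB, from the spec sheet


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = VRAM_GB, overhead_gb: float = 1.5) -> bool:
    """True if weights plus the assumed overhead fit entirely in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Even at an idealized 2 bits per weight, 70B parameters need 17.5 GB.
    print("70B @ 2.0 bpw fits:", fits_in_vram(70.0, 2.0))
    # An 8B model at an assumed ~5.5 bits per weight needs about 5.5 GB.
    print("8B @ 5.5 bpw fits:", fits_in_vram(8.0, 5.5))
```

Real quantization formats use somewhat more bits per weight than their names suggest, so actual sizes tend to be larger than this lower-bound estimate.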