
NVIDIA GeForce RTX 3080 · 10 GB GDDR6X · 760 GB/s · ~$399 · Updated 2026-03-15
The NVIDIA GeForce RTX 3080 (10 GB GDDR6X) is rated below against 42 AI models spanning chat, coding, embeddings, transcription, and image generation. Best performance: all-MiniLM-L6-v2 embeddings at 5,000 tok/s (excellent). Current price: approximately $399.
— OwnRig methodology, data updated 2026-03-15
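To see what the headline embedding number looks like in practice, here is a minimal sketch using the sentence-transformers package (assuming it and a CUDA build of PyTorch are installed; actual throughput depends on batch size and drivers, so don't expect to match the benchmark figure exactly):

```python
# Minimal embedding sketch with all-MiniLM-L6-v2 (assumes sentence-transformers
# and a CUDA-enabled PyTorch are installed; throughput varies with batch size).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")

sentences = [
    "Local embeddings on an RTX 3080.",
    "10 GB of VRAM is far more than a 23M-parameter model needs.",
]
embeddings = model.encode(sentences, batch_size=64, normalize_embeddings=True)

print(embeddings.shape)  # (2, 384): MiniLM-L6-v2 produces 384-dim vectors
```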
Insufficient VRAM for most AI coding workflows: 7B coders and the MoE DeepSeek Coder V2 Lite run well, but 15B models are a tight fit and 22B+ coding models don't fit at all.
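For the coding models that do fit, a minimal sketch with the llama-cpp-python bindings, fully offloading a Q5_K_M 7B coder to the GPU (the model path below is hypothetical; any of the ~5-6 GB GGUF files from the table works the same way):

```python
# Run a 7B coding model entirely on the 3080 (sketch; model path is hypothetical).
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-coder-7b-instruct-q5_k_m.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload every layer; a Q5_K_M 7B fits within 10 GB
    n_ctx=4096,       # keep context modest so the KV cache stays in VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```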
| Model | Quant | Speed | Rating | Notes |
|---|---|---|---|---|
| Llama 3.2 1B Instruct | Q8_0 | 180 tok/s | Excellent | 760 GB/s bandwidth makes tiny models fly. Full Q8_0 quality with negligible VRAM. |
| Llama 3.2 3B Instruct | Q8_0 | 140 tok/s | Excellent | Very fast at 760 GB/s. Full quality with plenty of headroom. |
| Phi-4 Mini | Q8_0 | 120 tok/s | Excellent | 3.8B model benefits greatly from 760 GB/s. Excellent quality and speed. |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 130 tok/s | Excellent | Very fast at 760 GB/s. Full quality with headroom. |
| nomic-embed-text v1.5 | Q8_0 | 2500 tok/s | Excellent | Tiny embedding model. Negligible VRAM, runs concurrently with any model. |
| all-MiniLM-L6-v2 | FP16 | 5000 tok/s | Excellent | 23M params. Practically free in VRAM. Ultra-fast embeddings. |
| Whisper Large V3 | Q5_K_M | — | Excellent | Transcription model. 1.5GB VRAM. Real-time capable at 760 GB/s. |
| Whisper Large V3 Turbo | FP16 | — | Excellent | Distilled Whisper. 1.6GB VRAM. Real-time transcription, very fast. |
| Llama 3.1 8B Instruct | Q5_K_M | 50 tok/s | Excellent | 760 GB/s bandwidth makes 8B models very fast despite tight 10GB VRAM. |
| Mistral 7B Instruct v0.3 | Q5_K_M | 48 tok/s | Excellent | Q5_K_M fits in 10GB. 760 GB/s delivers excellent throughput. |
| Qwen 2.5 7B Instruct | Q5_K_M | 52 tok/s | Excellent | Strong 7B model. Fits comfortably at Q5. Very fast at 760 GB/s. |
| Gemma 2 9B Instruct | Q4_K_M | 45 tok/s | Excellent | Q4_K_M fits in 10GB. 760 GB/s makes 9B feel snappy. |
| InternLM 2.5 7B Chat | Q5_K_M | 50 tok/s | Excellent | 7B fits at Q5. 760 GB/s delivers excellent speed. |
| DeepSeek R1 Distill Qwen 7B | Q5_K_M | 48 tok/s | Excellent | R1 distilled reasoning. Q5 fits in 10GB. Very fast inference. |
| Qwen 2.5 Coder 7B Instruct | Q5_K_M | 50 tok/s | Excellent | Coding model. Q5 fits. 760 GB/s makes code completion feel instant. |
| Gemma 3 4B | Q5_K_M | 75 tok/s | Excellent | 4B model. Plenty of headroom. Very fast at 760 GB/s. |
| Gemma 3 12B | Q3_K_M | 28 tok/s | Acceptable | Tight fit at Q3_K_M (~7.5GB). 760 GB/s helps despite compression. |
| Phi-3 Medium 14B Instruct | Q3_K_M | 32 tok/s | Acceptable | 14B at Q3_K_M fits in 10GB. Tight but usable. Bandwidth helps. |
| Phi-4 14B | Q3_K_M | 26 tok/s | Acceptable | 14B at Q3 fits. Tight VRAM. 760 GB/s keeps it usable. |
| Qwen 2.5 14B Instruct | Q3_K_M | 24 tok/s | Acceptable | Q3_K_M fits in 10GB. Tight. Bandwidth helps throughput. |
| StarCoder 2 15B | Q3_K_M | 22 tok/s | Acceptable | 15B at Q3 fits. Tight. Code completion usable. |
| DeepSeek Coder V2 Lite 16B | Q4_K_M | 55 tok/s | Excellent | MoE — only 2.4B active per token. Q4 9.1GB fits well. Excellent coding speed. |
| LLaVA 1.6 13B | Q3_K_M | 18 tok/s | Acceptable | Q4 needs ~8GB, Q3_K_M ~6.5GB fits. Multimodal vision+text. |
| Stable Diffusion XL 1.0 | FP16 | — | Excellent | ~6.5GB VRAM. ~8-12 sec per 1024x1024 at 30 steps. 760 GB/s helps. Excellent. |
| Stable Diffusion 3.5 Large | Q8_0 | — | Acceptable | Q8_0 9GB fits barely in 10GB. Tight. Good image quality. |
| Stable Diffusion 3 Medium | FP16 | — | Excellent | ~4GB VRAM. Plenty of headroom. Fast image generation. |
| FLUX.1 Dev | Q4_K_M | — | Not Viable | Q4 7.2GB technically fits but image gen overhead and quality tradeoffs make it not viable for 10GB. Offloading too slow. |
| Gemma 2 27B Instruct | Q3_K_M | — | Not Viable | Q3 10.3GB exceeds 10GB. 24B+ models don't fit. |
| Gemma 3 27B | Q3_K_M | — | Not Viable | Q3 13.3GB exceeds 10GB. Large models don't fit. |
| Codestral 22B | Q3_K_M | — | Not Viable | Q3 10.3GB exceeds 10GB with context. Doesn't fit. |
| Mistral Small 24B Instruct | Q2_K | — | Not Viable | 24B exceeds 10GB even at Q2. Doesn't fit. |
| Yi 1.5 34B Chat | Q2_K | — | Not Viable | 34B doesn't fit in 10GB. |
| Code Llama 34B Instruct | Q2_K | — | Not Viable | 34B doesn't fit in 10GB. |
| Command R 35B | Q2_K | — | Not Viable | 35B doesn't fit in 10GB. |
| QwQ 32B Preview | Q2_K | — | Not Viable | 32B doesn't fit in 10GB. |
| Qwen 2.5 Coder 32B Instruct | Q2_K | — | Not Viable | 32B doesn't fit in 10GB. |
| DeepSeek R1 Distill Qwen 32B | Q2_K | — | Not Viable | 32B doesn't fit in 10GB. |
| Llama 3.1 70B Instruct | Q2_K | — | Not Viable | 70B doesn't fit in 10GB. |
| Llama 3.3 70B Instruct | Q2_K | — | Not Viable | 70B doesn't fit in 10GB. |
| Qwen 2.5 72B Instruct | Q2_K | — | Not Viable | 72B doesn't fit in 10GB. |
| Mixtral 8x7B Instruct | Q2_K | — | Not Viable | MoE 46.7B total. Q2 16.4GB exceeds 10GB. Doesn't fit. |
| DeepSeek V3 | Q2_K | — | Not Viable | 671B MoE. Q2 115GB. Doesn't fit. Requires multi-GPU or 128GB+. |
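The fit/no-fit calls in the table follow from simple arithmetic: quantized weight size ≈ parameter count × bits-per-weight ÷ 8, plus an allowance for the KV cache and runtime buffers. A rough sketch of that check (the bits-per-weight figures are approximate averages for llama.cpp k-quants, and the flat 1.5 GB overhead is an assumption, not a measured value):

```python
# Rough fit check mirroring the table above (heuristic, not an exact loader figure).
# Bits-per-weight are approximate averages for llama.cpp k-quants; the fixed
# 1.5 GB overhead for KV cache and buffers is an assumption.
BITS_PER_WEIGHT = {"Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def approx_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Estimate total VRAM: quantized weights plus fixed runtime overhead."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb

for name, params_b, quant in [
    ("Qwen 2.5 7B Instruct", 7.6, "Q5_K_M"),
    ("Gemma 3 27B", 27.0, "Q3_K_M"),
    ("Llama 3.1 70B Instruct", 70.6, "Q2_K"),
]:
    est = approx_vram_gb(params_b, quant)
    verdict = "fits" if est <= 10 else "does not fit"
    print(f"{name} @ {quant}: ~{est:.1f} GB -> {verdict} in 10 GB")
```

Run against the table's entries, this reproduces the pattern above: 7B models at Q5_K_M land around 7 GB and fit comfortably, while 27B-class models exceed 10 GB even at Q3 and 70B models miss by a wide margin even at Q2_K.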
Prices and availability vary. Inspect hardware before purchasing.
Generation: Ampere. Last updated: 2026-03-15.