8 GB GDDR7 · 448 GB/s
From $299 (estimated street price)
VRAM: 8 GB
Bandwidth: 448 GB/s
TDP: 145 W
Models: 52
Tier: Limited
The NVIDIA GeForce RTX 5060 8GB, with 8 GB of GDDR7 VRAM, is evaluated against 52 AI models across the embedding, AI-building, and coding categories; several of the larger models are rated Not viable. Best performance: all-MiniLM-L6-v2 at 9,775 tok/s (Excellent). Current price: approximately $299.
Source: OwnRig methodology
Memory type: GDDR7
Form factor: 2-slot, 241 mm
Insufficient VRAM for most AI coding workflows.
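The VRAM limit can be made concrete with a back-of-the-envelope estimate: weight size is roughly parameter count times bits per weight, divided by eight. The sketch below is not OwnRig's methodology. The bits-per-weight figures are approximate averages for llama.cpp quantization types, the 1 GB overhead for KV cache and runtime buffers is an assumption, and `weights_gb` and `fits_in_vram` are hypothetical helper names.

```python
# Approximate bits per weight for common llama.cpp quantization types.
# ASSUMPTION: rough averages; real GGUF files vary by model architecture.
APPROX_BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

VRAM_GB = 8.0      # from the spec list on this page
OVERHEAD_GB = 1.0  # ASSUMPTION: headroom for KV cache, activations, runtime


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the model weights in GB (1 GB = 1e9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8.0


def fits_in_vram(
    params_billion: float,
    quant: str,
    vram_gb: float = VRAM_GB,
    overhead_gb: float = OVERHEAD_GB,
) -> bool:
    """True if estimated weights plus assumed overhead fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for params, quant in [(7, "Q4_K_M"), (32, "Q2_K")]:
        size = weights_gb(params, quant)
        print(f"{params}B @ {quant}: ~{size:.1f} GB, fits={fits_in_vram(params, quant)}")
```

Under these assumptions a 7B model at Q4_K_M needs about 4.2 GB for weights and fits, while a 32B model at Q2_K needs about 10.4 GB and does not. Real file sizes and longer context lengths will shift these numbers.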
The software stacks that matter most for real-world inference on this device.
CUDA (production): Primary high-performance backend for NVIDIA inference workloads.
Vulkan (stable): Fallback backend for llama.cpp and related local runtimes.
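A launcher script might apply the primary/fallback ordering above as follows. This is a minimal sketch under loud assumptions: it treats the presence of the `nvidia-smi` and `vulkaninfo` executables on `PATH` as a proxy for a usable CUDA or Vulkan stack, which does not prove that a given runtime was built with that backend. `pick_backend` is a hypothetical helper, not part of any runtime's API.

```python
import shutil
from typing import Callable, Optional


def pick_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "cuda", "vulkan", or "cpu" based on which tools are on PATH.

    ASSUMPTION: tool presence is only a heuristic for driver availability.
    The `which` parameter is injectable so the logic can be tested.
    """
    if which("nvidia-smi"):   # ships with the NVIDIA driver
        return "cuda"
    if which("vulkaninfo"):   # ships with the Vulkan tools package
        return "vulkan"
    return "cpu"


if __name__ == "__main__":
    print(f"Preferred backend: {pick_backend()}")
```

Passing a stub for `which` keeps the function testable on machines without a GPU.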
| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | FP16 | 9775 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 55 tok/s | Excellent |
| Code Llama 34B Instruct | Q2_K | - | Not viable |
| Codestral 22B | Q3_K_M | - | Not viable |
| Command R 35B | Q2_K | - | Not viable |
| DeepSeek Coder V2 Lite 16B | Q3_K_M | 52 tok/s | Good |
| DeepSeek R1 Distill Qwen 32B | Q2_K | - | Not viable |
| DeepSeek R1 Distill Qwen 7B | Q4_K_M | 37 tok/s | Good |
| DeepSeek V3 | Q2_K | - | Not viable |
| FLUX.1 Dev | Q4_K_M | - | Marginal |
| Gemma 2 27B Instruct | Q3_K_M | - | Not viable |
| Gemma 2 9B Instruct | Q4_K_M | 32 tok/s | Good |
| Gemma 3 12B | Q3_K_M | 21 tok/s | Marginal |
| Gemma 3 27B | Q3_K_M | - | Not viable |
| Gemma 3 4B | Q5_K_M | 63 tok/s | Excellent |
| Gemma 4 E2B | Q8_0 | 47 tok/s | Good |
| Gemma 4 E4B | Q6_K | 37 tok/s | Good |
| GigaChat Lightning 10B | Q4_K_M | 74 tok/s | Acceptable |
| InternLM 2.5 7B Chat | Q4_K_M | 35 tok/s | Good |
| Llama 3.1 70B Instruct | Q2_K | - | Not viable |
| Llama 3.1 8B Instruct | Q4_K_M | 37 tok/s | Good |
| Llama 3.2 1B Instruct | Q8_0 | 109 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 75 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q2_K | - | Not viable |
| LLaVA 1.6 13B | Q3_K_M | 25 tok/s | Marginal |
| Mistral 7B Instruct v0.3 | Q4_K_M | 36 tok/s | Good |
| Mistral Small 24B Instruct | Q3_K_M | - | Not viable |
| Mixtral 8x7B Instruct | Q4_K_M | - | Not viable |
| nomic-embed-text v1.5 | Q8_0 | 4830 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | - | Not viable |
| Phi-3 Medium 14B Instruct | Q3_K_M | 23 tok/s | Marginal |
| Phi-3 Mini 3.8B Instruct | Q5_K_M | 60 tok/s | Excellent |
| Phi-4 14B | Q3_K_M | 22 tok/s | Marginal |
| Phi-4 Mini | Q5_K_M | 63 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q3_K_M | 20 tok/s | Marginal |
| Qwen 2.5 72B Instruct | Q2_K | - | Not viable |
| Qwen 2.5 7B Instruct | Q4_K_M | 35 tok/s | Good |
| Qwen 2.5 Coder 32B Instruct | Q2_K | - | Not viable |
| Qwen 2.5 Coder 7B Instruct | Q4_K_M | 36 tok/s | Good |
| Qwen3-14B Instruct | Q3_K_M | 21 tok/s | Acceptable |
| Qwen3-8B Instruct | Q5_K_M | 28 tok/s | Acceptable |
| Qwen3.5-27B | Q3_K_M | - | Not viable |
| Qwen3.5-397B (MoE) | Q2_K | - | Not viable |
| Qwen3.6-27B | Q3_K_M | - | Not viable |
| QwQ 32B Preview | Q2_K | - | Not viable |
| Stable Diffusion 3 Medium | FP16 | - | Good |
| Stable Diffusion 3.5 Large | Q8_0 | - | Not viable |
| Stable Diffusion XL 1.0 | FP16 | - | Good |
| StarCoder 2 15B | Q3_K_M | 18 tok/s | Marginal |
| Whisper Large V3 | Q5_K_M | - | Excellent |
| Whisper Large V3 Turbo | FP16 | - | Excellent |
| Yi 1.5 34B Chat | Q2_K | - | Not viable |
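For context on the tok/s figures, a common rule of thumb treats single-stream token generation as memory-bandwidth bound: each generated token reads roughly all of the weights once, so bandwidth divided by weight size gives a loose upper bound. The sketch below illustrates that rule only; it is not how the table was produced. The 4.2 GB weight size in the example is an assumed figure for a 7B model at about 4.8 bits per weight, and the function names are hypothetical.

```python
BANDWIDTH_GB_S = 448.0  # memory bandwidth from the spec list on this page


def decode_ceiling_tok_s(
    weights_gb: float, bandwidth_gb_s: float = BANDWIDTH_GB_S
) -> float:
    """Loose upper bound on tokens/s if every token reads all weights once.

    ASSUMPTION: ignores KV-cache reads, compute limits, and kernel overhead,
    so real throughput is expected to land well below this value.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


def bandwidth_efficiency(measured_tok_s: float, weights_gb: float) -> float:
    """Fraction of the bandwidth ceiling that a measured speed achieves."""
    return measured_tok_s / decode_ceiling_tok_s(weights_gb)


if __name__ == "__main__":
    assumed_weights_gb = 4.2  # ASSUMPTION: 7B model at ~4.8 bits per weight
    ceiling = decode_ceiling_tok_s(assumed_weights_gb)
    print(f"ceiling: {ceiling:.1f} tok/s")
    print(f"efficiency at 37 tok/s: {bandwidth_efficiency(37, assumed_weights_gb):.2f}")
```

With these assumed inputs the ceiling is a little over 100 tok/s, so a reported 37 tok/s for a 7B Q4_K_M model would sit at roughly a third of the theoretical limit.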
Prices and availability vary. Inspect hardware before purchasing. Some links may be affiliate links.