16 GB GDDR7 · 448 GB/s
From $429 (estimated street price)
VRAM: 16 GB
Bandwidth: 448 GB/s
TDP: 180 W
Models: 35
Tier: Capable
The NVIDIA GeForce RTX 5060 Ti 16GB, with 16 GB of GDDR7 VRAM, is rated against 35 AI models across coding, AI coding, and AI building workloads: 31 of them run, and 4 are rated not viable. Best performance: Llama 3.2 1B Instruct at 134 tok/s (excellent). For AI coding workflows, it reaches the Capable AI Coding tier and handles single-model workflows well. Current price: approximately $429.
Source: OwnRig methodology
Form factor: 2-slot, 241 mm length
Capable tier: runs 16-22B coding models comfortably, or 32B models at reduced quality. Handles single-model workflows well.
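The tier statement is essentially a VRAM budget. A minimal sketch of that arithmetic follows, assuming the quantized weights dominate memory use, so that footprint ≈ parameters × bits per weight / 8. The bits-per-weight values are rough approximations for llama.cpp GGUF quant types (only Q8_0 and Q6_K follow exactly from their block layouts), and the 1.5 GB headroom for KV cache and runtime overhead is an illustrative assumption, not an OwnRig figure. `weight_footprint_gb` and `fits` are hypothetical helper names.

```python
# Approximate bits per weight (bpw) for GGUF quant types used in the table.
# Q8_0 (34 bytes per 32 weights) and Q6_K (210 bytes per 256 weights) are exact;
# the K-quant "_M" mixes vary by model, so those values are rough assumptions.
APPROX_BPW = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
}

VRAM_GB = 16.0       # RTX 5060 Ti 16GB
RESERVED_GB = 1.5    # assumed headroom for KV cache, activations, CUDA context


def weight_footprint_gb(params_billions: float, quant: str) -> float:
    """Estimate quantized weight size in GB (1e9 bytes): params * bpw / 8."""
    total_bits = params_billions * 1e9 * APPROX_BPW[quant]
    return total_bits / 8 / 1e9


def fits(params_billions: float, quant: str,
         vram_gb: float = VRAM_GB, reserved_gb: float = RESERVED_GB) -> bool:
    """True if the weights plus the reserved headroom fit entirely in VRAM."""
    return weight_footprint_gb(params_billions, quant) + reserved_gb <= vram_gb


if __name__ == "__main__":
    for params, quant in [(22, "Q4_K_M"), (32, "Q4_K_M"), (32, "Q3_K_M")]:
        size = weight_footprint_gb(params, quant)
        print(f"{params}B @ {quant}: ~{size:.1f} GB of weights, fits={fits(params, quant)}")
```

Under these assumptions, a 22B model at Q4_K_M needs about 13.3 GB for weights and fits with room for context, while a 32B model needs about 19.4 GB at Q4_K_M and about 15.6 GB at Q3_K_M, which leaves almost nothing for the KV cache. That is one plausible reading of why 32B models only run at reduced quality on a 16 GB card.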
The software stacks that matter most for real-world inference on this device:
CUDA (production): primary high-performance backend for NVIDIA inference workloads.
Vulkan (stable): fallback backend for llama.cpp and related local runtimes.
| Model | Quant | Speed | Rating |
|---|---|---|---|
| Arcee Trinity Mini 26B | Q3_K_M | 30 tok/s | Good |
| Arcee Trinity Nano 6B | Q8_0 | 57 tok/s | Excellent |
| Codestral 22B | Q3_K_M | 20 tok/s | Acceptable |
| DeepSeek Coder V2 Lite 16B | Q5_K_M | 56 tok/s | Excellent |
| DeepSeek V3 | Q2_K | n/a | Not viable |
| FLUX.1 Dev | Q4_K_M | n/a | Acceptable |
| Gemma 2 27B Instruct | Q4_K_M | 13 tok/s | Acceptable |
| Gemma 3 12B | Q5_K_M | 47 tok/s | Good |
| Gemma 3 27B | Q3_K_M | 7 tok/s | Marginal |
| Gemma 4 26B-A4B | Q3_K_M | 110 tok/s | Excellent |
| Gemma 4 31B | Q3_K_M | 7 tok/s | Marginal |
| Gemma 4 E2B | Q8_0 | 48 tok/s | Good |
| Gemma 4 E4B | Q8_0 | 29 tok/s | Acceptable |
| GigaChat Lightning 10B | Q8_0 | 62 tok/s | Acceptable |
| Llama 3.1 8B Instruct | Q8_0 | 62 tok/s | Excellent |
| Llama 3.2 11B Vision | Q6_K | 43 tok/s | Good |
| Llama 3.2 1B Instruct | Q8_0 | 134 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 84 tok/s | Excellent |
| LLaVA 1.6 13B | Q4_K_M | 25 tok/s | Good |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | n/a | Not viable |
| Phi-3 Medium 14B Instruct | Q5_K_M | 31 tok/s | Good |
| Phi-4 14B | Q4_K_M | 31 tok/s | Good |
| Phi-4 Mini | Q8_0 | 76 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q4_K_M | 34 tok/s | Good |
| Qwen 2.5 Coder 32B Instruct | Q3_K_M | 11 tok/s | Acceptable |
| Qwen 2.5 Coder 7B Instruct | Q5_K_M | 58 tok/s | Excellent |
| Qwen3-14B Instruct | Q8_0 | 18 tok/s | Acceptable |
| Qwen3.5-122B-A10B | Q3_K_M | n/a | Not viable |
| Qwen3.5-27B | Q3_K_M | 28 tok/s | Acceptable |
| Qwen3.5-397B (MoE) | Q2_K | n/a | Not viable |
| Qwen3.6-27B | Q3_K_M | 28 tok/s | Acceptable |
| Stable Diffusion 3 Medium | FP16 | n/a | Good |
| Stable Diffusion 3.5 Large | FP16 | n/a | Good |
| StarCoder 2 15B | Q5_K_M | 28 tok/s | Good |
| Whisper Large V3 Turbo | FP16 | n/a | Excellent |
Prices and availability vary. Inspect hardware before purchasing. Some links may be affiliate links.