
12 GB · 360 GB/s
$269
Updated 2026-03-01
The NVIDIA GeForce RTX 3060 12GB, with 12 GB of GDDR6 VRAM, is rated against 22 AI models across the chat, coding, and AI coding categories; 20 of them are viable on this card. Best performance: Llama 3.2 1B Instruct at 140 tok/s (excellent). For AI coding workflows, it supports the Starter AI Coding tier, good for 7-8B models. Current price: approximately $269.
— OwnRig methodology, data updated 2026-03-01
Runs 7-8B models comfortably. Good for basic local code completion and small model experiments.
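Any of the 7-8B entries in the table below fits entirely in VRAM, so a standard GGUF runtime works out of the box. Here is a minimal sketch using llama-cpp-python (a CUDA-enabled build is assumed); the GGUF filename is a placeholder for whichever Q5_K_M build you have downloaded:

```python
# Minimal local-inference sketch with llama-cpp-python (CUDA build assumed).
# The model_path below is a placeholder, not a file this page ships.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload all layers; ~5.8GB of weights fits easily in 12GB
    n_ctx=8192,       # the KV cache also lives in VRAM, so keep context modest
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```

Keeping every layer on the GPU (n_gpu_layers=-1) is what produces table-level speeds; once layers spill to system RAM, throughput drops sharply.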
| Model | Quant | Speed | Rating | Notes |
|---|---|---|---|---|
| Llama 3.1 8B Instruct | Q5_K_M | 35 tok/s | Good | Comfortable fit with 6.2GB headroom. Good speed despite the card's modest 360 GB/s bandwidth. |
| DeepSeek Coder V2 Lite 16B | Q4_K_M | 40 tok/s | Good | MoE architecture — only 2.4B active params per token. Speed closer to a 3B model despite 15.7B total. Fits at Q4 (9.1GB) with headroom. |
| nomic-embed-text v1.5 | FP16 | — | Excellent | At 0.5GB VRAM, runs alongside any model with negligible impact. |
| Stable Diffusion XL 1.0 | FP16 | — | Good | ~15-20 seconds per 1024x1024 image at 30 steps. SDXL requires ~6.5GB VRAM. Fits on 12GB with room for LoRA. |
| Whisper Large V3 | FP16 | — | Excellent | 3.1GB VRAM at FP16. Transcribes ~10x faster than real-time. Runs on any GPU in our dataset. |
| Mistral 7B Instruct v0.3 | Q5_K_M | 33 tok/s | Good | Similar to Llama 8B performance. 6.7GB headroom. |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 60 tok/s | Excellent | 3.8B model runs extremely fast even on budget GPUs. 7.5GB headroom at Q8. |
| all-MiniLM-L6-v2 | FP16 | — | Excellent | At 0.25GB, runs alongside any model with zero meaningful impact. |
| Gemma 2 9B Instruct | Q5_K_M | 30 tok/s | Good | Q5 at 6.6GB fits well on 12GB. Slightly slower than Llama 8B due to architecture differences. |
| Qwen 2.5 7B Instruct | Q5_K_M | 33 tok/s | Good | Similar performance to Llama 8B and Mistral 7B at this size class. |
| DeepSeek R1 Distill Qwen 7B | Q4_K_M | 38 tok/s | Good | 7B distill model fits comfortably on 12GB. Good speed for reasoning tasks. |
| Gemma 3 12B | Q4_K_M | 32 tok/s | Good | 12B at Q4 fits on 12GB. Good balance for budget builds. |
| Gemma 3 4B | Q5_K_M | 55 tok/s | Excellent | 4B model runs very fast on 12GB. Minimal VRAM footprint. |
| Qwen 2.5 Coder 7B Instruct | Q5_K_M | 36 tok/s | Good | 7B coding model fits well on 12GB. Good for code completion. |
| InternLM 2.5 7B Chat | Q5_K_M | 35 tok/s | Good | 7B model fits comfortably on 12GB. Good for general tasks. |
| Llama 3.2 3B Instruct | Q8_0 | 90 tok/s | Excellent | 3.21B model at Q8_0 (3.7GB) fits with 8GB headroom. Extremely fast on 360 GB/s bandwidth. |
| Llama 3.2 1B Instruct | Q8_0 | 140 tok/s | Excellent | 1.24B model at Q8_0 (1.5GB). Essentially instant — even faster than 3B. |
| Phi-4 Mini | Q8_0 | 80 tok/s | Excellent | 3.82B at Q8_0 (4.3GB). Slightly slower than 3B Llama due to more params. 7GB headroom. |
| Whisper Large V3 Turbo | FP16 | — | Excellent | Processes audio far faster than real-time. Turbo variant is ~8x faster than the full Whisper Large V3. 1.6GB VRAM at FP16. |
| Stable Diffusion 3.5 Large | Q8_0 | — | Good | Q8 variant (9GB) fits in 12GB. FP16 at 12.5GB doesn't fit. Image generation takes ~15s per image. |
| Gemma 3 27B | Q3_K_M | — | Not Viable | 12GB VRAM insufficient. Gemma 3 27B requires at least 13.3GB at Q3_K_M. Would need 16GB+ device or CPU offloading (impractical for speed). |
| DeepSeek V3 | Q2_K | — | Not Viable | 671B MoE model requires 115GB+ even at Q2_K. 12GB is nowhere near sufficient. Would need 128GB+ unified memory or multi-GPU. |
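The headroom and speed figures above follow a simple pattern: a model fits when its quantized weights plus a working buffer stay under 12 GB, and decode speed is roughly memory bandwidth divided by the bytes read per token. The sketch below reproduces that arithmetic; the estimate helper, its bytes-per-parameter constants, and the 50% bandwidth-efficiency factor are ballpark assumptions for illustration, not OwnRig's exact methodology.

```python
# Back-of-the-envelope fit/speed estimator for dense models on this card.
# All constants here are rough assumptions, not measured values.
BYTES_PER_PARAM = {"Q4_K_M": 0.57, "Q5_K_M": 0.70, "Q8_0": 1.06, "FP16": 2.0}

def estimate(params_b: float, quant: str, vram_gb: float = 12.0,
             bandwidth_gbps: float = 360.0) -> tuple[bool, float, float]:
    """Return (fits, headroom_gb, est_tok_s) for a dense model."""
    weights_gb = params_b * BYTES_PER_PARAM[quant]
    overhead_gb = 1.0  # KV cache + CUDA context, assumed flat for simplicity
    headroom_gb = vram_gb - weights_gb - overhead_gb
    # Decoding streams the full weights once per token; real runtimes
    # typically reach about half of peak bandwidth.
    est_tok_s = 0.5 * bandwidth_gbps / weights_gb
    return headroom_gb > 0, headroom_gb, est_tok_s

fits, headroom, tok_s = estimate(8.0, "Q5_K_M")
print(f"fits={fits} headroom={headroom:.1f}GB est={tok_s:.0f} tok/s")
# -> fits=True headroom=5.4GB est=32 tok/s, in line with the measured 35 tok/s
```

The MoE row breaks the speed half of this estimate: DeepSeek Coder V2 Lite activates only ~2.4B of its 15.7B parameters per token, so it decodes far faster than its total size suggests, while the fit check still has to account for the full 9.1GB of weights.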
Prices and availability vary. Inspect hardware before purchasing.
Generation: Ampere. Last updated: 2026-03-01.