
24 GB · 936 GB/s
$899
Updated 2026-03-01
The NVIDIA GeForce RTX 3090, with 24 GB of GDDR6X VRAM, can handle 9 AI models across chat, coding, and AI coding workloads. Fastest result: Llama 3.2 1B Instruct at 220 tok/s (rated Excellent). For AI coding workflows it supports the Power AI Coding tier, running 32B coding models at good quality. Current price: approximately $899.
— OwnRig methodology, data updated 2026-03-01
Runs 32B coding models at good quality, and can host a coding model alongside an embedding model concurrently.
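A back-of-envelope check of that concurrency claim. The sizes below are assumptions, not measurements: Q4_K_M is taken to average ~4.7 bits per weight, the embedding model is assumed to be a small ~0.3B FP16 model, and a flat 2 GB reserve stands in for KV cache and runtime overhead.

```python
# Rough VRAM budget for a 32B coder plus an embedding model on a 24 GB card.
# All sizes are approximations (weights only; KV cache and CUDA runtime
# overhead are folded into a fixed reserve).
VRAM_GB = 24.0

def quantized_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-VRAM weight size: billions of params * bits / 8."""
    return params_b * bits_per_weight / 8

coder = quantized_size_gb(32, 4.7)   # Q4_K_M averages ~4.7 bits/weight (assumed)
embed = quantized_size_gb(0.3, 16)   # hypothetical ~0.3B FP16 embedding model
reserve = 2.0                        # KV cache + runtime overhead (assumed)

used = coder + embed + reserve
print(f"coder={coder:.1f} GB, embed={embed:.1f} GB, total={used:.1f} GB")
print("fits" if used <= VRAM_GB else "does not fit")
```

Under these assumptions the pair lands around 21 GB, which matches the ~5.6 GB headroom figure quoted for the 32B coder alone in the table below only loosely; the point is that a few gigabytes remain for a second small model.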
| Model | Quant | Speed | Rating | Notes |
|---|---|---|---|---|
| Llama 3.1 8B Instruct | Q8_0 | 70 tok/s | Excellent | Strong performance from 936 GB/s bandwidth. 15GB VRAM headroom. |
| Qwen 2.5 Coder 32B Instruct | Q4_K_M | 18 tok/s | Good | Fits on the 3090 at Q4 with ~5.6GB headroom. Slightly slower than 4090 due to lower compute, but the 936 GB/s bandwidth is strong. |
| Llama 3.2 3B Instruct | Q8_0 | 150 tok/s | Excellent | 936 GB/s bandwidth. 3B model runs extremely fast with 20GB headroom. |
| Llama 3.2 1B Instruct | Q8_0 | 220 tok/s | Excellent | 936 GB/s delivers excellent 1B speed. 22GB headroom. |
| Phi-4 Mini | Q8_0 | 140 tok/s | Excellent | 936 GB/s. Strong Phi-4 mini performance with 19GB headroom. |
| Whisper Large V3 Turbo | FP16 | — | Excellent | 936 GB/s. Fast transcription with 22GB headroom. |
| Stable Diffusion 3.5 Large | FP16 | — | Excellent | 936 GB/s. ~4s per image. FP16 fits in 24GB with 11GB headroom. |
| Gemma 3 27B | Q4_K_M | 18 tok/s | Good | Q4_K_M fits with 7.7GB headroom. 936 GB/s bandwidth delivers solid performance. |
| DeepSeek V3 | Q2_K | — | Not Viable | 671B MoE model requires 115GB+ at Q2_K. 24GB insufficient. Would need 128GB+ unified memory. |
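The speeds in the table can be sanity-checked against a memory-bandwidth ceiling: single-stream decode reads every weight once per token, so tok/s cannot exceed bandwidth divided by weight size. This is a rough sketch, not the methodology behind the numbers above; the bits-per-weight figures (8.5 for Q8_0 with overhead, 4.7 for Q4_K_M) are assumptions.

```python
# Bandwidth-bound upper limit on decode speed: each generated token
# streams all weights from VRAM once, so tok/s <= bandwidth / weight_bytes.
# Real throughput lands below this ceiling (kernel efficiency, KV-cache
# reads, scheduling), so treat it as a sanity check only.
BANDWIDTH_GBPS = 936.0  # RTX 3090 memory bandwidth

def decode_ceiling_toks(params_b: float, bits_per_weight: float) -> float:
    weight_gb = params_b * bits_per_weight / 8
    return BANDWIDTH_GBPS / weight_gb

for name, params, bits, measured in [
    ("Llama 3.1 8B @ Q8_0", 8, 8.5, 70),        # assumed ~8.5 bits/weight
    ("Qwen 2.5 Coder 32B @ Q4_K_M", 32, 4.7, 18),  # assumed ~4.7 bits/weight
]:
    ceiling = decode_ceiling_toks(params, bits)
    print(f"{name}: ceiling ~{ceiling:.0f} tok/s, measured {measured} tok/s")
```

Both measured figures sit comfortably under their theoretical ceilings (~110 and ~50 tok/s respectively), which is the expected pattern for a bandwidth-bound card.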
Prices and availability vary. Inspect hardware before purchasing.
Generation: Ampere.