
32 GB · 1792 GB/s
$2,199
Updated 2026-03-01
The NVIDIA GeForce RTX 5090 with 32 GB of GDDR7 VRAM can handle 15 AI models across chat, coding, and AI-assisted coding workloads. Best performance: Llama 3.2 1B Instruct at 300 tok/s (excellent). For AI coding workflows, it supports the Power AI Coding tier, running 32B coding models at good quality. Current price: approximately $2,199.
— OwnRig methodology, data updated 2026-03-01
Runs 32B coding models at good quality and can host a coding model alongside an embedding model concurrently.
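The concurrency claim comes down to simple VRAM arithmetic, which can be sketched as below. The sizing rules are approximations (assumed, not measured): a quantized GGUF model occupies roughly `params × bits-per-weight / 8` bytes, and the fixed overhead allowance for KV cache and CUDA context is a placeholder estimate.

```python
# Rough VRAM budget check: does a 32B coder plus an embedding model fit in 32 GB?
# Sizes here are approximations, not measured loads.

def gguf_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate loaded size of a quantized model in GB (params in billions)."""
    return params_b * bits_per_weight / 8

def fits(vram_gb: float, *model_sizes_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if the models plus a flat KV-cache/CUDA overhead allowance fit in VRAM."""
    return sum(model_sizes_gb) + overhead_gb <= vram_gb

coder = gguf_size_gb(32, 5.5)   # 32B at Q5_K_M (~5.5 bits/weight) -> ~22 GB
embed = gguf_size_gb(0.6, 16)   # a hypothetical ~0.6B FP16 embedding model -> ~1.2 GB
print(fits(32, coder, embed))   # ~22 + ~1.2 + 2 GB overhead < 32 GB -> True
```

The same check explains why image models in the table leave "headroom": subtract the FP16 weight size from 32 GB.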
| Model | Quant | Speed | Rating | Notes |
|---|---|---|---|---|
| Llama 3.1 8B Instruct | Q8_0 | 170 tok/s | Excellent | RTX 5090 bandwidth (1792 GB/s) ~1.8x 4090. Near-instant 8B inference. |
| Llama 3.1 70B Instruct | Q4_K_M | 14 tok/s | Good | Q4_K_M weights are ~42 GB, more than 32 GB VRAM; runs with partial CPU offload, which 5090 bandwidth keeps usable. |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 45 tok/s | Excellent | Q5 at ~22GB. 5090 bandwidth delivers excellent coding speed. |
| DeepSeek R1 Distill Qwen 32B | Q5_K_M | 42 tok/s | Excellent | Q5 32B reasoning fits on 32GB. Best local reasoning performance. |
| Llama 3.3 70B Instruct | Q4_K_M | 12 tok/s | Good | 70B Q4_K_M (~42 GB) exceeds 32 GB VRAM; usable with partial CPU offload. |
| Mistral Small 24B Instruct | Q5_K_M | 55 tok/s | Excellent | 24B Q5 benefits from 5090 bandwidth. Excellent general-purpose speed. |
| Stable Diffusion XL 1.0 | FP16 | — | Excellent | ~2-3 seconds per 1024x1024 image. Massive headroom for LoRA stacking. |
| Stable Diffusion 3 Medium | FP16 | — | Excellent | Full FP16 SD3 Medium. ~5-7 seconds per image. Best quality. |
| Llama 3.2 3B Instruct | Q8_0 | 200 tok/s | Excellent | 1792 GB/s bandwidth. Maximum possible 3B speed. |
| Llama 3.2 1B Instruct | Q8_0 | 300 tok/s | Excellent | 1792 GB/s. Maximum possible 1B inference speed. |
| Phi-4 Mini | Q8_0 | 185 tok/s | Excellent | 1792 GB/s. Maximum Phi-4 mini speed. |
| Whisper Large V3 Turbo | FP16 | — | Excellent | 1792 GB/s. Minimum transcription latency. |
| Stable Diffusion 3.5 Large | FP16 | — | Excellent | 1792 GB/s. 32GB fits FP16 with 19GB headroom. ~2.5s per image. Fastest SD 3.5 Large. |
| Gemma 3 27B | Q5_K_M | 35 tok/s | Excellent | 32GB allows Q5_K_M with 12.7GB headroom. 1792 GB/s bandwidth delivers excellent throughput. |
| DeepSeek V3 | Q2_K | — | Not Viable | 671B MoE model requires 115GB+ at Q2_K. 32GB insufficient. Would need 128GB+ unified memory. |
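The speed column in the table tracks memory bandwidth: during decode, each generated token streams roughly the whole quantized model through the GPU, so bandwidth divided by model size gives an upper bound on tok/s. A minimal sketch of that ceiling, assuming the approximate model sizes noted in the table (real throughput lands below the ceiling, around 50-85% of it in the figures above):

```python
# Bandwidth ceiling for decode speed: tok/s <= memory bandwidth / model size.
# Model sizes below are the approximate GGUF sizes cited in the table notes.

BANDWIDTH_GBPS = 1792  # RTX 5090 memory bandwidth in GB/s

def decode_ceiling_toks(model_size_gb: float,
                        bandwidth_gbps: float = BANDWIDTH_GBPS) -> float:
    """Upper bound on tokens/s for a model fully resident in VRAM."""
    return bandwidth_gbps / model_size_gb

print(round(decode_ceiling_toks(8.5)))   # Llama 3.1 8B Q8_0 (~8.5 GB): ceiling ~211 tok/s vs 170 measured
print(round(decode_ceiling_toks(22.0)))  # 32B coder at Q5_K_M (~22 GB): ceiling ~81 tok/s vs 45 measured
```

This is also why the 70B entries are slower than the ratio alone suggests: once weights spill past 32 GB, tokens are gated by PCIe and system RAM bandwidth rather than the 1792 GB/s of GDDR7.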
Prices and availability vary. Inspect hardware before purchasing.
Generation: Blackwell. Last updated: 2026-03-01.