LLaVA · Llama 2 Community License
Multimodal model — processes images and text together. Built on Vicuna 13B with a vision encoder. Can analyze screenshots, diagrams, and photos. Useful for builders who need to process visual content in their workflows.
LLaVA 1.6 13B requires 9.1 GB of VRAM at recommended quality (Q5_K_M). At efficient quality (Q4_K_M), it needs 7.7 GB of VRAM, which just fits on the NVIDIA GeForce RTX 4060 8GB, though with little headroom left for the KV cache. On an NVIDIA GeForce RTX 4090, expect approximately 30 tok/s at Q5_K_M. For the best experience, the Starter AI Desktop build ($582) is recommended.
— OwnRig methodology, data updated 2026-03-01
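A typical workflow calls the model through a local inference server and sends the image alongside the prompt. The sketch below assumes an Ollama server on its default port with the `llava:13b` tag already pulled; the image path and prompt are illustrative, not part of the OwnRig data.

```python
# Minimal sketch: ask a locally served LLaVA to describe a screenshot.
# Assumes an Ollama server at localhost:11434 with the llava:13b tag pulled;
# the image path and prompt are illustrative.
import base64
import json
import urllib.request


def describe_image(path: str, prompt: str = "Describe this screenshot.") -> str:
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    payload = json.dumps({
        "model": "llava:13b",
        "prompt": prompt,
        "images": [image_b64],  # Ollama accepts base64-encoded images for multimodal models
        "stream": False,
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(describe_image("screenshot.png"))
```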
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| recommended | Q5_K_M | 9.1 GB | 7.8 GB |
| efficient | Q4_K_M | 7.7 GB | 6.5 GB |
| compressed | Q3_K_M | 6.2 GB | 5.1 GB |
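A practical way to read the table: take the highest-quality quantization whose VRAM figure, plus some headroom for the KV cache and display output, still fits on your GPU. A minimal sketch of that check using the figures above; the 1 GB headroom is an assumed margin, not an OwnRig number.

```python
# Sketch: pick the highest-quality quantization from the table above that fits a GPU.
# The table's VRAM figures are used as-is; the 1.0 GB headroom for KV cache and
# display overhead is an assumed margin, not an OwnRig number.
QUANTS = [  # (label, quantization, VRAM in GB), best quality first
    ("recommended", "Q5_K_M", 9.1),
    ("efficient", "Q4_K_M", 7.7),
    ("compressed", "Q3_K_M", 6.2),
]


def pick_quant(gpu_vram_gb: float, headroom_gb: float = 1.0) -> str | None:
    for label, quant, vram in QUANTS:
        if vram + headroom_gb <= gpu_vram_gb:
            return f"{quant} ({label}, {vram} GB)"
    return None


print(pick_quant(8.0))   # 8 GB card  -> Q3_K_M (compressed, 6.2 GB)
print(pick_quant(24.0))  # 24 GB card -> Q5_K_M (recommended, 9.1 GB)
```

With that margin, an 8 GB card lands on Q3_K_M, which matches the compressed row here and the RTX 4060 8GB entry in the benchmark table below.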
KV cache VRAM at Q5_K_M quality. Longer context windows require more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 205 MB | 9.3 GB |
| 4K | 410 MB | 9.5 GB |
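The cache grows roughly linearly with context length, about 205 MB per 2K tokens at this quantization, so VRAM at longer contexts can be extrapolated from the two rows above. A small sketch under that linearity assumption; the 8K figure is an extrapolation, not measured data.

```python
# Sketch: extrapolate total VRAM at Q5_K_M for longer contexts, assuming the
# KV cache grows linearly with context length as the table above suggests.
WEIGHTS_VRAM_GB = 9.1            # Q5_K_M weights, from the quantization table
KV_MB_PER_1K_TOKENS = 205 / 2.0  # 205 MB at 2K context -> ~102.5 MB per 1K tokens


def total_vram_gb(context_tokens: int) -> float:
    kv_gb = (context_tokens / 1024) * KV_MB_PER_1K_TOKENS / 1024
    return WEIGHTS_VRAM_GB + kv_gb


for ctx in (2048, 4096, 8192):
    print(f"{ctx:>5} tokens: ~{total_vram_gb(ctx):.1f} GB")
# 2048 (~9.3 GB) and 4096 (~9.5 GB) match the table; 8192 (~9.9 GB) is extrapolated.
```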
Performance data for LLaVA 1.6 13B across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4060 Ti 16GB | Q4_K_M | 22 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4090 | Q5_K_M | 30 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q3_K_M | 22 tok/s | Marginal | ✓ |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q4_K_M | 28 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 3080 10GB | Q3_K_M | 18 tok/s | Acceptable | ✓ |
| Apple M3 Pro (18GB Unified) | Q4_K_M | 8 tok/s | Acceptable | ✓ |
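Throughput maps directly onto response latency. The sketch below converts the speeds above into rough times for a single reply; the 300-token response length is an illustrative assumption, and prompt processing and image-encoding time are not included.

```python
# Sketch: convert the table's generation speeds into rough per-response times.
# The 300-token response length is an illustrative assumption, not OwnRig data,
# and prompt processing / image encoding time is ignored.
SPEEDS_TOK_S = {  # from the performance table above
    "RTX 4090 (Q5_K_M)": 30,
    "RTX 4070 Ti 12GB (Q4_K_M)": 28,
    "RTX 4060 Ti 16GB (Q4_K_M)": 22,
    "RTX 3080 10GB (Q3_K_M)": 18,
    "Apple M3 Pro 18GB (Q4_K_M)": 8,
}

RESPONSE_TOKENS = 300  # assumed length of a typical image description

for device, tok_s in SPEEDS_TOK_S.items():
    print(f"{device}: ~{RESPONSE_TOKENS / tok_s:.0f} s for {RESPONSE_TOKENS} tokens")
```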
Data confidence: estimated. Last updated: 2026-03-01.