Llama 3.2 11B Vision
Llama · Llama 3.2 Community License
Meta's first multimodal Llama model, accepting both text and image inputs. At 11B parameters, it fits comfortably on 16 GB GPUs at Q4. Its vision capabilities are useful for image-understanding tasks, but text quality is comparable to Llama 3.1 8B, not 70B.
- Parameters: 11B
- Architecture: Dense
- Context: 131,072 tokens
- Released: 2024-09-25
- Engines: llama.cpp, ollama, vLLM
- Builder Tools: LM Studio, Ollama, Open WebUI
Parameters: 11B · VRAM: 10 GB · Context: 128K · Formats: 5 · GPUs: 34
Llama 3.2 11B Vision requires 10 GB VRAM at recommended quality (Q6_K). At efficient quality (Q4_K_M), it fits in 7.2 GB VRAM, which makes it compatible with the NVIDIA RTX 4090 Laptop (150-175W). On the NVIDIA Grace Blackwell Ultra GB300, expect approximately 260 tok/s at Q8_0. For the best experience, the Starter AI Desktop ($582) is recommended.
Source: OwnRig methodology
VRAM Requirements
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | Q8_0 | 13 GB | 11.5 GB |
| recommended | Q6_K | 10 GB | 8.8 GB |
| recommended | Q5_K_M | 8.5 GB | 7.6 GB |
| efficient | Q4_K_M | 7.2 GB | 6.4 GB |
| compressed | Q3_K_M | 5.8 GB | 5.2 GB |
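The table can be read as a simple lookup: given a VRAM budget, take the highest-quality row that fits. The sketch below is only an illustration of that reading. The function name `best_quant` and the `headroom_gb` parameter are hypothetical, and the figures are the estimates from the table, not measurements.

```python
# Rows copied from the VRAM table above: (quantization, VRAM in GB),
# ordered from highest quality to lowest.
QUANT_VRAM_GB = [
    ("Q8_0", 13.0),
    ("Q6_K", 10.0),
    ("Q5_K_M", 8.5),
    ("Q4_K_M", 7.2),
    ("Q3_K_M", 5.8),
]


def best_quant(vram_gb, headroom_gb=0.0):
    """Return the highest-quality quantization that fits, or None.

    headroom_gb is a hypothetical allowance (an assumption of this sketch)
    for KV cache, the display server, or other processes sharing the GPU.
    """
    budget = vram_gb - headroom_gb
    for name, needed_gb in QUANT_VRAM_GB:
        if needed_gb <= budget:
            return name
    return None


if __name__ == "__main__":
    for vram in (6, 8, 12, 16):
        print(f"{vram} GB -> {best_quant(vram)}")
```

Reserving some headroom is usually sensible, because the table figures do not include long-context KV cache growth.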
Context Length Impact
KV cache VRAM at Q6_K quality. Longer context requires more memory: total VRAM is the 10 GB model footprint plus the KV cache.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 205 MB | 10.2 GB |
| 4K | 410 MB | 10.4 GB |
| 8K | 819 MB | 10.8 GB |
| 16K | 1.5 GB | 11.5 GB |
| 32K | 3.1 GB | 13.1 GB |
| 64K | 6.1 GB | 16.1 GB |
| 128K | 12.3 GB | 22.3 GB |
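To show how these rows translate into a practical limit, the sketch below looks up the largest listed context whose total VRAM fits a given card at Q6_K. It assumes "2K" means 2,048 tokens (consistent with 128K being 131,072 tokens), and the function name `max_context` is hypothetical. The values are the page's estimates, so treat the result as a starting point, not a guarantee.

```python
# Rows copied from the context table above:
# (context length in tokens, total VRAM in GB at Q6_K).
CONTEXT_TOTAL_VRAM_GB = [
    (2 * 1024, 10.2),
    (4 * 1024, 10.4),
    (8 * 1024, 10.8),
    (16 * 1024, 11.5),
    (32 * 1024, 13.1),
    (64 * 1024, 16.1),
    (128 * 1024, 22.3),
]


def max_context(vram_gb):
    """Return the largest listed context (in tokens) that fits, or None."""
    best = None
    for tokens, total_gb in CONTEXT_TOTAL_VRAM_GB:
        if total_gb <= vram_gb:
            best = tokens
    return best


if __name__ == "__main__":
    for vram in (12, 16, 24):
        print(f"{vram} GB -> {max_context(vram)} tokens")
```

By these figures, a 16 GB card tops out at the 32K row, because the 64K row needs 16.1 GB. The 2K row also implies roughly 0.1 MB of KV cache per token (205 MB / 2,048 tokens).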
Compatible GPUs
34 devices
Builder Context
Llama 3.2 11B Vision is commonly used with LM Studio, Ollama, and Open WebUI.
Frequently Asked Questions
- How much VRAM does Llama 3.2 11B Vision need?
- Llama 3.2 11B Vision requires 10 GB VRAM at recommended quality (Q6_K). At lower quality settings, it can fit in as little as 5.8 GB.
- What is the best GPU for Llama 3.2 11B Vision?
- The NVIDIA Grace Blackwell Ultra GB300 delivers the best performance for Llama 3.2 11B Vision, achieving 260 tok/s at Q8_0 with an excellent rating.
- Can I run Llama 3.2 11B Vision on an RTX 4060 Ti?
- Yes. On the NVIDIA GeForce RTX 4060 Ti 16GB, Llama 3.2 11B Vision runs at 38 tok/s (Q6_K, good).
- What quantization should I use for Llama 3.2 11B Vision?
- For recommended quality, use Q6_K (10 GB VRAM). If your GPU has limited VRAM, Q4_K_M (7.2 GB) is the efficient option, and Q3_K_M (5.8 GB) is the smallest listed format, at a further cost in quality.
Data confidence: estimated.
VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.
Llama is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.