GigaChat Lightning 10B
GigaChat · Apache 2.0
Mixture of Experts: 10B total parameters, 1.8B active per token.
Compact MoE model with 10B total parameters and around 1.8B active per token. Interesting efficiency play for chat workloads that want MoE-style speed without the memory cost of larger expert models.
- Parameters
- 10B
- Architecture
- MoE (1.8B active)
- Context
- 32,768 tokens
- Released
- 2026-03-20
- Engines
- llama.cpp, ollama
- Builder Tools
- Continue, Open WebUI
Parameters
10B
VRAM
6 GB
Context
32K
Formats
2
GPUs
43
GigaChat Lightning 10B (10B) requires 6 GB VRAM at recommended quality (Q4_K_M). On NVIDIA RTX PRO 6000 Blackwell, expect approximately 325 tok/s at Q8_0. For the best experience, Starter AI Desktop ($582) is recommended.
Source: OwnRig methodology
6 GB
Q4_K_M
5.5 GB
32K tokens
Chat
VRAM Requirements
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | Q8_0 | 11 GB | 10 GB |
| recommended | Q4_K_M | 6 GB | 5.5 GB |
Context Length Impact
KV cache VRAM at Q4_K_M quality. Longer context = more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 102 MB | 6.1 GB |
| 4K | 205 MB | 6.2 GB |
| 8K | 410 MB | 6.4 GB |
| 16K | 819 MB | 6.8 GB |
| 32K | 1.6 GB | 7.6 GB |
Compatible GPUs
43 devicesShowing 43 of 43 entries
Builder Context
GigaChat Lightning 10B is commonly used with Continue, Open WebUI.
Frequently Asked Questions
- How much VRAM does GigaChat Lightning 10B need?
- GigaChat Lightning 10B requires 6 GB VRAM at recommended quality (Q4_K_M). At lower quality settings, it can fit in as little as 6 GB.
- What is the best GPU for GigaChat Lightning 10B?
- The NVIDIA RTX PRO 6000 Blackwell delivers the best performance for GigaChat Lightning 10B, achieving 325 tok/s at Q8_0 with an excellent rating.
- Can I run GigaChat Lightning 10B on an RTX 4060 Ti?
- Yes. On the NVIDIA GeForce RTX 4060 Ti 16GB, GigaChat Lightning 10B runs at 55 tok/s (Q8_0, acceptable).
- What quantization should I use for GigaChat Lightning 10B?
- For the best quality, use Q4_K_M (6 GB VRAM). If your GPU has limited VRAM, Q4_K_M (6 GB) is the most efficient option with acceptable quality.
Data confidence: estimated. Source
VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.GigaChat is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.