Qwen3.5-397B (MoE)
Qwen · Apache 2.0
Mixture of Experts: 397B total parameters, 30B active per token.
A frontier-scale MoE model. Its memory footprint is enormous even when quantized, so local runs are limited to GB300-class hardware or aggressive offloading setups.
- Parameters: 397B
- Architecture: MoE (30B active)
- Context: 262,144 tokens
- Released: 2026-02-24
- Engines: llama.cpp, vLLM
- Builder Tools: Continue, LM Studio, Open WebUI
Parameters: 397B · VRAM: 230 GB · Context: 256K · Formats: 3 · GPUs: 43
Qwen3.5-397B (MoE) requires 230 GB of VRAM at the recommended quality (Q4_K_M). On the NVIDIA Grace Blackwell Ultra GB300, expect approximately 120 tok/s at Q4_K_M.
Source: OwnRig methodology
Recommended: Q4_K_M · 230 GB VRAM · 220 GB file size · 256K tokens · Chat
VRAM Requirements
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| recommended | Q4_K_M | 230 GB | 220 GB |
| efficient | Q3_K_M | 175 GB | 168 GB |
| compressed | Q2_K | 140 GB | 134 GB |
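As a rough illustration of how to read this table, the sketch below picks the highest-quality quantization that fits a given VRAM budget and reports the shortfall when nothing fits. The function names are hypothetical, the figures are copied from the table, and KV cache overhead is deliberately ignored, so treat the result as an upper bound on what fits.

```python
from typing import Optional

# VRAM figures (GB) copied from the table above, highest quality first.
QUANT_VRAM_GB = [
    ("Q4_K_M", 230),
    ("Q3_K_M", 175),
    ("Q2_K", 140),
]


def pick_quantization(available_vram_gb: float) -> Optional[str]:
    """Return the highest-quality listed quantization that fits, or None.

    Assumption: only the table's VRAM figure is considered; KV cache and
    runtime overhead are not added.
    """
    for name, required_gb in QUANT_VRAM_GB:
        if required_gb <= available_vram_gb:
            return name
    return None


def vram_shortfall_gb(available_vram_gb: float) -> float:
    """GB missing to fit even the smallest listed quantization (0 if it fits)."""
    smallest_gb = min(required_gb for _, required_gb in QUANT_VRAM_GB)
    return max(0.0, smallest_gb - available_vram_gb)


if __name__ == "__main__":
    for budget_gb in (16, 192, 288):
        choice = pick_quantization(budget_gb)
        print(f"{budget_gb} GB -> {choice}, shortfall {vram_shortfall_gb(budget_gb):.0f} GB")
```

A budget that returns `None` would need a larger device or an offloading setup to cover the reported shortfall.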
Context Length Impact
KV cache VRAM at Q4_K_M quality. The KV cache grows linearly with context length, so doubling the context doubles the cache.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 819 MB | 230.8 GB (exceeds 24 GB) |
| 4K | 1.6 GB | 231.6 GB (exceeds 24 GB) |
| 8K | 3.2 GB | 233.2 GB (exceeds 24 GB) |
| 16K | 6.4 GB | 236.4 GB (exceeds 24 GB) |
| 32K | 12.8 GB | 242.8 GB (exceeds 24 GB) |
| 64K | 25.6 GB | 255.6 GB (exceeds 24 GB) |
| 128K | 51.2 GB | 281.2 GB (exceeds 24 GB) |
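The rows above follow a simple linear pattern, which can be sketched as below. The per-token rate of about 0.4 GB per 1K tokens is an assumption inferred from the table (2K of context costs roughly 0.8 GB), not a published figure, and the function names are hypothetical.

```python
BASE_VRAM_GB = 230.0        # Q4_K_M requirement, from the VRAM table
KV_GB_PER_1K_TOKENS = 0.4   # assumption: inferred from the rows above


def kv_cache_gb(context_k: int) -> float:
    """Estimated KV cache size in GB for a context of `context_k` thousand tokens."""
    return KV_GB_PER_1K_TOKENS * context_k


def total_vram_gb(context_k: int) -> float:
    """Estimated total VRAM in GB: base requirement plus KV cache."""
    return BASE_VRAM_GB + kv_cache_gb(context_k)


if __name__ == "__main__":
    for context_k in (2, 8, 32, 128):
        print(
            f"{context_k}K: KV cache {kv_cache_gb(context_k):.1f} GB, "
            f"total {total_vram_gb(context_k):.1f} GB"
        )
```

Actual usage may differ by inference engine and batch size, so these numbers are best used for rough planning only.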
Compatible GPUs
43 devices
Builder Context
Qwen3.5-397B (MoE) is commonly used with Continue, LM Studio, and Open WebUI. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG.
Frequently Asked Questions
- How much VRAM does Qwen3.5-397B (MoE) need?
- Qwen3.5-397B (MoE) requires 230 GB VRAM at recommended quality (Q4_K_M). At the lowest listed quality setting (Q2_K), it can fit in as little as 140 GB.
- What is the best GPU for Qwen3.5-397B (MoE)?
- The NVIDIA Grace Blackwell Ultra GB300 delivers the best performance for Qwen3.5-397B (MoE), achieving 120 tok/s at Q4_K_M with an excellent rating.
- Can I run Qwen3.5-397B (MoE) on an RTX 4060 Ti?
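- No, not in VRAM alone. Even at Q2_K, the smallest listed quantization, Qwen3.5-397B (MoE) requires 140 GB VRAM, which far exceeds the RTX 4060 Ti's 16 GB. No listed quantization fits, so consider a GPU with much more VRAM or an aggressive offloading setup.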
- What quantization should I use for Qwen3.5-397B (MoE)?
- For the best quality, use Q4_K_M (230 GB VRAM). If VRAM is limited, Q3_K_M (175 GB) is the efficient middle option, and Q2_K (140 GB) is the smallest option with acceptable quality.
- Is Qwen3.5-397B (MoE) good for coding?
- Yes. Qwen3.5-397B (MoE) is used with Continue, LM Studio, and Open WebUI for local AI coding. For the best coding experience, pair it with an embedding model for local RAG.
Data confidence: estimated.
VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.
Qwen is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.