NVIDIA Nemotron-3-super-120B-A12B
Nemotron · NVIDIA Open Model License
Mixture of Experts: 120B total parameters, 12B active per token.
Because every expert must stay resident, the model needs VRAM for all 120B parameters, but with only roughly 12B parameters active per token it decodes more like a smaller model once loaded. It has a native 131K context, with support for extension to 1M tokens.
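A rough back-of-the-envelope estimate shows why decoding behaves like a smaller model: each decoded token reads only the active weights. This sketch assumes bytes are spread evenly across parameters, uses the 67 GB Q4_K_M file size listed on this page, and ignores KV-cache reads, shared layers, and routing overhead. Treat the result as an order-of-magnitude illustration, not a measurement.

```python
# Inputs from this page; the even bytes-per-parameter split is a simplifying assumption.
TOTAL_PARAMS = 120e9   # all experts
ACTIVE_PARAMS = 12e9   # parameters used per token (approximate)
Q4_FILE_GB = 67.0      # Q4_K_M file size

bytes_per_param = Q4_FILE_GB * 1e9 / TOTAL_PARAMS          # ~0.56 bytes, i.e. ~4.5 bits
gb_read_per_token = ACTIVE_PARAMS * bytes_per_param / 1e9  # weights touched per decoded token

print(f"~{bytes_per_param * 8:.2f} bits/param")
print(f"~{gb_read_per_token:.1f} GB of weights read per token vs {Q4_FILE_GB:.0f} GB resident")
```

Since decode speed is usually limited by memory bandwidth, reading roughly a tenth of the weights per token is the intuition behind the "smaller model" behavior.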
- Parameters: 120B
- Architecture: MoE (12B active)
- Context: 1,048,576 tokens
- Released: 2025-12-15
- Engines: llama.cpp, vLLM
- Builder Tools: Continue, LM Studio, Open WebUI
NVIDIA Nemotron-3-super-120B-A12B requires about 70 GB of VRAM at the recommended quality (Q4_K_M). On an NVIDIA Grace Blackwell Ultra GB300, expect approximately 180 tok/s at Q4_K_M. For the best overall experience, the recommended build is the Mac Studio AI Builder ($3,999).
Source: OwnRig methodology
VRAM Requirements
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| recommended | Q4_K_M | 70 GB | 67 GB |
| efficient | Q3_K_M | 50 GB | 48 GB |
| compressed | Q2_K | 40 GB | 38 GB |
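A minimal sketch for choosing a quantization from the table above, assuming the listed VRAM figures are the whole budget. It does not add extra KV cache for long contexts, and the helper name `pick_quant` is hypothetical.

```python
from typing import Optional

# (quantization, quality label, estimated VRAM in GB), copied from the VRAM table,
# ordered from highest to lowest quality.
QUANTS = [
    ("Q4_K_M", "recommended", 70),
    ("Q3_K_M", "efficient", 50),
    ("Q2_K", "compressed", 40),
]

def pick_quant(vram_gb: float) -> Optional[str]:
    """Return the highest-quality listed quantization whose VRAM estimate fits, else None."""
    for name, _quality, need_gb in QUANTS:
        if need_gb <= vram_gb:
            return name
    return None  # nothing listed fits; more VRAM (or offloading) is needed

print(pick_quant(80))  # a hypothetical 80 GB budget
print(pick_quant(16))  # e.g. an RTX 4060 Ti's 16 GB
```

With 48 GB, for example, Q3_K_M's 50 GB estimate does not fit, so the function falls back to Q2_K.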
Context Length Impact
Estimated KV cache size at Q4_K_M quality. Total VRAM is the 70 GB model footprint plus the KV cache, so longer contexts need more memory.
| Context | KV Cache | Total VRAM | Fits in 24 GB? |
|---|---|---|---|
| 2K | 512 MB | 70.5 GB | No |
| 4K | 1 GB | 71 GB | No |
| 8K | 2 GB | 72 GB | No |
| 16K | 4 GB | 74 GB | No |
| 32K | 8 GB | 78 GB | No |
| 64K | 16 GB | 86 GB | No |
| 128K | 32 GB | 102 GB | No |
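The rows above follow a simple linear rule: 0.25 GB of KV cache per 1K tokens (512 MB at 2K) added to a 70 GB base. This sketch reproduces the table; the constants are read off the table rather than derived from the model's architecture, so treat values outside the listed 2K to 128K range as unverified extrapolation.

```python
# Constants read off the context table (assumed linear; not derived from the model's architecture).
BASE_VRAM_GB = 70.0          # Q4_K_M footprint before KV cache
KV_GB_PER_1K_TOKENS = 0.25   # 512 MB at 2K ... 32 GB at 128K

def kv_cache_gb(context_k: int) -> float:
    """Estimated KV cache size in GB for a context of `context_k` thousand tokens."""
    return context_k * KV_GB_PER_1K_TOKENS

def total_vram_gb(context_k: int) -> float:
    """Estimated total VRAM: model footprint plus KV cache."""
    return BASE_VRAM_GB + kv_cache_gb(context_k)

for ctx in (2, 4, 8, 16, 32, 64, 128):
    print(f"{ctx}K: KV cache {kv_cache_gb(ctx):g} GB, total {total_vram_gb(ctx):g} GB")
```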
Compatible GPUs
43 devices
Builder Context
NVIDIA Nemotron-3-super-120B-A12B is commonly used with Continue, LM Studio, and Open WebUI. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG.
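The retrieval half of local RAG can be sketched as: embed code or document chunks with an embedding model such as nomic-embed-text, embed the query, and keep the chunks with the highest cosine similarity, which are then added to the prompt sent to Nemotron. The vectors below are hypothetical 3-dimensional toys standing in for real embeddings, and `top_k` is an illustrative helper, not part of Continue, LM Studio, or Open WebUI.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _vec in ranked[:k]]

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
chunks = [
    ("def parse_config(path): ...", [0.9, 0.1, 0.0]),
    ("README: installation steps", [0.1, 0.9, 0.1]),
    ("def load_config(file): ...", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks))
```

A brute-force scan like this is fine for a small codebase; larger indexes usually move to a vector store.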
Frequently Asked Questions
- How much VRAM does NVIDIA Nemotron-3-super-120B-A12B need?
- NVIDIA Nemotron-3-super-120B-A12B requires 70 GB VRAM at recommended quality (Q4_K_M). At the lowest listed quality (Q2_K), it fits in about 40 GB.
- What is the best GPU for NVIDIA Nemotron-3-super-120B-A12B?
- The NVIDIA Grace Blackwell Ultra GB300 delivers the best estimated performance for NVIDIA Nemotron-3-super-120B-A12B, about 180 tok/s at Q4_K_M, with an excellent rating.
- Can I run NVIDIA Nemotron-3-super-120B-A12B on an RTX 4060 Ti?
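- Not on the GPU alone. Even at Q2_K, the smallest quantization listed, NVIDIA Nemotron-3-super-120B-A12B needs about 40 GB of VRAM, well beyond the RTX 4060 Ti's 16 GB. You would need a GPU with more VRAM, a multi-GPU setup, or an engine such as llama.cpp that can offload part of the model to system RAM at a significant speed cost.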
- What quantization should I use for NVIDIA Nemotron-3-super-120B-A12B?
- For the best quality, use Q4_K_M (70 GB VRAM). Q3_K_M (50 GB) is a middle ground, and Q2_K (40 GB) has the smallest footprint but gives up the most quality, so use it only when VRAM is the constraint.
- Is NVIDIA Nemotron-3-super-120B-A12B good for coding?
- Yes. NVIDIA Nemotron-3-super-120B-A12B can be used for local AI coding through Continue, LM Studio, and Open WebUI. For the best coding experience, pair it with an embedding model for local RAG.
Data confidence: estimated.
VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.
Nemotron is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.