N
ChatCodingReasoningMulti-purpose120B
Chat

NVIDIA Nemotron-3-super-120B-A12B

Nemotron · NVIDIA Open Model License

Mixture of Experts: 120B total parameters, 12B active per token.

MoE architecture with 120B total parameters and roughly 12B active per token. Requires VRAM for the full expert pool but decodes more like a smaller model once loaded. Native 131K context with 1M-token extension support.

Parameters
120B
Architecture
MoE (12B active)
Context
1,048,576 tokens
Released
2025-12-15
Engines
llama.cpp, vLLM
Builder Tools
Continue, LM Studio, Open WebUI

Parameters

120B

VRAM

70 GB

Context

1024K

Formats

3

GPUs

43

NVIDIA Nemotron-3-super-120B-A12B (120B) requires 70 GB VRAM at recommended quality (Q4_K_M). On NVIDIA Grace Blackwell Ultra GB300, expect approximately 180 tok/s at Q4_K_M. For the best experience, Mac Studio AI Builder ($3,999) is recommended.

Source: OwnRig methodology

VRAM (Recommended)

70 GB

Quantization

Q4_K_M

File Size

67 GB

Max Context

1024K tokens

Primary Use

Chat

Memory

VRAM Requirements

QualityQuantizationVRAMFile Size
recommendedQ4_K_M70 GB67 GB
efficientQ3_K_M50 GB48 GB
compressedQ2_K40 GB38 GB
Scaling

Context Length Impact

KV cache VRAM at Q4_K_M quality. Longer context = more memory.

ContextKV CacheTotal VRAM
2K512 MB70.5 GBexceeds 24 GB
4K1 GB71 GBexceeds 24 GB
8K2 GB72 GBexceeds 24 GB
16K4 GB74 GBexceeds 24 GB
32K8 GB78 GBexceeds 24 GB
64K16 GB86 GBexceeds 24 GB
128K32 GB102 GBexceeds 24 GB

Compatible GPUs

43 devices
NVIDIA Grace Blackwell Ultra GB300Q4_K_M180 tok/sExcellent
Apple M4 Max (128GB Unified)Q4_K_M39 tok/sExcellent
Apple M4 Ultra (192GB)Q4_K_M51 tok/sExcellent
NVIDIA RTX PRO 6000 BlackwellQ4_K_M158 tok/sExcellent
NVIDIA RTX PRO 6000 Blackwell Max-QQ4_K_M145 tok/sExcellent
Apple M4 Max (64GB Unified)Q3_K_M41 tok/sGood
Apple M4 Pro (48GB)Q2_K50 tok/sGood
AMD Radeon Pro W7900Q2_K54 tok/sGood
Apple M4 Max (36GB Unified)Q2_K9 tok/sMarginal
Apple M4 Pro (24GB Unified)Q2_K8 tok/sMarginal
NVIDIA GeForce RTX 3090Q2_K11 tok/sMarginal
NVIDIA GeForce RTX 4090Q2_K18 tok/sMarginal
NVIDIA GeForce RTX 5090Q2_K23 tok/sMarginal
Apple M3 Pro (18GB Unified)Q2_KNot viable
Apple M4 (16GB Unified)Q2_KNot viable
NVIDIA GeForce RTX 3060 12GBQ2_KNot viable
NVIDIA GeForce RTX 3080 10GBQ2_KNot viable
NVIDIA GeForce RTX 4060 8GBQ2_KNot viable
NVIDIA RTX 4060 Laptop (40-60W)Q2_KNot viable
NVIDIA GeForce RTX 4060 Ti 16GBQ2_KNot viable
NVIDIA RTX 4070 Laptop (80-115W)Q2_KNot viable
NVIDIA GeForce RTX 4070 SuperQ2_KNot viable
NVIDIA GeForce RTX 4070 Ti 12GBQ2_KNot viable
NVIDIA GeForce RTX 4070 Ti SuperQ2_KNot viable
NVIDIA RTX 4080 Laptop (120-150W)Q2_KNot viable
NVIDIA GeForce RTX 4080 SuperQ2_KNot viable
NVIDIA RTX 4090 Laptop (150-175W)Q2_KNot viable
NVIDIA GeForce RTX 5080Q2_KNot viable
AMD Radeon RX 7600Q2_KNot viable
AMD Radeon RX 7900 XTXQ2_KNot viable
AMD Radeon RX 9070Q2_KNot viable
Apple M1 (8GB Unified)Q2_KNot viable
Apple M1 (16GB Unified)Q2_KNot viable
Apple M1 Pro (16GB Unified)Q2_KNot viable
Apple M2 (8GB Unified)Q2_KNot viable
Apple M2 (16GB Unified)Q2_KNot viable
Apple M2 Pro (16GB Unified)Q2_KNot viable
Apple M3 (8GB Unified)Q2_KNot viable
Apple M3 (16GB Unified)Q2_KNot viable
AMD Radeon RX 9060 XT 16GBQ2_KNot viable
AMD Radeon RX 9060 XT 8GBQ2_KNot viable
NVIDIA GeForce RTX 5060 8GBQ2_KNot viable
NVIDIA GeForce RTX 5060 Ti 16GBQ2_KNot viable

Showing 43 of 43 entries

Builder Context

NVIDIA Nemotron-3-super-120B-A12B is commonly used with Continue, LM Studio, Open WebUI. For an AI coding workflow, pair it with an embedding model like nomic-embed-text for local RAG.

FAQ

Frequently Asked Questions

How much VRAM does NVIDIA Nemotron-3-super-120B-A12B need?
NVIDIA Nemotron-3-super-120B-A12B requires 70 GB VRAM at recommended quality (Q4_K_M). At lower quality settings, it can fit in as little as 40 GB.
What is the best GPU for NVIDIA Nemotron-3-super-120B-A12B?
The NVIDIA Grace Blackwell Ultra GB300 delivers the best performance for NVIDIA Nemotron-3-super-120B-A12B, achieving 180 tok/s at Q4_K_M with an excellent rating.
Can I run NVIDIA Nemotron-3-super-120B-A12B on an RTX 4060 Ti?
NVIDIA Nemotron-3-super-120B-A12B at Q2_K requires 70 GB VRAM, which exceeds the RTX 4060 Ti's 16 GB. Consider a lower quantization or a GPU with more VRAM.
What quantization should I use for NVIDIA Nemotron-3-super-120B-A12B?
For the best quality, use Q4_K_M (70 GB VRAM). If your GPU has limited VRAM, Q2_K (40 GB) is the most efficient option with acceptable quality.
Is NVIDIA Nemotron-3-super-120B-A12B good for coding?
Yes. NVIDIA Nemotron-3-super-120B-A12B is used with Continue, LM Studio, Open WebUI for local AI coding. For the best coding experience, pair it with an embedding model for local RAG.
All models

Data confidence: estimated. Source

VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.Nemotron is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.