Nemotron · NVIDIA Nemotron Open Model License
Default matrix mode (when available): self-speculation.
Diffusion language model with autoregressive, diffusion, and self-speculation decode modes. Official weights ship as BF16 Safetensors (~17 GB) with custom architecture code; there is no official GGUF. The consumer path today is Python with transformers>=5.0 and trust_remote_code, or SGLang once its pending DLM support lands. The model stays out of our compatibility matrix until a reproducible llama.cpp or Ollama path exists. VRAM figures are estimated from the official weight size, not benchmarked on OwnRig hardware.
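The transformers path described above can be sketched roughly as follows. This is a minimal sketch, not the official loading recipe: the function name `load_nemotron_dlm` is hypothetical, the repository ID is left as a parameter because this page does not state it, and `AutoModel` is an assumption (the model card may name a different class). Decode-mode selection is defined by the model's custom code and is not shown here.

```python
def load_nemotron_dlm(model_id: str):
    """Load tokenizer and model using the custom code shipped in the model repo.

    model_id is supplied by the caller; this page does not state the repo ID.
    """
    # Imported lazily so the sketch can be read and imported without a GPU stack.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,  # executes Python code from the model repository
        dtype=torch.bfloat16,    # matches the official BF16 Safetensors weights
    )
    return tokenizer, model
```

Because `trust_remote_code=True` runs code downloaded with the weights, review the repository before loading it on a machine you care about.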
| Parameters | VRAM | Context | Formats | GPUs |
|---|---|---|---|---|
| 8.5B | 19 GB | 32K | 1 | Pending: GPU compatibility ratings not published yet |
Nemotron-Labs Diffusion 8B (8.5B parameters) needs an estimated 19 GB of VRAM at the recommended quality (FP16). For the best experience, we recommend the AMD AI Powerhouse ($1,818).
Source: OwnRig methodology
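As a sanity check on the 19 GB figure, the weight size can be approximated from the parameter count. This is a back-of-envelope sketch, not OwnRig's published formula: it assumes 2 bytes per parameter (true for both FP16 and BF16), decimal gigabytes, and that the 16.98 GB value listed on this page is the weight size. The "headroom" is simply the difference between the two listed numbers.

```python
PARAMS = 8.5e9             # parameter count listed on this page
BYTES_PER_PARAM = 2        # FP16 and BF16 both store 16 bits per parameter
LISTED_WEIGHTS_GB = 16.98  # value listed on this page (assumed to be weight size)
LISTED_VRAM_GB = 19.0      # estimated VRAM at FP16 listed on this page


def weight_size_gb(params: float, bytes_per_param: int) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimated_weights_gb = weight_size_gb(PARAMS, BYTES_PER_PARAM)
# Whatever the listed VRAM leaves over the weights (runtime buffers, activations).
headroom_gb = LISTED_VRAM_GB - LISTED_WEIGHTS_GB

print(f"Estimated weights: {estimated_weights_gb:.1f} GB")
print(f"Headroom over listed weights: {headroom_gb:.2f} GB")
```

The estimate lands at about 17 GB, in line with the official weight size, which leaves roughly 2 GB of the 19 GB figure for everything else.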
Nemotron-Labs Diffusion 8B is in our catalog for reference. We add GPU fit and speed ratings only after a reproducible consumer runtime exists (stock llama.cpp, Ollama, or stable SGLang DLM). Today most builders still need transformers with custom CUDA code.
How diffusion language models differ from standard chat models
19 GB · FP16 · 16.98 GB · 32K tokens · Chat
KV cache VRAM at FP16 quality. The cache grows with context length, and Total VRAM is the 19 GB base plus the KV cache.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 205 MB | 19.2 GB |
| 4K | 410 MB | 19.4 GB |
| 8K | 717 MB | 19.7 GB |
| 16K | 1.4 GB | 20.4 GB |
| 32K | 2.9 GB | 21.9 GB |
Data confidence: estimated.
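The Total VRAM column in the table can be reproduced by adding each KV cache estimate to the 19 GB base. The sketch below does that and picks the longest listed context that fits a given card. It assumes decimal units (1 GB = 1000 MB), and the helper names `total_vram_gb` and `largest_context_that_fits` are hypothetical. The numbers are this page's estimates, not measurements.

```python
BASE_VRAM_GB = 19.0  # estimated VRAM at FP16 before any KV cache

# KV cache estimates from the table, in MB (1.4 GB and 2.9 GB converted at 1000 MB/GB).
KV_CACHE_MB = {"2K": 205, "4K": 410, "8K": 717, "16K": 1400, "32K": 2900}


def total_vram_gb(context: str) -> float:
    """Base VRAM plus KV cache for a listed context length, rounded like the table."""
    return round(BASE_VRAM_GB + KV_CACHE_MB[context] / 1000, 1)


def largest_context_that_fits(gpu_vram_gb: float):
    """Return the longest listed context whose total fits, or None if none does."""
    fitting = [c for c in KV_CACHE_MB if total_vram_gb(c) <= gpu_vram_gb]
    return fitting[-1] if fitting else None


for context in KV_CACHE_MB:
    print(f"{context}: {total_vram_gb(context)} GB")
```

For example, `largest_context_that_fits(24)` returns `"32K"`, while a 16 GB card returns `None` because even the 2K row exceeds it at FP16.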
VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.
Nemotron is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.