Most local AI guides assume one story: load a GGUF, pick a quantization, watch tokens stream left to right. That is autoregressive decoding. It is what Llama, Qwen, and Mistral do in Ollama today.
Diffusion language models break that habit. They can still chat. They can still code. But under the hood they sometimes rewrite whole chunks of text in parallel instead of marching one token at a time.
Autoregressive vs diffusion
Autoregressive (AR): each new token conditions on everything before it. Simple mental model, predictable VRAM (weights plus KV cache), mature tooling.
Diffusion LM: the model iterates on a block of tokens over several passes, refining noise into readable text. NVIDIA's Nemotron-Labs family also advertises self-speculation, switching between AR and diffusion-style steps based on attention patterns.
That flexibility is the pitch. It is also the support problem. Your inference engine has to implement those modes, not just load weights.
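To make the contrast concrete, here is a toy sketch of the two decoding styles. It is not Nemotron-Labs code. The "model" is a hypothetical stand-in (`toy_predict`) that already knows the answer, and the diffusion side assumes the masked variant, where a block starts fully masked and the most confident positions are committed in parallel each step. Real diffusion LMs differ in how they define noise and confidence.

```python
import random

MASK = "_"
# Stand-in for whatever a real model would predict; purely illustrative.
TARGET = "the quick brown fox jumps".split()


def toy_predict(position):
    """Hypothetical model call: returns (token, confidence) for one position."""
    rng = random.Random(position)  # deterministic confidence per position
    return TARGET[position], rng.random()


def decode_autoregressive(length):
    """Emit one token per step, strictly left to right."""
    tokens, steps = [], 0
    for pos in range(length):
        token, _ = toy_predict(pos)
        tokens.append(token)
        steps += 1
    return tokens, steps


def decode_block_diffusion(length, per_step=2):
    """Start fully masked; each step commits the most confident guesses in parallel."""
    tokens, steps = [MASK] * length, 0
    while MASK in tokens:
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        guesses = {i: toy_predict(i) for i in masked}
        # Keep only the per_step most confident guesses; the rest stay masked.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = guesses[i][0]
        steps += 1
    return tokens, steps


ar_tokens, ar_steps = decode_autoregressive(len(TARGET))
dlm_tokens, dlm_steps = decode_block_diffusion(len(TARGET), per_step=2)
print("AR steps:", ar_steps, "| block steps:", dlm_steps)
```

Both loops end with the same five words. The AR loop needs one step per token; the block loop needs fewer steps because each one fills several positions. Whether fewer steps means faster wall-clock time depends entirely on the engine and kernels, which is the support problem in miniature.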
What OwnRig tracks today
We added Nemotron-Labs Diffusion 8B to the catalog with architecture type diffusion_lm. Official BF16 weights are about 16.98 GB on disk; we estimate about 19 GB of total VRAM at practical context. Both figures derive from NVIDIA's published artifact size. Neither was measured in our RTX 4090 benchmark lab.
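For readers who want to see how a weights-plus-headroom estimate of this kind can be built, here is a minimal sketch. Only the 16.98 GB weight size comes from the published artifact. The layer count, KV head count, head dimension, context length, and overhead are placeholder assumptions, not the model's published config and not necessarily how OwnRig arrived at its figure. The sketch also assumes the runtime keeps an AR-style KV cache, which a diffusion engine may or may not do.

```python
def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Keys plus values (the factor of 2) for every layer, KV head, and position, in decimal GB."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9


def estimate_vram_gb(weights_gb, cache_gb, overhead_gb):
    """Total footprint = weights + cache + runtime overhead (activations, buffers)."""
    return weights_gb + cache_gb + overhead_gb


WEIGHTS_GB = 16.98  # published BF16 artifact size cited in the article

# Placeholder architecture values for illustration only, NOT the real config.
cache = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context_tokens=8192)
total = estimate_vram_gb(WEIGHTS_GB, cache, overhead_gb=1.0)  # overhead is a guess
print(f"cache ~{cache:.2f} GB, total ~{total:.2f} GB")
```

With these placeholders the cache adds roughly 1 GB and the total lands near 19 GB. Change the context length or the overhead guess and the total moves with it, which is why the estimate should be read as a planning number rather than a measurement.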
We deliberately omit GPU compatibility rows. Ada Gate A failed the consumer runtime bar: no official GGUF, no stock Ollama path, SGLang DLM support still landing via pull requests. Listing tok/s would be theater.
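The runtime bar can be pictured as a simple checklist. The sketch below is hypothetical: the class name, field names, and the all-must-pass rule are assumptions for illustration, not OwnRig's actual catalog schema or gate definition. The three checks mirror the three gaps named above.

```python
from dataclasses import dataclass


@dataclass
class RuntimeStatus:
    """Hypothetical record of consumer-runtime readiness for one model."""
    official_gguf: bool
    stock_ollama_path: bool
    engine_support_merged: bool  # False while support is still pending in pull requests

    def blockers(self):
        """Return a human-readable reason for each check that fails."""
        checks = {
            "no official GGUF": self.official_gguf,
            "no stock Ollama path": self.stock_ollama_path,
            "engine support not merged": self.engine_support_merged,
        }
        return [reason for reason, ok in checks.items() if not ok]

    def passes(self):
        """Assumed rule: the gate passes only when there are no blockers."""
        return not self.blockers()


nemotron_dlm = RuntimeStatus(
    official_gguf=False,
    stock_ollama_path=False,
    engine_support_merged=False,
)
print(nemotron_dlm.passes(), nemotron_dlm.blockers())
```

Under this assumed rule, a model with any blocker gets a catalog entry but no GPU compatibility rows, so readers see why the rows are missing rather than guessing.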
How to read vendor speed claims
NVIDIA's launch materials cite multipliers versus autoregressive baselines on datacenter hardware with custom kernels. Impressive slides. Not a shopping list for a $299 GPU.
OwnRig policy: we publish speeds only when a typical builder can reproduce the command on hardware we track. Until then, editorial context only.
What to run instead right now
If you want a coding model on a 16GB card today, Qwen3.6-35B-A3B at Q3_K_M is the honest OwnRig story: MoE, Apache 2.0, community GGUF, Ollama-ready. Diffusion LMs are the next chapter, not the current homework.
