What AI models can NVIDIA RTX 4090 Laptop (150-175W) run?

The NVIDIA RTX 4090 Laptop (150-175W) is rated against 34 AI models, 31 of which are runnable (3 are rated Not viable). The fastest include Gemma 4 26B-A4B, Llama 3.2 1B Instruct, and Arcee Trinity Nano 6B. See the full compatibility table for speeds and quality ratings.

Is NVIDIA RTX 4090 Laptop (150-175W) good for AI coding?

Yes. With 16 GB of VRAM, the NVIDIA RTX 4090 Laptop (150-175W) reaches the Capable AI Coding tier and handles single-model coding workflows well.

How much VRAM does NVIDIA RTX 4090 Laptop (150-175W) have?

The NVIDIA RTX 4090 Laptop (150-175W) has 16 GB of GDDR6 VRAM with 512 GB/s bandwidth.

Can NVIDIA RTX 4090 Laptop (150-175W) run 70B models?

70B models can run on the NVIDIA RTX 4090 Laptop (150-175W) only with CPU offloading, and speed drops sharply because the offloaded layers are limited by system RAM bandwidth. For full-speed 70B inference, consider a device with 48 GB or more of inference memory.
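A rough way to see why offloading is slow: when decoding is memory-bandwidth bound, every generated token reads all the weights once, so the part left in system RAM is read at system-memory speed. The sketch below is a rule of thumb under stated assumptions, not OwnRig's method: a 70B Q4_K_M file of roughly 42 GB (approximate bits per weight), a hypothetical 14.5 GB usable VRAM budget, the listed 512 GB/s, and an assumed ~90 GB/s for dual-channel DDR5-5600 system RAM. It ignores compute, KV cache, and transfer overheads, so its output is a ceiling, not a benchmark.

```python
def offload_estimate(weights_gb: float, vram_budget_gb: float,
                     gpu_bw_gbs: float = 512.0,
                     cpu_bw_gbs: float = 90.0) -> tuple[float, float]:
    """Return (fraction of weights offloaded to system RAM, decode tok/s ceiling).

    Assumption: decoding is memory-bandwidth bound, and each token reads the
    GPU-resident weights at GPU bandwidth, then the offloaded weights at
    system-RAM bandwidth (sequentially, with no overlap).
    cpu_bw_gbs = 90 is an assumed figure for dual-channel DDR5-5600.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    on_gpu = min(weights_gb, vram_budget_gb)  # as much as fits in VRAM
    on_cpu = weights_gb - on_gpu              # remainder spills to system RAM
    seconds_per_token = on_gpu / gpu_bw_gbs + on_cpu / cpu_bw_gbs
    return on_cpu / weights_gb, 1.0 / seconds_per_token


if __name__ == "__main__":
    # 70B at Q4_K_M: ~70e9 params x ~4.85 bits/weight (approximate) ~ 42.4 GB.
    weights_70b = 70 * 4.85 / 8
    frac, tok_s = offload_estimate(weights_70b, vram_budget_gb=14.5)
    print(f"offloaded: {frac:.0%}, ceiling: ~{tok_s:.1f} tok/s")
```

Under these assumptions about two thirds of the weights land in system RAM and the ceiling falls to roughly 3 tok/s, compared with 64 tok/s for an 8 GB model that fits entirely in VRAM.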

Is NVIDIA RTX 4090 Laptop (150-175W) worth it for AI?

Pricing for the NVIDIA RTX 4090 Laptop (150-175W) has not been announced. It offers 16 GB of GDDR6 VRAM, but OwnRig's recommendations should be treated as provisional until pricing and benchmarks are available.

Laptop GPU

NVIDIA RTX 4090 Laptop (150-175W)

16 GB GDDR6 · 512 GB/s

Pricing

Included in laptop

Not sold as a standalone component

VRAM

16 GB

Bandwidth

512 GB/s

TDP

150W (up to 175W with Dynamic Boost)

Models

34

Tier

Capable

The NVIDIA RTX 4090 Laptop (150-175W), with 16 GB of GDDR6 VRAM, is rated against 34 AI models across coding, AI coding, and AI building workloads; 31 of them are runnable. Best performance: Gemma 4 26B-A4B at 175 tok/s (Excellent). For AI coding, it reaches the Capable AI Coding tier and handles single-model workflows well. Pricing has not been announced.

Source: OwnRig methodology

Memory Type

GDDR6

Form Factor

Laptop (soldered)

Laptop Performance Note

Laptop GPU performance varies by manufacturer, cooling design, and power limits. The tok/s numbers below reflect sustained performance after thermal throttling, not peak. Actual results on your specific laptop may differ by 10-20%.
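One way to read "sustained, not peak": single-stream decoding is usually limited by memory bandwidth, so bandwidth divided by the bytes of weights read per token gives a rough ceiling. The sketch below applies that rule of thumb, plus the 10-20% spread from the note, to the Llama 3.1 8B Instruct Q8_0 row. The ~8.5 bits per weight for Q8_0 is an approximation, and this is an assumed rule of thumb, not necessarily how OwnRig derives its figures.

```python
def decode_ceiling_tok_s(weights_gb: float, bandwidth_gbs: float = 512.0) -> float:
    """Rule-of-thumb upper bound: each generated token streams all weights from VRAM once."""
    return bandwidth_gbs / weights_gb


def variance_band(listed_tok_s: float, spread: float) -> tuple[float, float]:
    """Low/high bounds for a listed figure that may differ by +/- spread (0.20 = 20%)."""
    return listed_tok_s * (1 - spread), listed_tok_s * (1 + spread)


if __name__ == "__main__":
    # Llama 3.1 8B Instruct, Q8_0: ~8.03e9 params x ~8.5 bits/weight (approximate).
    weights_gb = 8.03 * 8.5 / 8
    listed = 47.0  # sustained tok/s from the compatibility table
    ceiling = decode_ceiling_tok_s(weights_gb)
    print(f"bandwidth ceiling: ~{ceiling:.0f} tok/s, listed: {listed:.0f} tok/s")
    for spread in (0.10, 0.20):
        low, high = variance_band(listed, spread)
        print(f"+/-{spread:.0%}: {low:.1f} to {high:.1f} tok/s")
```

The listed 47 tok/s sits at roughly 78% of the ~60 tok/s ceiling, which leaves room for the throttling and runtime overheads the note describes; at the 20% end, a given laptop could land anywhere from about 37.6 to 56.4 tok/s.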

Builder Capability: Capable AI Coding

Runs 16-22B coding models comfortably, or 32B models at reduced quality. Handles single-model workflows well.
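The tier wording follows from weight-size arithmetic: parameters times bits per weight gives the weight size, and the highest-precision quantization whose weights fit in 16 GB (minus room for KV cache and runtime overhead) sets the achievable quality. The sketch below assumes approximate average bits-per-weight values for llama.cpp GGUF quant types and a hypothetical 1.5 GB overhead allowance; real files vary by architecture, so it will not always match the quantizations chosen in the table.

```python
from typing import Optional

# Approximate average bits per weight for common llama.cpp GGUF quant types.
# Assumption: real file sizes vary with model architecture and metadata.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
    "Q2_K": 3.35,
}


def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * APPROX_BITS_PER_WEIGHT[quant] / 8


def best_quant_that_fits(params_billion: float, vram_gb: float = 16.0,
                         overhead_gb: float = 1.5) -> Optional[str]:
    """Highest-precision quant whose weights fit in VRAM minus a hypothetical
    fixed allowance for KV cache, activations, and runtime overhead."""
    budget = vram_gb - overhead_gb
    # Try quants from most to fewest bits per weight; return the first that fits.
    for quant in sorted(APPROX_BITS_PER_WEIGHT, key=APPROX_BITS_PER_WEIGHT.get,
                        reverse=True):
        if weight_gb(params_billion, quant) <= budget:
            return quant
    return None  # does not fit even at the smallest quant listed here


if __name__ == "__main__":
    for size in (7, 16, 22, 32, 70):
        print(f"{size}B -> {best_quant_that_fits(size)}")
```

Under these assumptions, 16B and 22B models fit at Q6_K and Q4_K_M respectively, 32B squeezes in only at Q2_K, and 70B does not fit at all, which is why 32B is described as reduced quality and 70B needs offloading.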

Software

Inference Backends

The software stacks that matter most for real-world inference on this device.

CUDA

production

Primary high-performance backend for NVIDIA inference workloads.

Vulkan

stable

Fallback backend for llama.cpp and related local runtimes.

What it can run

34 models

Model	Quantization	Speed	Quality
Arcee Trinity Mini 26B	Q3_K_M	49 tok/s	Excellent
Arcee Trinity Nano 6B	Q8_0	90 tok/s	Excellent
Codestral 22B	Q3_K_M	15 tok/s	Acceptable
DeepSeek Coder V2 Lite 16B	Q5_K_M	43 tok/s	Good
FLUX.1 Dev	Q4_K_M	–	Acceptable
Gemma 2 27B Instruct	Q4_K_M	10 tok/s	Acceptable
Gemma 3 12B	Q5_K_M	36 tok/s	Good
Gemma 3 27B	Q3_K_M	5 tok/s	Marginal
Gemma 4 26B-A4B	Q3_K_M	175 tok/s	Excellent
Gemma 4 31B	Q3_K_M	11 tok/s	Acceptable
Gemma 4 E2B	Q8_0	77 tok/s	Excellent
Gemma 4 E4B	Q8_0	47 tok/s	Good
GigaChat Lightning 10B	Q8_0	66 tok/s	Acceptable
Llama 3.1 8B Instruct	Q8_0	47 tok/s	Good
Llama 3.2 11B Vision	Q6_K	32 tok/s	Good
Llama 3.2 1B Instruct	Q8_0	102 tok/s	Excellent
Llama 3.2 3B Instruct	Q8_0	64 tok/s	Excellent
LLaVA 1.6 13B	Q4_K_M	19 tok/s	Acceptable
NVIDIA Nemotron-3-super-120B-A12B	Q2_K	–	Not viable
Phi-3 Medium 14B Instruct	Q5_K_M	24 tok/s	Acceptable
Phi-4 14B	Q4_K_M	24 tok/s	Acceptable
Phi-4 Mini	Q8_0	58 tok/s	Excellent
Qwen 2.5 14B Instruct	Q4_K_M	26 tok/s	Good
Qwen 2.5 Coder 32B Instruct	Q3_K_M	9 tok/s	Marginal
Qwen 2.5 Coder 7B Instruct	Q5_K_M	44 tok/s	Good
Qwen3-14B Instruct	Q8_0	13 tok/s	Acceptable
Qwen3.5-122B-A10B	Q3_K_M	–	Not viable
Qwen3.5-27B	Q3_K_M	30 tok/s	Acceptable
Qwen3.5-397B (MoE)	Q2_K	–	Not viable
Qwen3.6-27B	Q3_K_M	30 tok/s	Acceptable
Stable Diffusion 3 Medium	FP16	–	Acceptable
Stable Diffusion 3.5 Large	FP16	–	Good
StarCoder 2 15B	Q5_K_M	21 tok/s	Acceptable
Whisper Large V3 Turbo	FP16	–	Excellent


Looking for a desktop build?

Desktop GPUs typically sustain higher performance with far less thermal throttling. Check our curated desktop builds for dedicated AI workstations.

