What AI models can NVIDIA RTX 4060 Laptop (40-60W) run?

OwnRig rates 52 AI models on the NVIDIA RTX 4060 Laptop (40-60W): 31 are rated Acceptable or better, 1 is Marginal, and 20 are Not viable. The fastest are all-MiniLM-L6-v2, nomic-embed-text v1.5, and Llama 3.2 1B Instruct. See the full compatibility table below for speeds and quality ratings.

Is NVIDIA RTX 4060 Laptop (40-60W) good for AI coding?

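With only 8 GB of VRAM, the NVIDIA RTX 4060 Laptop (40-60W) is limited for AI coding workflows. 7B coding models such as Qwen 2.5 Coder 7B Instruct run at an Acceptable 19 tok/s, but larger coding models such as Codestral 22B and Qwen 2.5 Coder 32B Instruct are rated Not viable.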
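To see why 8 GB is the constraint, here is a rough fit check. It is a minimal sketch, assuming approximate bits-per-weight values for llama.cpp quantization formats and a fixed 1.5 GB allowance for the KV cache and runtime buffers. Both numbers, and the parameter counts in the usage lines, are illustrative assumptions, not OwnRig's methodology.

```python
# Rough check of whether a quantized model's weights fit in 8 GB of VRAM.
# ASSUMPTION: bits-per-weight figures are approximate averages for these
# llama.cpp quantization formats; real GGUF files vary by model.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K": 3.4,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * APPROX_BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(params_billion: float, quant: str,
                 vram_gb: float = 8.0, overhead_gb: float = 1.5) -> bool:
    """True if the weights plus an ASSUMED fixed overhead fit entirely in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


# Approximate parameter counts (assumed): Qwen 2.5 Coder 7B ~7.6B, Codestral ~22B.
print(fits_in_vram(7.6, "Q4_K_M"))  # True  (~4.6 GB of weights)
print(fits_in_vram(22, "Q3_K_M"))   # False (~10.7 GB of weights)
```

The result lines up with the table: the 7B coder is rated Acceptable, while Codestral 22B is Not viable.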

How much VRAM does NVIDIA RTX 4060 Laptop (40-60W) have?

The NVIDIA RTX 4060 Laptop (40-60W) has 8 GB of GDDR6 VRAM with 256 GB/s bandwidth.
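The 256 GB/s figure follows from the memory interface, and it also caps token generation speed. A minimal sketch, assuming a 128-bit bus at 16 Gb/s per pin (public spec values not stated on this page) and a Q4_K_M 8B model file of about 4.9 GB. The ceiling assumes every generated token reads all weights once and ignores compute, KV-cache traffic, and power limits.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


bw = peak_bandwidth_gb_s(128, 16)            # ASSUMED 128-bit bus at 16 Gb/s per pin
print(bw)                                    # 256.0
print(round(decode_ceiling_tok_s(bw, 4.9)))  # 52 (ASSUMED ~4.9 GB Q4_K_M 8B file)
```

That ceiling of roughly 52 tok/s sits well above the 19 tok/s listed for Llama 3.1 8B Instruct at Q4_K_M, which is consistent with the table reporting sustained, throttled performance rather than a theoretical peak.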

Can NVIDIA RTX 4060 Laptop (40-60W) run 70B models?

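70B models can technically load on the NVIDIA RTX 4060 Laptop (40-60W) by offloading most layers to the CPU, but throughput drops sharply: every 70B-class entry in the compatibility table (Llama 3.1 70B, Llama 3.3 70B, Qwen 2.5 72B, all at Q2_K) is rated Not viable. For full-speed 70B inference, consider a device with 48 GB or more of inference memory.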
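llama.cpp controls CPU offloading with its `-ngl` (`--n-gpu-layers`) option, which sets how many layers are placed on the GPU. A minimal sketch of estimating that split, assuming equally sized layers, a 1.5 GB VRAM reserve for context and buffers, and a Llama 3.1 70B Q2_K file of roughly 26.4 GB (an approximate figure, not from this page). Llama 3.1 70B has 80 transformer layers.

```python
import math


def gpu_layer_split(model_gb: float, n_layers: int,
                    vram_gb: float = 8.0, reserve_gb: float = 1.5) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_on_cpu), ASSUMING all layers are the same size."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, math.floor(usable_gb / per_layer_gb))
    return on_gpu, n_layers - on_gpu


on_gpu, on_cpu = gpu_layer_split(26.4, 80)  # ASSUMED ~26.4 GB Q2_K file, 80 layers
print(on_gpu, on_cpu)  # 19 61
```

With roughly three quarters of the layers running from system RAM, each token waits on CPU memory bandwidth, which helps explain the Not viable ratings.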

Is NVIDIA RTX 4060 Laptop (40-60W) worth it for AI?

No standalone price has been announced for the NVIDIA RTX 4060 Laptop (40-60W), which ships only inside laptops and is not sold as a separate component. It offers 8 GB of GDDR6 VRAM, but OwnRig's recommendations should be treated as provisional until pricing and benchmarks are available.

Laptop GPU

NVIDIA RTX 4060 Laptop (40-60W)

8 GB GDDR6 · 256 GB/s

Pricing: Included in laptop; not sold as a standalone component

VRAM: 8 GB

Bandwidth: 256 GB/s

TDP: 40W

Models: 52 evaluated

Tier: Limited

The NVIDIA RTX 4060 Laptop (40-60W), with 8 GB of GDDR6 VRAM, is rated against 52 AI models across embedding, AI-building, and coding workloads. Best performance: all-MiniLM-L6-v2 at 5100 tok/s (Excellent). No standalone price has been announced.

Source: OwnRig methodology

Memory Type: GDDR6

Form Factor: Laptop (soldered)

Laptop Performance Note

Laptop GPU performance varies by manufacturer, cooling design, and power limits. The tok/s numbers below reflect sustained performance after thermal throttling, not peak. Actual results on your specific laptop may differ by 10-20%.
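To turn the 10-20% caveat into a range for a specific row, a small sketch follows. It assumes the deviation can go in either direction, which the note does not state.

```python
def expected_range(listed_tok_s: float,
                   max_deviation_pct: float = 20.0) -> tuple[float, float]:
    """Return (low, high) tok/s if results deviate up to max_deviation_pct either way."""
    delta = listed_tok_s * max_deviation_pct / 100
    return round(listed_tok_s - delta, 1), round(listed_tok_s + delta, 1)


print(expected_range(19))  # (15.2, 22.8) for a 19 tok/s row such as Llama 3.1 8B Instruct
```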

Builder Capability: Limited

Insufficient VRAM for most AI coding workflows.

Software

Inference Backends

The software stacks that matter most for real-world inference on this device.

CUDA (production): Primary high-performance backend for NVIDIA inference workloads.

Vulkan (stable): Fallback backend for llama.cpp and related local runtimes.
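With llama.cpp, the GPU backend is chosen when the project is built, not at run time. A sketch that assembles the commands, assuming a recent llama.cpp checkout (the `GGML_CUDA` and `GGML_VULKAN` CMake options and the `llama-cli` binary) and a hypothetical model path. The helper names `build_cmd` and `run_cmd` are illustrative.

```python
import shlex


def build_cmd(backend: str) -> list[str]:
    """CMake configure command for llama.cpp with the chosen GPU backend."""
    flags = {"cuda": "-DGGML_CUDA=ON", "vulkan": "-DGGML_VULKAN=ON"}
    return ["cmake", "-B", "build", flags[backend]]


def run_cmd(model_path: str, n_gpu_layers: int, ctx_size: int = 4096) -> list[str]:
    """llama-cli invocation; -ngl sets how many layers are offloaded to the GPU."""
    return ["llama-cli", "-m", model_path,
            "-ngl", str(n_gpu_layers), "-c", str(ctx_size)]


print(shlex.join(build_cmd("cuda")))
# cmake -B build -DGGML_CUDA=ON
print(shlex.join(run_cmd("models/llama-3.1-8b-instruct-Q4_K_M.gguf", 99)))  # hypothetical path
# llama-cli -m models/llama-3.1-8b-instruct-Q4_K_M.gguf -ngl 99 -c 4096
```

After configuring, `cmake --build build --config Release` compiles the binaries. Passing a layer count larger than the model's depth, such as 99, is a common way to request full GPU offload for models that fit in 8 GB.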

What it can run

52 models evaluated

Model	Quantization	Speed	Rating
all-MiniLM-L6-v2	FP16	5100 tok/s	Excellent
Arcee Trinity Nano 6B	Q8_0	45 tok/s	Excellent
Code Llama 34B Instruct	Q2_K	–	Not viable
Codestral 22B	Q3_K_M	–	Not viable
Command R 35B	Q2_K	–	Not viable
DeepSeek Coder V2 Lite 16B	Q3_K_M	27 tok/s	Good
DeepSeek R1 Distill Qwen 32B	Q2_K	–	Not viable
DeepSeek R1 Distill Qwen 7B	Q4_K_M	19 tok/s	Acceptable
DeepSeek V3	Q2_K	–	Not viable
FLUX.1 Dev	Q4_K_M	–	Marginal
Gemma 2 27B Instruct	Q3_K_M	–	Not viable
Gemma 2 9B Instruct	Q4_K_M	17 tok/s	Acceptable
Gemma 3 12B	Q3_K_M	11 tok/s	Acceptable
Gemma 3 27B	Q3_K_M	–	Not viable
Gemma 3 4B	Q5_K_M	33 tok/s	Good
Gemma 4 E2B	Q8_0	38 tok/s	Good
Gemma 4 E4B	Q6_K	30 tok/s	Good
GigaChat Lightning 10B	Q4_K_M	48 tok/s	Acceptable
InternLM 2.5 7B Chat	Q4_K_M	18 tok/s	Acceptable
Llama 3.1 70B Instruct	Q2_K	–	Not viable
Llama 3.1 8B Instruct	Q4_K_M	19 tok/s	Acceptable
Llama 3.2 1B Instruct	Q8_0	57 tok/s	Excellent
Llama 3.2 3B Instruct	Q8_0	39 tok/s	Good
Llama 3.3 70B Instruct	Q2_K	–	Not viable
LLaVA 1.6 13B	Q3_K_M	13 tok/s	Acceptable
Mistral 7B Instruct v0.3	Q4_K_M	19 tok/s	Acceptable
Mistral Small 24B Instruct	Q3_K_M	–	Not viable
Mixtral 8x7B Instruct	Q4_K_M	–	Not viable
nomic-embed-text v1.5	Q8_0	2520 tok/s	Excellent
NVIDIA Nemotron-3-super-120B-A12B	Q2_K	–	Not viable
Phi-3 Medium 14B Instruct	Q3_K_M	12 tok/s	Acceptable
Phi-3 Mini 3.8B Instruct	Q5_K_M	31 tok/s	Good
Phi-4 14B	Q3_K_M	11 tok/s	Acceptable
Phi-4 Mini	Q5_K_M	33 tok/s	Good
Qwen 2.5 14B Instruct	Q3_K_M	10 tok/s	Acceptable
Qwen 2.5 72B Instruct	Q2_K	–	Not viable
Qwen 2.5 7B Instruct	Q4_K_M	18 tok/s	Acceptable
Qwen 2.5 Coder 32B Instruct	Q2_K	–	Not viable
Qwen 2.5 Coder 7B Instruct	Q4_K_M	19 tok/s	Acceptable
Qwen3-14B Instruct	Q3_K_M	11 tok/s	Acceptable
Qwen3-8B Instruct	Q5_K_M	14 tok/s	Acceptable
Qwen3.5-27B	Q3_K_M	–	Not viable
Qwen3.5-397B (MoE)	Q2_K	–	Not viable
Qwen3.6-27B	Q3_K_M	–	Not viable
QwQ 32B Preview	Q2_K	–	Not viable
Stable Diffusion 3 Medium	FP16	–	Good
Stable Diffusion 3.5 Large	Q8_0	–	Not viable
Stable Diffusion XL 1.0	FP16	–	Good
StarCoder 2 15B	Q3_K_M	10 tok/s	Acceptable
Whisper Large V3	Q5_K_M	–	Excellent
Whisper Large V3 Turbo	FP16	–	Excellent
Yi 1.5 34B Chat	Q2_K	–	Not viable
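The table is tab-separated, so it is easy to filter programmatically. A minimal sketch over a few rows copied from above. The 15 tok/s cutoff is an arbitrary example threshold, and treating rows without a listed speed as unusable is an assumption of this sketch.

```python
# A few rows copied from the compatibility table (model, quant, speed, rating).
NO_SPEED = "\u2013"  # the table uses an en dash when no tok/s figure is listed

ROWS = (
    "all-MiniLM-L6-v2\tFP16\t5100 tok/s\tExcellent\n"
    "Codestral 22B\tQ3_K_M\t\u2013\tNot viable\n"
    "Llama 3.1 8B Instruct\tQ4_K_M\t19 tok/s\tAcceptable\n"
    "Qwen 2.5 Coder 7B Instruct\tQ4_K_M\t19 tok/s\tAcceptable\n"
    "StarCoder 2 15B\tQ3_K_M\t10 tok/s\tAcceptable\n"
    "FLUX.1 Dev\tQ4_K_M\t\u2013\tMarginal\n"
)


def parse_rows(text: str) -> list[dict]:
    """Split each tab-separated row; speed becomes a float or None."""
    rows = []
    for line in text.strip().splitlines():
        model, quant, speed, rating = line.split("\t")
        tok_s = None if speed == NO_SPEED else float(speed.split()[0])
        rows.append({"model": model, "quant": quant, "tok_s": tok_s, "rating": rating})
    return rows


def fast_enough(rows: list[dict], min_tok_s: float = 15.0) -> list[str]:
    """Models with a listed speed of at least min_tok_s and a viable rating."""
    return [r["model"] for r in rows
            if r["tok_s"] is not None
            and r["tok_s"] >= min_tok_s
            and r["rating"] not in ("Not viable", "Marginal")]


print(fast_enough(parse_rows(ROWS)))
# ['all-MiniLM-L6-v2', 'Llama 3.1 8B Instruct', 'Qwen 2.5 Coder 7B Instruct']
```

Embedding models such as all-MiniLM-L6-v2 pass the filter too, although their throughput figure is not comparable to text-generation speed. Filter by model category if that distinction matters.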


Looking for a desktop build?

Desktop GPUs generally sustain higher performance than laptop GPUs because they have larger power and cooling budgets, so they throttle less. Check our curated desktop builds for dedicated AI workstations.

