NVIDIA RTX 4080 Laptop (120-150W)
12 GB GDDR6 · 384 GB/s
Pricing
Included in laptop
Not sold as a standalone component
VRAM
12 GB
Bandwidth
384 GB/s
TDP
120W
Models
40
Tier
Starter
The NVIDIA RTX 4080 Laptop (120-150W) with 12 GB of GDDR6 VRAM can run 40 AI models across embedding, AI-building, and coding workloads. Best performance: all-MiniLM-L6-v2 at 8400 tok/s (excellent). For AI coding workflows, it supports the Starter AI Coding tier, which is good for 7-8B models. Pricing has not been announced.
Source: OwnRig methodology
12 GB · 384 GB/s · GDDR6 · 120W · Laptop (soldered)
Laptop Performance Note
Laptop GPU performance varies by manufacturer, cooling design, and power limits. The tok/s numbers below reflect sustained performance after thermal throttling, not peak. Actual results on your specific laptop may differ by 10-20%.
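To read the table with that caveat in mind, here is a minimal sketch that turns a listed figure into a plausible range. The function name `expected_range` and the symmetric spread are illustrative assumptions, not part of the OwnRig methodology.

```python
def expected_range(tok_s: float, variation: float = 0.20) -> tuple[float, float]:
    """Return (low, high) tok/s for a listed figure, given a fractional variation.

    Assumption: the 10-20% spread is applied symmetrically around the listed value.
    """
    if not 0.0 <= variation < 1.0:
        raise ValueError("variation must be a fraction in [0, 1)")
    return round(tok_s * (1 - variation), 1), round(tok_s * (1 + variation), 1)


# Llama 3.1 8B Instruct is listed at 36 tok/s in the table.
low, high = expected_range(36, variation=0.20)
print(f"Plausible range at 20% variation: {low}-{high} tok/s")
```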
Builder Capability: Starter AI Coding
Runs 7-8B models comfortably. Good for basic local code completion and small model experiments.
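As a rough check on why 7-8B models sit comfortably in 12 GB, weight size can be estimated as parameters times bits per weight divided by 8. The sketch below is hedged: the bits-per-weight values are approximate averages for llama.cpp-style quantizations, and the 1.5 GB overhead allowance is an assumption for illustration, not a measured figure.

```python
# Approximate bits per weight for common llama.cpp quantizations.
# Assumed averages for illustration; real model file sizes vary.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
}

VRAM_GB = 12.0     # from the spec above
OVERHEAD_GB = 1.5  # assumed allowance for KV cache, activations, runtime buffers


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimate weight size in GB: parameters x bits per weight / 8."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(params_billion: float, quant: str, vram_gb: float = VRAM_GB) -> bool:
    """True if estimated weights plus the assumed overhead fit in VRAM."""
    return weights_gb(params_billion, quant) + OVERHEAD_GB <= vram_gb


for params, quant in [(8, "Q5_K_M"), (14, "Q4_K_M"), (70, "Q4_K_M")]:
    size = weights_gb(params, quant)
    print(f"{params}B {quant}: ~{size:.1f} GB weights, fits: {fits_in_vram(params, quant)}")
```

Longer context windows grow the KV cache, so the overhead allowance should be raised for long-context use.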
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA
Production: primary high-performance backend for NVIDIA inference workloads.
Vulkan
Stable: fallback backend for llama.cpp and related local runtimes.
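The preference order above (CUDA first, Vulkan as fallback) can be sketched as a simple probe. This is a heuristic under stated assumptions: it only checks whether the `nvidia-smi` and `vulkaninfo` command-line tools are on the PATH, which suggests but does not guarantee a working driver. The function name `pick_backend` is hypothetical.

```python
import shutil

# Preference order from the list above: CUDA first, Vulkan as fallback.
# Each backend is paired with a CLI tool whose presence hints that it is installed.
BACKEND_PROBES = [("CUDA", "nvidia-smi"), ("Vulkan", "vulkaninfo")]


def pick_backend(which=shutil.which) -> str:
    """Return the first backend whose probe tool is found, else "CPU".

    `which` is injectable so the logic can be exercised without real hardware.
    """
    for name, tool in BACKEND_PROBES:
        if which(tool) is not None:
            return name
    return "CPU"


print(f"Suggested backend: {pick_backend()}")
```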
What it can run
40 models
| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | FP16 | 8400 tok/s | Excellent |
| Arcee Trinity Mini 26B | Q3_K_M | 6 tok/s | Not viable |
| Arcee Trinity Nano 6B | Q8_0 | 68 tok/s | Excellent |
| Codestral 22B | Q3_K_M | 8 tok/s | Marginal |
| DeepSeek Coder V2 Lite 16B | Q4_K_M | 39 tok/s | Good |
| DeepSeek R1 Distill Qwen 7B | Q5_K_M | 34 tok/s | Good |
| Gemma 2 9B Instruct | Q5_K_M | 34 tok/s | Good |
| Gemma 3 12B | Q4_K_M | 22 tok/s | Acceptable |
| Gemma 3 4B | Q8_0 | 59 tok/s | Excellent |
| Gemma 4 26B-A4B | Q3_K_M | 8 tok/s | Not viable |
| Gemma 4 E2B | Q8_0 | 57 tok/s | Good |
| Gemma 4 E4B | Q8_0 | 35 tok/s | Good |
| GigaChat Lightning 10B | Q4_K_M | 80 tok/s | Acceptable |
| InternLM 2.5 7B Chat | Q5_K_M | 32 tok/s | Good |
| Llama 3.1 8B Instruct | Q5_K_M | 36 tok/s | Good |
| Llama 3.2 1B Instruct | Q8_0 | 98 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 67 tok/s | Excellent |
| LLaVA 1.6 13B | Q4_K_M | 20 tok/s | Acceptable |
| Mistral 7B Instruct v0.3 | Q5_K_M | 35 tok/s | Good |
| nomic-embed-text v1.5 | Q8_0 | 4550 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | n/a | Not viable |
| Phi-3 Medium 14B Instruct | Q3_K_M | 25 tok/s | Good |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 55 tok/s | Excellent |
| Phi-4 14B | Q3_K_M | 24 tok/s | Acceptable |
| Phi-4 Mini | Q8_0 | 57 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q4_K_M | 21 tok/s | Acceptable |
| Qwen 2.5 7B Instruct | Q5_K_M | 34 tok/s | Good |
| Qwen 2.5 Coder 7B Instruct | Q5_K_M | 34 tok/s | Good |
| Qwen3-14B Instruct | Q5_K_M | 16 tok/s | Acceptable |
| Qwen3-8B Instruct | Q8_0 | 21 tok/s | Acceptable |
| Qwen3.5-122B-A10B | Q3_K_M | n/a | Not viable |
| Qwen3.5-27B | Q3_K_M | 8 tok/s | Marginal |
| Qwen3.5-397B (MoE) | Q2_K | n/a | Not viable |
| Qwen3.6-27B | Q3_K_M | n/a | Not viable |
| Stable Diffusion 3 Medium | FP16 | n/a | Excellent |
| Stable Diffusion 3.5 Large | Q8_0 | n/a | Good |
| Stable Diffusion XL 1.0 | FP16 | n/a | Excellent |
| StarCoder 2 15B | Q3_K_M | 20 tok/s | Acceptable |
| Whisper Large V3 | FP16 | n/a | Excellent |
| Whisper Large V3 Turbo | FP16 | n/a | Excellent |
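To shortlist models from the table, a small sketch that parses pipe-delimited rows and keeps those above a speed threshold. Only a few rows are copied here for brevity. The helper names and the 30 tok/s cutoff are illustrative assumptions, not OwnRig rating thresholds.

```python
# A few rows copied from the table above.
TABLE_ROWS = """\
| Llama 3.1 8B Instruct | Q5_K_M | 36 tok/s | Good |
| Qwen 2.5 Coder 7B Instruct | Q5_K_M | 34 tok/s | Good |
| Codestral 22B | Q3_K_M | 8 tok/s | Marginal |
| Llama 3.2 3B Instruct | Q8_0 | 67 tok/s | Excellent |
"""


def parse_rows(text: str) -> list[dict]:
    """Parse 4-column pipe rows into dicts; rows with other shapes are skipped."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        model, quant, speed, rating = cells
        # Cells without a tok/s figure (e.g. image or speech models) become None.
        tok_s = float(speed.split()[0]) if speed.endswith("tok/s") else None
        rows.append({"model": model, "quant": quant, "tok_s": tok_s, "rating": rating})
    return rows


def usable(rows: list[dict], min_tok_s: float = 30.0) -> list[str]:
    """Return model names whose listed speed meets the assumed threshold."""
    return [r["model"] for r in rows if r["tok_s"] is not None and r["tok_s"] >= min_tok_s]


print(usable(parse_rows(TABLE_ROWS)))
```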
Looking for a desktop build?
Desktop GPUs offer higher sustained performance with no thermal throttling. Check our curated desktop builds for dedicated AI workstations.
Frequently Asked Questions
- What AI models can NVIDIA RTX 4080 Laptop (120-150W) run?
- The NVIDIA RTX 4080 Laptop (120-150W) can run 40 AI models. Top performers include all-MiniLM-L6-v2, nomic-embed-text v1.5, and Llama 3.2 1B Instruct. See the full compatibility table above for speeds and quality ratings.
- Is NVIDIA RTX 4080 Laptop (120-150W) good for AI coding?
- With 12 GB of VRAM, the NVIDIA RTX 4080 Laptop (120-150W) runs 7-8B coding models at the Starter tier, which is good for basic code completion.
- How much VRAM does NVIDIA RTX 4080 Laptop (120-150W) have?
- The NVIDIA RTX 4080 Laptop (120-150W) has 12 GB of GDDR6 VRAM with 384 GB/s bandwidth.
- Can NVIDIA RTX 4080 Laptop (120-150W) run 70B models?
- 70B models can run on the NVIDIA RTX 4080 Laptop (120-150W) with CPU offloading, but performance will be reduced. Consider a device with 48GB+ inference memory for full-speed 70B inference.
- Is NVIDIA RTX 4080 Laptop (120-150W) worth it for AI?
- Pricing for the NVIDIA RTX 4080 Laptop (120-150W) has not been announced. It offers 12 GB of GDDR6 VRAM, but recommendations should be treated as provisional until pricing and benchmarks are available.
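The 384 GB/s bandwidth figure quoted in the FAQ also gives a rough ceiling on text generation speed. A common rule of thumb, and not necessarily how OwnRig derives its numbers, is that each generated token requires reading all model weights once, so decode speed cannot exceed bandwidth divided by weight size. The weight sizes below are assumed values for illustration.

```python
BANDWIDTH_GB_S = 384.0  # memory bandwidth from the spec above


def decode_ceiling_tok_s(weights_gb: float, bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on decode speed if every weight is read once per generated token."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Assumed weight sizes in GB; check the actual model file size before relying on these.
for name, size_gb in [("8B model at ~5.7 GB", 5.7), ("14B model at ~8.5 GB", 8.5)]:
    print(f"{name}: at most ~{decode_ceiling_tok_s(size_gb):.0f} tok/s")
```

Sustained figures in the table sit below this ceiling, which is expected once thermal limits and runtime overhead are included. The bound applies to token-by-token generation, not to batched embedding throughput.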