NVIDIA RTX 4070 Laptop (80-115W)
8 GB GDDR6 Β· 256 GB/s
Pricing
Included in laptop
Not sold as a standalone component
VRAM
8 GB
Bandwidth
256 GB/s
TDP
80W
Models
52
Tier
Limited
The NVIDIA RTX 4070 Laptop (80-115W) with 8 GB GDDR6 VRAM can handle 52 AI models across embedding, ai_building, coding. Best performance: all-MiniLM-L6-v2 at 5950 tok/s (excellent). Current price has not been announced.
Source: OwnRig methodology
8 GB
256 GB/s
GDDR6
80W
Laptop (soldered)
Laptop Performance Note
Laptop GPU performance varies by manufacturer, cooling design, and power limits. The tok/s numbers below reflect sustained performance after thermal throttling, not peak. Actual results on your specific laptop may differ by 10-20%.
Builder Capability: Limited
Insufficient VRAM for most AI coding workflows.
Inference Backends
The software stacks that matter most for real-world inference on this device.
CUDA
productionPrimary high-performance backend for NVIDIA inference workloads.
Vulkan
stableFallback backend for llama.cpp and related local runtimes.
What it can run
52 models| all-MiniLM-L6-v2 | FP16 | 5950 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 45 tok/s | Excellent |
| Code Llama 34B Instruct | Q2_K | β | Not viable |
| Codestral 22B | Q3_K_M | β | Not viable |
| Command R 35B | Q2_K | β | Not viable |
| DeepSeek Coder V2 Lite 16B | Q3_K_M | 31 tok/s | Good |
| DeepSeek R1 Distill Qwen 32B | Q2_K | β | Not viable |
| DeepSeek R1 Distill Qwen 7B | Q4_K_M | 22 tok/s | Acceptable |
| DeepSeek V3 | Q2_K | β | Not viable |
| FLUX.1 Dev | Q4_K_M | β | Marginal |
| Gemma 2 27B Instruct | Q3_K_M | β | Not viable |
| Gemma 2 9B Instruct | Q4_K_M | 20 tok/s | Acceptable |
| Gemma 3 12B | Q3_K_M | 13 tok/s | Acceptable |
| Gemma 3 27B | Q3_K_M | β | Not viable |
| Gemma 3 4B | Q5_K_M | 39 tok/s | Good |
| Gemma 4 E2B | Q8_0 | 38 tok/s | Good |
| Gemma 4 E4B | Q6_K | 30 tok/s | Good |
| GigaChat Lightning 10B | Q4_K_M | 56 tok/s | Acceptable |
| InternLM 2.5 7B Chat | Q4_K_M | 21 tok/s | Acceptable |
| Llama 3.1 70B Instruct | Q2_K | β | Not viable |
| Llama 3.1 8B Instruct | Q4_K_M | 22 tok/s | Acceptable |
| Llama 3.2 1B Instruct | Q8_0 | 67 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 46 tok/s | Good |
| Llama 3.3 70B Instruct | Q2_K | β | Not viable |
| LLaVA 1.6 13B | Q3_K_M | 15 tok/s | Acceptable |
| Mistral 7B Instruct v0.3 | Q4_K_M | 22 tok/s | Acceptable |
| Mistral Small 24B Instruct | Q3_K_M | β | Not viable |
| Mixtral 8x7B Instruct | Q4_K_M | β | Not viable |
| nomic-embed-text v1.5 | Q8_0 | 2940 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | β | Not viable |
| Phi-3 Medium 14B Instruct | Q3_K_M | 14 tok/s | Acceptable |
| Phi-3 Mini 3.8B Instruct | Q5_K_M | 36 tok/s | Good |
| Phi-4 14B | Q3_K_M | 13 tok/s | Acceptable |
| Phi-4 Mini | Q5_K_M | 39 tok/s | Good |
| Qwen 2.5 14B Instruct | Q3_K_M | 12 tok/s | Acceptable |
| Qwen 2.5 72B Instruct | Q2_K | β | Not viable |
| Qwen 2.5 7B Instruct | Q4_K_M | 21 tok/s | Acceptable |
| Qwen 2.5 Coder 32B Instruct | Q2_K | β | Not viable |
| Qwen 2.5 Coder 7B Instruct | Q4_K_M | 22 tok/s | Acceptable |
| Qwen3-14B Instruct | Q3_K_M | 13 tok/s | Acceptable |
| Qwen3-8B Instruct | Q5_K_M | 16 tok/s | Acceptable |
| Qwen3.5-27B | Q3_K_M | β | Not viable |
| Qwen3.5-397B (MoE) | Q2_K | β | Not viable |
| Qwen3.6-27B | Q3_K_M | β | Not viable |
| QwQ 32B Preview | Q2_K | β | Not viable |
| Stable Diffusion 3 Medium | FP16 | β | Good |
| Stable Diffusion 3.5 Large | Q8_0 | β | Not viable |
| Stable Diffusion XL 1.0 | FP16 | β | Good |
| StarCoder 2 15B | Q3_K_M | 11 tok/s | Acceptable |
| Whisper Large V3 | Q5_K_M | β | Excellent |
| Whisper Large V3 Turbo | FP16 | β | Excellent |
| Yi 1.5 34B Chat | Q2_K | β | Not viable |
Showing 52 of 52 entries
Available in these Machines
Looking for a desktop build?
Desktop GPUs offer higher sustained performance with no thermal throttling. Check our curated desktop builds for dedicated AI workstations.
Frequently Asked Questions
- What AI models can NVIDIA RTX 4070 Laptop (80-115W) run?
- The NVIDIA RTX 4070 Laptop (80-115W) can run 52 AI models. Top performers include all-MiniLM-L6-v2, nomic-embed-text v1.5, Llama 3.2 1B Instruct. See the full compatibility table above for speeds and quality ratings.
- Is NVIDIA RTX 4070 Laptop (80-115W) good for AI coding?
- With 8 GB, the NVIDIA RTX 4070 Laptop (80-115W) has limited VRAM for AI coding workflows.
- How much VRAM does NVIDIA RTX 4070 Laptop (80-115W) have?
- The NVIDIA RTX 4070 Laptop (80-115W) has 8 GB of GDDR6 VRAM with 256 GB/s bandwidth.
- Can NVIDIA RTX 4070 Laptop (80-115W) run 70B models?
- 70B models can run on the NVIDIA RTX 4070 Laptop (80-115W) with CPU offloading, but performance will be reduced. Consider a device with 48GB+ inference memory for full-speed 70B inference.
- Is NVIDIA RTX 4070 Laptop (80-115W) worth it for AI?
- Pricing for NVIDIA RTX 4070 Laptop (80-115W) has not been announced. It offers 8 GB GDDR6 VRAM, but OwnRig should treat recommendations as provisional until pricing and benchmarks are available.
Own this GPU?
See every AI model it supports, expected performance, and how to build around it.