DeepSeek Coder V2 Lite 16B on NVIDIA GeForce RTX 4060 8GB
NVIDIA GeForce RTX 4060 8GB runs DeepSeek Coder V2 Lite 16B at an estimated 45 tok/s with Q3_K_M quantization. A solid choice for this model.
Model Size
15.7B
Device VRAM
8 GB
Bandwidth
272 GB/s
Quantizations Tested
1
Performance by Quantization
Each row shows DeepSeek Coder V2 Lite 16B performance at one quantization level on NVIDIA GeForce RTX 4060 8GB.
| Quantization | Speed | TTFT (time to first token) | Fits in VRAM | Rating | Confidence |
|---|---|---|---|---|---|
| Q3_K_M | 45 tok/s | 180 ms | ✓ Yes | Good | estimated |
Notes
Q3_K_M
The MoE architecture activates only ~2.4B of the model's 15.7B parameters per token, so it generates tokens at speeds closer to those of a small dense model. Excellent coding performance for an 8 GB card.
About DeepSeek Coder V2 Lite 16B
DeepSeek Coder V2 Lite 16B (15.7B) is a coding model. It uses an MoE (mixture-of-experts) architecture: 15.7B total parameters, ~2.4B active per token. Excellent code generation and completion. Very fast inference despite the total parameter count. One of the best coding models for its effective size.
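To see why the active-parameter count matters for speed, here is a rough back-of-the-envelope sketch. It assumes token generation is memory-bandwidth bound, that the 7.4 GB Q3_K_M weights are quantized uniformly so the bytes read per token scale with the active fraction (~2.4B of 15.7B), and that all weights sit in VRAM. These are simplifying assumptions and the function name is illustrative. The result is a theoretical ceiling, not a prediction.

```python
def decode_ceiling_tok_s(bandwidth_gb_s, model_size_gb, total_params_b, active_params_b):
    """Upper bound on decode speed, assuming each token reads the active weights once.

    Assumption: bytes read per token are proportional to the active-parameter
    fraction of the quantized model. Real runtimes also read the KV cache and
    incur kernel and scheduling overhead, so measured speeds are lower.
    """
    active_gb = model_size_gb * (active_params_b / total_params_b)
    return bandwidth_gb_s / active_gb


# Figures taken from this page: 272 GB/s, 7.4 GB at Q3_K_M, 15.7B total, ~2.4B active.
moe_ceiling = decode_ceiling_tok_s(272, 7.4, 15.7, 2.4)

# Hypothetical comparison: the same file if every parameter were read per token.
dense_ceiling = decode_ceiling_tok_s(272, 7.4, 15.7, 15.7)

print(f"MoE ceiling:   {moe_ceiling:.0f} tok/s")
print(f"Dense ceiling: {dense_ceiling:.0f} tok/s")
```

With the page's figures, the ceiling comes out at roughly 240 tok/s for the MoE case versus roughly 37 tok/s if all 15.7B parameters were read for every token. The estimated 45 tok/s sits between the two, which is consistent with the MoE design helping, although this model does not explain the full gap.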
About NVIDIA GeForce RTX 4060 8GB
NVIDIA GeForce RTX 4060 8GB has 8 GB of VRAM and 272 GB/s of memory bandwidth. Street price: $289.
Source: MoE model, only ~2.4B active per token. Q3_K_M (7.4 GB) fits in 8 GB (2026-03-15)
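The fit claim can be checked with simple arithmetic. The sketch below uses only the two sizes quoted in the source note (7.4 GB of weights, 8 GB of VRAM); the function name is illustrative, and it assumes both figures use the same unit. Whatever remains has to cover the KV cache, activations, and runtime overhead, which vary with context length and the inference runtime.

```python
def vram_headroom_gb(vram_gb, weights_gb):
    """VRAM left over after loading the model weights."""
    return vram_gb - weights_gb


headroom = vram_headroom_gb(8.0, 7.4)
print(f"Headroom: {headroom:.1f} GB ({headroom / 8.0:.1%} of VRAM)")
```

That leaves about 0.6 GB, or 7.5% of the card, so the fit is tight and long contexts may not leave enough room.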
Data last updated: 2026-03-01