StarCoder · BigCode OpenRAIL-M
Trained on The Stack v2, which covers 619 programming languages. Strong fill-in-the-middle (FIM) support for code completion. It is BigCode's strongest model: competitive, but surpassed by Qwen 2.5 Coder on most benchmarks.
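To make the FIM support concrete, here is a minimal sketch of how a fill-in-the-middle prompt is usually assembled for the StarCoder family. The sentinel tokens are the ones published for StarCoder models; the helper name `build_fim_prompt` is hypothetical, and the tokens should be checked against the tokenizer of the exact checkpoint or GGUF build in use.

```python
# FIM sentinel tokens as published for the StarCoder family; verify them
# against the tokenizer of the exact checkpoint you run.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the cursor in prefix-suffix-middle order.

    The model is expected to generate the missing middle after FIM_MIDDLE.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example: ask the model to fill in the expression that computes `total`.
prompt = build_fim_prompt(
    prefix="def mean(values):\n    total = ",
    suffix="\n    return total / len(values)\n",
)
print(prompt)
```

Editor integrations such as Continue typically build this prompt for you; the sketch only shows what is sent to the model.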
StarCoder 2 15B (15.5B parameters) requires 10.7 GB of VRAM at recommended quality (Q5_K_M). At efficient quality (Q4_K_M) it needs 9 GB, which exceeds the 8 GB on the NVIDIA GeForce RTX 4060; that card fits only the compressed Q3_K_M build (7.3 GB). On an NVIDIA GeForce RTX 4090, expect approximately 50 tok/s at Q8_0. For the best experience, the Starter AI Desktop ($582) is recommended.
— OwnRig methodology, data updated 2026-03-01
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | Q8_0 | 16.8 GB | 15.5 GB |
| recommended | Q5_K_M | 10.7 GB | 9.3 GB |
| efficient | Q4_K_M | 9 GB | 7.8 GB |
| compressed | Q3_K_M | 7.3 GB | 6 GB |
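As a rough illustration of how to read the table above, the following sketch picks the highest-quality quantization whose listed VRAM figure fits a given card. The figures are copied from the table; the function name `best_quant_for` and the `headroom_gb` parameter are hypothetical, added only for this example.

```python
# (quality tier, quantization, VRAM in GB) copied from the table above,
# ordered from highest to lowest quality.
QUANT_TIERS = [
    ("full", "Q8_0", 16.8),
    ("recommended", "Q5_K_M", 10.7),
    ("efficient", "Q4_K_M", 9.0),
    ("compressed", "Q3_K_M", 7.3),
]


def best_quant_for(vram_gb: float, headroom_gb: float = 0.0):
    """Return the highest-quality (tier, quantization) that fits, or None.

    headroom_gb is a hypothetical safety margin for the KV cache and other
    processes; it is not part of the published figures.
    """
    budget = vram_gb - headroom_gb
    for quality, quant, needed_gb in QUANT_TIERS:
        if needed_gb <= budget:
            return quality, quant
    return None


print(best_quant_for(8))    # ('compressed', 'Q3_K_M')
print(best_quant_for(16))   # ('recommended', 'Q5_K_M')
print(best_quant_for(24))   # ('full', 'Q8_0')
print(best_quant_for(6))    # None
```

Passing a non-zero `headroom_gb` is a simple way to leave room for the KV cache, which is not included in these weight-only figures.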
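KV cache VRAM at Q5_K_M quality. The cache grows roughly linearly with context length, so longer contexts need more memory on top of the 10.7 GB of model weights.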
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 205 MB | 10.9 GB |
| 4K | 410 MB | 11.1 GB |
| 8K | 819 MB | 11.5 GB |
| 16K | 1.5 GB | 12.2 GB |
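The 2K to 8K rows above are consistent with a simple linear model: a fixed weight footprint plus a per-token cache cost. The sketch below reproduces them under two assumptions that are inferred from the table rather than stated by the source: about 102.4 MB of cache per 1K tokens (819 MB / 8), and 1 GB = 1000 MB. Results for longer contexts may differ from the published rows by rounding.

```python
WEIGHTS_GB = 10.7            # Q5_K_M weights, from the quantization table
KV_MB_PER_1K_TOKENS = 102.4  # assumption: inferred as 819 MB / 8K from the table


def kv_cache_mb(context_k: float) -> float:
    """Estimated KV cache size in MB for a context of `context_k` thousand tokens."""
    return context_k * KV_MB_PER_1K_TOKENS


def total_vram_gb(context_k: float) -> float:
    """Estimated total VRAM in GB: weights plus KV cache (assumes 1 GB = 1000 MB)."""
    return WEIGHTS_GB + kv_cache_mb(context_k) / 1000


for k in (2, 4, 8):
    print(f"{k}K: {kv_cache_mb(k):.0f} MB cache, {total_vram_gb(k):.1f} GB total")
```

This is only an interpolation of the published numbers, not a measurement; actual usage depends on the runtime and its cache settings.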
Performance data for StarCoder 2 15B across different hardware.
| Device | Quantization | Speed | Rating | Fits in VRAM |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4060 Ti 16GB | Q5_K_M | 25 tok/s | Good | ✓ |
| NVIDIA GeForce RTX 4090 | Q8_0 | 50 tok/s | Excellent | ✓ |
| NVIDIA GeForce RTX 4060 8GB | Q3_K_M | 16 tok/s | Marginal | ✓ |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q3_K_M | 28 tok/s | Acceptable | ✓ |
| NVIDIA GeForce RTX 3080 10GB | Q3_K_M | 22 tok/s | Acceptable | ✓ |
| Apple M3 Pro (18GB Unified) | Q3_K_M | 4 tok/s | Marginal | ✓ |
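To put the speeds above in practical terms, this small sketch converts tok/s into wall-clock time for a completion of a given length. The speeds are copied from the table; the 200-token completion length is an arbitrary illustration, and the estimate ignores prompt processing time.

```python
# Generation speeds in tok/s, copied from the table above.
SPEEDS = {
    "RTX 4090 (Q8_0)": 50,
    "RTX 4060 Ti 16GB (Q5_K_M)": 25,
    "RTX 4060 8GB (Q3_K_M)": 16,
    "Apple M3 Pro 18GB (Q3_K_M)": 4,
}


def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` output tokens at a steady rate."""
    return tokens / tok_per_s


for device, speed in SPEEDS.items():
    print(f"{device}: {generation_seconds(200, speed):.1f} s for 200 tokens")
```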
StarCoder 2 15B is commonly used with Continue and LM Studio. For an AI coding workflow, pair it with an embedding model such as nomic-embed-text for local RAG.
Data confidence: verified. Last updated: 2026-03-01.