
18 GB · 150 GB/s
$1,799
Updated 2026-03-15
The Apple M3 Pro with 18 GB of unified memory is rated here against 42 AI models spanning chat, coding, and AI-coding workflows. Best measured performance: all-MiniLM-L6-v2 embeddings at 1200 tok/s (rated Good). For AI coding it falls in the Capable AI Coding tier, handling single-model workflows well. Current price: approximately $1,799.
— OwnRig methodology, data updated 2026-03-15
Runs 7-16B coding models comfortably (DeepSeek Coder V2 Lite 16B reaches 20 tok/s); at usable quants, 22B and larger coding models exceed the ~14 GB of effective memory, as the table below shows. Handles single model workflows well. A rough fit check is sketched below.
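A minimal sketch of how such a fit verdict can be estimated, assuming roughly 14 GB of the 18 GB unified memory is usable for weights plus KV cache (macOS reserves the rest) and typical llama.cpp bytes-per-weight averages. The constants and the `fits` helper are illustrative, not OwnRig's published method:

```python
# Rough GGUF fit check for an 18 GB unified-memory Mac.
# Assumptions (illustrative, not OwnRig's published method):
#   - ~14 GB of the 18 GB is usable for weights + KV cache
#   - bytes-per-weight figures are typical llama.cpp quant averages
BYTES_PER_WEIGHT = {
    "Q2_K": 0.36, "Q3_K_M": 0.47, "Q4_K_M": 0.58,
    "Q5_K_M": 0.70, "Q8_0": 1.06, "FP16": 2.00,
}
EFFECTIVE_MEMORY_GB = 14.0

def fits(params_b: float, quant: str, kv_cache_gb: float = 1.5) -> bool:
    """True if a params_b-billion-parameter dense model at `quant`,
    plus an estimated KV cache, fits in effective memory."""
    weights_gb = params_b * BYTES_PER_WEIGHT[quant]
    return weights_gb + kv_cache_gb <= EFFECTIVE_MEMORY_GB

for name, n, q in [("Llama 3.1 8B", 8.0, "Q4_K_M"),
                   ("Qwen 2.5 14B", 14.7, "Q3_K_M"),
                   ("Gemma 2 27B", 27.2, "Q4_K_M")]:
    print(name, "fits" if fits(n, q) else "does not fit")
```

These estimates agree with the verdicts in the table: the 8B and 14B models fit, while Gemma 2 27B at Q4 does not.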
| Model | Quant | Speed | Rating | Notes |
|---|---|---|---|---|
| Llama 3.2 1B Instruct | Q8_0 | 45 tok/s | Good | 150 GB/s bandwidth limits throughput vs M4 Pro. Good for lightweight tasks. |
| Llama 3.2 3B Instruct | Q8_0 | 35 tok/s | Good | Slower than M4 Pro due to half the bandwidth. Usable for light workloads. |
| Phi-4 Mini | Q8_0 | 30 tok/s | Good | Bandwidth-limited. Good for draft model or light coding. |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 32 tok/s | Good | Slower than M4 Pro. Acceptable for everyday use. |
| nomic-embed-text v1.5 | Q8_0 | 600 tok/s | Good | Embedding throughput limited by 150 GB/s. Good for RAG pipelines. |
| all-MiniLM-L6-v2 | FP16 | 1200 tok/s | Good | Lightweight embeddings. Good for concurrent use with coding models. |
| Whisper Large V3 | Q5_K_M | — | Good | Transcription model. Slower than M4 Pro but usable for voice-to-code. |
| Whisper Large V3 Turbo | FP16 | — | Good | Real-time transcription. 8x faster than full Whisper, minimal quality loss. |
| Llama 3.1 8B Instruct | Q4_K_M | 15 tok/s | Acceptable | 150 GB/s is the bottleneck. Usable but slower than M4 Pro. |
| Mistral 7B Instruct v0.3 | Q4_K_M | 14 tok/s | Acceptable | Bandwidth-limited. Acceptable for chat and light coding. |
| Qwen 2.5 7B Instruct | Q4_K_M | 16 tok/s | Acceptable | Usable. 150 GB/s limits speed vs discrete GPUs. |
| Gemma 2 9B Instruct | Q4_K_M | 13 tok/s | Acceptable | Slower than M4 Pro. Usable for chat and light coding. |
| InternLM 2.5 7B Chat | Q4_K_M | 15 tok/s | Acceptable | Bandwidth-limited. Acceptable for multilingual chat. |
| DeepSeek R1 Distill Qwen 7B | Q4_K_M | 14 tok/s | Acceptable | Reasoning model. Slower on M3 Pro but usable. |
| Qwen 2.5 Coder 7B Instruct | Q4_K_M | 15 tok/s | Acceptable | Coding model. Usable for Cursor/Continue. Slower than M4 Pro. |
| Gemma 3 4B | Q4_K_M | 22 tok/s | Acceptable | Compact model. Good for resource-constrained coding. |
| Gemma 3 12B | Q3_K_M | 5 tok/s | Marginal | ~6.5GB fits in 14GB effective. Very slow at 150 GB/s. Marginal usability. |
| Phi-3 Medium 14B Instruct | Q3_K_M | 6 tok/s | Marginal | Fits at Q3. Very slow at 150 GB/s. Marginal for reasoning tasks. |
| Phi-4 14B | Q3_K_M | 5 tok/s | Marginal | Fits at Q3. Very slow. Marginal for chat and coding. |
| Qwen 2.5 14B Instruct | Q3_K_M | 5 tok/s | Marginal | Fits at Q3. Very slow at 150 GB/s. Marginal. |
| StarCoder 2 15B | Q3_K_M | 4 tok/s | Marginal | Fits at Q3. Very slow. Marginal for code completion. |
| DeepSeek Coder V2 Lite 16B | Q4_K_M | 20 tok/s | Good | MoE 9.1GB fits in 14GB effective. Good coding performance despite low bandwidth. |
| LLaVA 1.6 13B | Q4_K_M | 8 tok/s | Acceptable | Q4_K_M ~8GB fits. Slower at 150 GB/s but usable for image analysis. |
| Stable Diffusion XL 1.0 | FP16 | — | Good | 5GB fits easily. Slower than M4 Pro. Good for image generation. |
| Stable Diffusion 3.5 Large | FP16 | — | Marginal | FP16 (12.5GB) barely fits in 14GB effective. Marginal headroom. |
| Stable Diffusion 3 Medium | FP16 | — | Good | 5GB fits. Good for SD3 medium quality images. |
| FLUX.1 Dev | Q4_K_M | — | Not Viable | 12B model. Q4 weights (~7.2GB) fit on their own, but text encoders and image-generation overhead push past 14GB effective. Not viable. |
| Gemma 2 27B Instruct | Q4_K_M | — | Not Viable | Q4 16.3GB exceeds 14GB effective. Doesn't fit. |
| Gemma 3 27B | Q3_K_M | 3 tok/s | Marginal | Q3 13.3GB barely fits in 14GB effective. 3 tok/s. Marginal usability. |
| Codestral 22B | Q3_K_M | — | Not Viable | Q3 10.3GB + context overhead exceeds 14GB effective. Doesn't fit. |
| Mistral Small 24B Instruct | Q3_K_M | — | Not Viable | 24B exceeds 14GB effective. Doesn't fit. |
| Yi 1.5 34B Chat | Q3_K_M | — | Not Viable | 34B exceeds 14GB effective. Doesn't fit. |
| Code Llama 34B Instruct | Q3_K_M | — | Not Viable | 34B exceeds 14GB effective. Doesn't fit. |
| Command R 35B | Q3_K_M | — | Not Viable | 35B exceeds 14GB effective. Doesn't fit. |
| QwQ 32B Preview | Q3_K_M | — | Not Viable | 32B exceeds 14GB effective. Doesn't fit. |
| Qwen 2.5 Coder 32B Instruct | Q3_K_M | — | Not Viable | 32B exceeds 14GB effective. Doesn't fit. |
| DeepSeek R1 Distill Qwen 32B | Q3_K_M | — | Not Viable | 32B exceeds 14GB effective. Doesn't fit. |
| Llama 3.1 70B Instruct | Q2_K | — | Not Viable | 70B doesn't fit in 18GB unified. |
| Llama 3.3 70B Instruct | Q2_K | — | Not Viable | 70B doesn't fit in 18GB unified. |
| Qwen 2.5 72B Instruct | Q2_K | — | Not Viable | 72B doesn't fit in 18GB unified. |
| Mixtral 8x7B Instruct | Q2_K | — | Not Viable | MoE 16.4GB exceeds 14GB effective. Doesn't fit. |
| DeepSeek V3 | Q2_K | — | Not Viable | 671B MoE. Doesn't fit. Requires 128GB+ Apple Silicon. |
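Why the speeds in the table cluster where they do: decoding a dense model streams the full weights from memory for every generated token, so throughput is roughly bounded by bandwidth divided by model size. A back-of-the-envelope sketch, assuming an illustrative ~50% efficiency factor (not an OwnRig constant):

```python
# Rule of thumb: a dense model reads all its weights once per token, so
#   tok/s <= bandwidth / model_size_in_GB.
BANDWIDTH_GB_S = 150.0   # M3 Pro unified memory bandwidth
EFFICIENCY = 0.5         # assumed achievable fraction of peak (illustrative)

def estimated_tok_s(weights_gb: float) -> float:
    """Approximate decode speed for a dense model of `weights_gb` GB."""
    return BANDWIDTH_GB_S / weights_gb * EFFICIENCY

# Llama 3.1 8B at Q4_K_M is ~4.6 GB of weights:
print(f"{estimated_tok_s(4.6):.0f} tok/s")  # ~16, near the 15 tok/s above
```

MoE models such as DeepSeek Coder V2 Lite read only their active-expert weights per token, which is why they beat this dense-model estimate despite a larger memory footprint.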
Prices and availability vary. Inspect hardware before purchasing.
Generation: M3. Last updated: 2026-03-15.