Claude Code + Local Sidecar
basicClaude Code runs on Anthropic's cloud API for agentic coding. Pair it with local embeddings and a small model for private snippets, offline drafts, and codebase RAG without sending everything to the API.
Concurrent VRAM
5.4 GB
Peak VRAM
5.4 GB
Min Bandwidth
200 GB/s
Models
2
VRAM Breakdown
How the 5.4 GB concurrent VRAM is used.
Always Running (Concurrent)
Switched (Loaded As Needed)
These share VRAM with the largest concurrent model. Only one runs at a time.
Q4_K_M
What matters most for this workflow
This workflow fits on surprisingly modest hardware, so the main decision is whether you want the cheapest workable setup or enough headroom to keep the experience snappy.
How to think about the hardware
Treat this as a workflow where convenience and control matter more than raw ROI. Local hardware still makes sense, but the win is predictable latency and ownership, not just monthly cost savings.
Local vs API Costs
Typical Monthly API Cost
$120/mo
Break-Even Point
30 months
Annual Savings
~$1152/yr
Based on moderate-to-heavy Claude Code agentic usage (~$80β150/mo API tokens). Local hardware covers embeddings and a small offline model β not Claude inference itself. Budget AI Desktop at $753. Break-even applies when you shift RAG and drafting local; pure cloud-only Claude Code users should compare against API spend, not GPU capex.
Recommended Builds
Pre-configured builds that can run the Claude Code + Local Sidecar workflow.
Prefer a Mac? Apple Silicon with unified memory can run this workflow too. See the Mac AI Builder workflow β