AI Workflow

Claude Code + Local Sidecar

basic

Claude Code runs on Anthropic's cloud API for agentic coding. Pair it with local embeddings and a small model for private snippets, offline drafts, and codebase RAG without sending everything to the API.

Claude CodeContinueLM StudioOllama

Concurrent VRAM

5.4 GB

Peak VRAM

5.4 GB

Min Bandwidth

200 GB/s

Models

2

Memory

VRAM Breakdown

How the 5.4 GB concurrent VRAM is used.

Always Running (Concurrent)

nomic-embed-text v1.5(embeddings for rag)
512 MB

FP16 Β· 137M

Switched (Loaded As Needed)

These share VRAM with the largest concurrent model. Only one runs at a time.

Llama 3.1 8B Instruct(local fallback and drafting)
4.9 GB

Q4_K_M

Buying Priority

What matters most for this workflow

This workflow fits on surprisingly modest hardware, so the main decision is whether you want the cheapest workable setup or enough headroom to keep the experience snappy.

Practical Tradeoff

How to think about the hardware

Treat this as a workflow where convenience and control matter more than raw ROI. Local hardware still makes sense, but the win is predictable latency and ownership, not just monthly cost savings.

Return on Investment

Local vs API Costs

Typical Monthly API Cost

$120/mo

Break-Even Point

30 months

Annual Savings

~$1152/yr

Based on moderate-to-heavy Claude Code agentic usage (~$80–150/mo API tokens). Local hardware covers embeddings and a small offline model β€” not Claude inference itself. Budget AI Desktop at $753. Break-even applies when you shift RAG and drafting local; pure cloud-only Claude Code users should compare against API spend, not GPU capex.

Hardware

Recommended Builds

Pre-configured builds that can run the Claude Code + Local Sidecar workflow.

Prefer a Mac? Apple Silicon with unified memory can run this workflow too. See the Mac AI Builder workflow β†’

Build my rig for this workflow β†’