OwnRig

Model Fine-Tuning & Training


Fine-tune language models locally with QLoRA, LoRA, and full fine-tuning. Train custom adapters for domain-specific tasks without sending proprietary data to third-party APIs. VRAM requirements scale with model size and method: QLoRA fine-tuning a 7B model fits in 16GB, while full fine-tuning of 32B models needs 48GB+. System RAM matters — gradient checkpointing and dataset loading use 2-4x the model's VRAM in system memory.
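As a rough illustration of how those numbers scale, here is a back-of-the-envelope QLoRA VRAM estimator. The per-parameter byte count and overhead constants are simplified assumptions chosen to line up with the figures on this page, not exact values for any framework:

```python
def qlora_vram_gb(params_b: float, bytes_per_param: float = 0.55,
                  overhead_gb: float = 6.0) -> float:
    """Concurrent VRAM estimate: 4-bit quantized base weights plus a flat
    allowance for LoRA adapters, KV cache, and CUDA context. Both constants
    are rough assumptions, not measured values."""
    return params_b * bytes_per_param + overhead_gb

def peak_vram_gb(params_b: float, headroom: float = 1.6) -> float:
    """Peak VRAM during training steps (activation spikes before gradient
    checkpointing frees them). The 1.6x headroom factor is an assumption."""
    return qlora_vram_gb(params_b) * headroom

print(round(qlora_vram_gb(7)))  # a 7B target lands near the 10 GB concurrent figure
print(round(peak_vram_gb(7)))   # and near the 16 GB peak figure
```

The same shape of estimate explains the system-RAM note: multiply the concurrent figure by 2-4x for gradient checkpointing buffers and dataset loading.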

Axolotl · Unsloth · Hugging Face TRL · LLaMA Factory · PEFT
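Each of these toolchains expresses a run as a short config. A hypothetical Axolotl-style QLoRA config for one of the targets below might look like the following; the dataset path and hyperparameters are placeholders, so check your Axolotl version's docs for the exact keys it supports:

```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
load_in_4bit: true          # NF4 quantization keeps base weights at ~4 bits
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true    # attach adapters to all linear layers

datasets:
  - path: ./data/my_domain.jsonl   # placeholder path to your dataset
    type: alpaca

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 8
gradient_checkpointing: true   # trades compute for the VRAM savings noted above
num_epochs: 3
learning_rate: 0.0002
bf16: true
```

Small `micro_batch_size` with gradient accumulation is the usual lever for staying inside the VRAM budget above.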

Concurrent VRAM: 10 GB
Peak VRAM: 16 GB
Min Bandwidth: 400 GB/s
Models Required: 3

VRAM Breakdown

How the 10 GB concurrent VRAM is used.

Switched (Loaded As Needed)

These share VRAM with the largest concurrent model — only one runs at a time.

Llama 3.1 8B Instruct (QLoRA fine-tuning target): 10 GB, Q4_K_M
Mistral 7B Instruct v0.3 (QLoRA fine-tuning target): 10 GB, Q4_K_M
Qwen 2.5 Coder 7B Instruct (code fine-tuning target): 10 GB, Q4_K_M
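The "only one runs at a time" behavior above can be sketched as a loader that evicts the resident model before loading the next, so all three targets share one 10 GB slot. The `load_fn`/`unload_fn` hooks are hypothetical stand-ins for a real runtime:

```python
class SwitchedModelSlot:
    """Minimal sketch of a shared VRAM slot: loading a new model first
    unloads the current one. load_fn/unload_fn are hypothetical hooks
    standing in for a real inference/training runtime."""

    def __init__(self, load_fn, unload_fn):
        self.load_fn = load_fn
        self.unload_fn = unload_fn
        self.current = None

    def use(self, model_name: str) -> str:
        if self.current == model_name:
            return self.current           # already resident, no reload
        if self.current is not None:
            self.unload_fn(self.current)  # free the shared VRAM slot first
        self.load_fn(model_name)
        self.current = model_name
        return self.current
```

This is why the concurrent figure is 10 GB rather than 30 GB: the peak comes from one training run, not from stacking models.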

Local vs API Costs

Typical Monthly API Cost: $200/mo
Break-Even Point: 10 months
Annual Savings After Break-Even: ~$1,920/yr

Based on OpenAI's fine-tuning API pricing ($8/1M training tokens, assuming 3-5 fine-tuning runs per month on 50K-row datasets). Local fine-tuning allows unlimited iterations at zero per-token cost. Electricity adds roughly $15/mo at 6 hours/day of GPU usage during training; hardware assumes the Mid-Range Workstation at ~$1,400. The privacy advantage is the real differentiator: proprietary data never leaves your machine.
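The break-even arithmetic can be sketched directly. The inputs below are this page's headline figures taken at face value; the page's rounder 10-month and ~$1,920/yr numbers appear to bake in more conservative usage assumptions:

```python
import math

def break_even_months(build_cost: float, api_cost_mo: float,
                      electricity_mo: float) -> int:
    """Months until local hardware pays for itself versus API fine-tuning.
    Assumes constant monthly usage; real workloads vary."""
    monthly_savings = api_cost_mo - electricity_mo
    return math.ceil(build_cost / monthly_savings)

def annual_savings(api_cost_mo: float, electricity_mo: float) -> float:
    """Savings per year once the build cost has been recovered."""
    return (api_cost_mo - electricity_mo) * 12
```

Plugging in the $1,400 build, $200/mo API spend, and $15/mo electricity gives roughly 8 months and $2,220/yr under these simplified assumptions.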

Recommended Builds

Pre-configured builds that can run the Model Fine-Tuning & Training workflow.

Prefer a Mac? Apple Silicon with unified memory can run this workflow too. See the Mac AI Builder workflow →

Get a personalized recommendation for this workflow →

Author: Ada. Last updated: 2026-03-14.