NVIDIA
Desktop GPU

NVIDIA GeForce RTX 4070 Ti Super

16 GB GDDR6X · 672 GB/s

From

$779

Estimated street price

VRAM

16 GB

Bandwidth

672 GB/s

TDP

285W

Models

26

Tier

Capable

The NVIDIA GeForce RTX 4070 Ti Super, with 16 GB of GDDR6X VRAM, can handle 26 AI models across reasoning, coding, and chat. Best performance: Gemma 4 26B-A4B at 229 tok/s (excellent). For AI coding workflows, it falls in the Capable AI Coding tier and handles single-model workflows well. Current price: approximately $779.

Source: OwnRig methodology

VRAM

16 GB

Bandwidth

672 GB/s

Memory Type

GDDR6X

TDP

285W

Form Factor

2-slot, 300mm
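
Memory bandwidth sets a rough ceiling on single-stream token generation, because each generated token requires reading the active weights from VRAM about once. The sketch below illustrates that estimate under the assumption that decoding is purely bandwidth-bound, ignoring KV-cache reads and kernel overhead. The function name and the 8.5 GB example model size are illustrative assumptions, not part of the page's methodology.

```python
def max_decode_tok_per_s(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on tokens/s if every token reads all active weights once."""
    if active_weights_gb <= 0:
        raise ValueError("active_weights_gb must be positive")
    return bandwidth_gb_s / active_weights_gb


# Assumption: an 8B model at about 8.5 bits per weight occupies roughly 8.5 GB.
ceiling = max_decode_tok_per_s(672, 8.5)
print(f"{ceiling:.0f} tok/s upper bound")  # prints "79 tok/s upper bound"
```

The 75 tok/s this page lists for Llama 3.1 8B Instruct at Q8_0 sits just under that ceiling. Mixture-of-experts models read only their active experts per token, so their ceiling depends on active rather than total parameters.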

Builder Capability: Capable AI Coding

Runs 16-22B coding models comfortably, or 32B models at reduced quality. Handles single-model workflows well.
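
Whether a model runs comfortably comes down to whether its quantized weights plus working memory fit in 16 GB. The following is a rough sketch of that check. The bits-per-weight figures are approximate averages for common GGUF quantization types and the 1.5 GB overhead allowance is a guess at KV cache and runtime buffers; both are illustrative assumptions rather than OwnRig's methodology.

```python
# Approximate bits per weight. Q8_0 is 8.5 by construction;
# the K-quant values are rough file-level averages (assumption).
BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the quantized weights in GB (10^9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str,
         vram_gb: float = 16.0, overhead_gb: float = 1.5) -> bool:
    """True if weights plus a flat overhead allowance fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


for size, quant in [(8, "Q8_0"), (22, "Q4_K_M"), (32, "Q3_K_M")]:
    print(f"{size}B {quant}: {weights_gb(size, quant):.1f} GB, fits={fits(size, quant)}")
```

Under these assumptions a 22B model at Q4_K_M needs about 13.2 GB of weights and fits with headroom, while a 32B model at Q3_K_M needs about 15.6 GB for weights alone, which leaves almost nothing for context on a 16 GB card.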

Software

Inference Backends

The software stacks that matter most for real-world inference on this device.

CUDA

production

Primary high-performance backend for NVIDIA inference workloads.

Vulkan

stable

Fallback backend for llama.cpp and related local runtimes.
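
The primary/fallback ordering above can be expressed as a simple preference list. This is a minimal sketch, assuming that the presence of the `nvidia-smi` and `vulkaninfo` command-line tools on `PATH` is a good enough hint that each backend is usable; that heuristic, the function names, and the final `"CPU"` fallback are assumptions for illustration and are not taken from the page.

```python
import shutil

# Order taken from the page: CUDA is primary, Vulkan is the fallback.
PREFERENCE = ["CUDA", "Vulkan"]


def detect_available() -> set:
    """Heuristic: treat a backend as available if its CLI tool is on PATH."""
    found = set()
    if shutil.which("nvidia-smi"):   # indicates an NVIDIA driver install
        found.add("CUDA")
    if shutil.which("vulkaninfo"):   # indicates Vulkan tooling is installed
        found.add("Vulkan")
    return found


def pick_backend(available) -> str:
    """Return the first preferred backend that is available, else "CPU"."""
    for name in PREFERENCE:
        if name in available:
            return name
    return "CPU"  # hypothetical last resort, not listed on the page


print(pick_backend(detect_available()))
```

A real runtime decides this at build or load time, so a check like this is only useful as a pre-flight hint.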

What it can run

26 models
| Model | Quantization | Speed | Rating |
|---|---|---|---|
| Arcee Trinity Mini 26B | Q3_K_M | 64 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 119 tok/s | Excellent |
| DeepSeek R1 Distill Qwen 32B | Q3_K_M | 15 tok/s | Acceptable |
| DeepSeek V3 | Q2_K | – | Not viable |
| Gemma 3 27B | Q3_K_M | 12 tok/s | Acceptable |
| Gemma 4 26B-A4B | Q3_K_M | 229 tok/s | Excellent |
| Gemma 4 31B | Q3_K_M | 15 tok/s | Acceptable |
| Gemma 4 E2B | Q8_0 | 101 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 62 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 72 tok/s | Acceptable |
| Llama 3.1 8B Instruct | Q8_0 | 75 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 55 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 190 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 130 tok/s | Excellent |
| Mistral Small 24B Instruct | Q3_K_M | 18 tok/s | Acceptable |
| NVIDIA Nemotron-3-super-120B-A12B | Q2_K | – | Not viable |
| Phi-4 14B | Q5_K_M | 42 tok/s | Good |
| Phi-4 Mini | Q8_0 | 120 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q3_K_M | 16 tok/s | Acceptable |
| Qwen3-14B Instruct | Q8_0 | 29 tok/s | Good |
| Qwen3.5-122B-A10B | Q3_K_M | – | Not viable |
| Qwen3.5-27B | Q3_K_M | 32 tok/s | Acceptable |
| Qwen3.5-397B (MoE) | Q2_K | – | Not viable |
| Qwen3.6-27B | Q3_K_M | 32 tok/s | Acceptable |
| Stable Diffusion 3.5 Large | FP16 | – | Excellent |
| Whisper Large V3 Turbo | FP16 | – | Excellent |

Frequently Asked Questions

What AI models can the NVIDIA GeForce RTX 4070 Ti Super run?
The NVIDIA GeForce RTX 4070 Ti Super can run 26 AI models. Top performers include Gemma 4 26B-A4B, Llama 3.2 1B Instruct, and Llama 3.2 3B Instruct. See the full compatibility table above for speeds and quality ratings.

Is the NVIDIA GeForce RTX 4070 Ti Super good for AI coding?
Yes. With 16 GB of VRAM, the NVIDIA GeForce RTX 4070 Ti Super handles single-model coding workflows well at the Capable tier.

How much VRAM does the NVIDIA GeForce RTX 4070 Ti Super have?
The NVIDIA GeForce RTX 4070 Ti Super has 16 GB of GDDR6X VRAM with 672 GB/s of bandwidth.

Can the NVIDIA GeForce RTX 4070 Ti Super run 70B models?
70B models can run on the NVIDIA GeForce RTX 4070 Ti Super with CPU offloading, but performance will be reduced. Consider a GPU with 48 GB or more of VRAM for full-speed 70B inference.

Is the NVIDIA GeForce RTX 4070 Ti Super worth it for AI?
At approximately $779, the NVIDIA GeForce RTX 4070 Ti Super offers 16 GB of VRAM and runs 26 AI models. It works for smaller models and experimentation.
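
The CPU-offloading trade-off mentioned for 70B models can be sized with simple arithmetic: only as many transformer layers as fit in VRAM stay on the GPU, and the rest run from system RAM. The sketch below is a rough estimate, assuming an 80-layer 70B model at about 4.8 bits per weight, equal-sized layers, and a 2 GB reserve for KV cache and buffers. All of these numbers are illustrative assumptions, not measurements.

```python
def gpu_layer_split(params_billion: float, bits_per_weight: float,
                    n_layers: int, vram_gb: float, reserve_gb: float = 2.0):
    """Return (layers_on_gpu, layers_on_cpu), assuming equal-sized layers."""
    total_gb = params_billion * bits_per_weight / 8
    per_layer_gb = total_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, int(usable_gb // per_layer_gb))
    return on_gpu, n_layers - on_gpu


# Assumed 70B model: 80 layers, ~4.8 bits per weight, on a 16 GB card.
on_gpu, on_cpu = gpu_layer_split(70, 4.8, 80, 16)
print(f"{on_gpu} layers on GPU, {on_cpu} layers on CPU")
```

With these assumptions roughly 26 of 80 layers stay on the GPU, so most of each token's work runs at system-memory speed. In llama.cpp the corresponding knob is the GPU layer count (`-ngl` / `--n-gpu-layers`).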
