Mistral
ChatCodingReasoning24B
Chat

Mistral Small 24B Instruct

Mistral Β· Apache 2.0

Mistral's efficient 24B model with chat, coding, and reasoning capabilities.

Parameters
24B
Architecture
Dense
Context
32,768 tokens
Released
2025-01-30
Engines
llama.cpp, ollama, vLLM, TGI
Builder Tools
Cursor, Continue, Aider, Open WebUI, LM Studio

Parameters

24B

VRAM

20.5 GB

Context

32K

Formats

6

GPUs

20

Mistral Small 24B Instruct (24B) requires 20.5 GB VRAM at recommended quality (Q6_K). At efficient quality (Q4_K_M), it fits in 14 GB VRAM, making it compatible with the AMD Radeon RX 9060 XT 16GB. On NVIDIA Grace Blackwell Ultra GB300, expect approximately 150 tok/s at Q8_0. For the best experience, AMD AI Powerhouse ($1,818) is recommended.

Source: OwnRig methodology

VRAM (Recommended)

20.5 GB

Quantization

Q6_K

File Size

18 GB

Max Context

32K tokens

Primary Use

Chat

Memory

VRAM Requirements

QualityQuantizationVRAMFile Size
fullQ8_026.5 GB24 GB
recommendedQ6_K20.5 GB18 GB
recommendedQ5_K_M17.2 GB15 GB
efficientQ4_K_M14 GB12 GB
compressedQ3_K_M11.2 GB9.5 GB
compressedQ2_K8.8 GB7.4 GB
Scaling

Context Length Impact

KV cache VRAM at Q6_K quality. Longer context = more memory.

ContextKV CacheTotal VRAM
2K410 MB20.9 GB
4K819 MB21.3 GB
8K1.5 GB22 GB
16K3.1 GB23.6 GB
32K6.1 GB26.6 GBexceeds 24 GB

Compatible GPUs

20 devices
NVIDIA Grace Blackwell Ultra GB300Q8_0150 tok/sExcellent
NVIDIA GeForce RTX 5090Q5_K_M55 tok/sExcellent
Apple M4 Max (64GB Unified)Q5_K_M22 tok/sGood
NVIDIA GeForce RTX 4090Q5_K_M32 tok/sGood
AMD Radeon RX 7900 XTXQ5_K_M28 tok/sGood
AMD Radeon Pro W7900Q5_K_M24 tok/sGood
NVIDIA RTX PRO 6000 BlackwellQ8_041 tok/sGood
NVIDIA RTX PRO 6000 Blackwell Max-QQ8_038 tok/sGood
AMD Radeon RX 9070Q3_K_M32 tok/sGood
NVIDIA GeForce RTX 4070 Ti SuperQ3_K_M18 tok/sAcceptable
AMD Radeon RX 9060 XT 16GBQ3_K_M16 tok/sAcceptable
AMD Radeon RX 7600Q3_K_M2 tok/sMarginal
Apple M3 Pro (18GB Unified)Q3_K_M–Not viable
NVIDIA GeForce RTX 3080 10GBQ2_K–Not viable
NVIDIA GeForce RTX 4060 8GBQ3_K_M–Not viable
NVIDIA RTX 4060 Laptop (40-60W)Q3_K_M–Not viable
NVIDIA RTX 4070 Laptop (80-115W)Q3_K_M–Not viable
NVIDIA GeForce RTX 4070 Ti 12GBQ3_K_M–Not viable
AMD Radeon RX 9060 XT 8GBQ3_K_M–Not viable
NVIDIA GeForce RTX 5060 8GBQ3_K_M–Not viable

Showing 20 of 20 entries

Builder Context

Mistral Small 24B Instruct is commonly used with Cursor, Continue, Aider, Open WebUI, LM Studio. For an AI coding workflow, pair it with an embedding model like nomic-embed-text for local RAG.

FAQ

Frequently Asked Questions

How much VRAM does Mistral Small 24B Instruct need?
Mistral Small 24B Instruct requires 20.5 GB VRAM at recommended quality (Q6_K). At lower quality settings, it can fit in as little as 8.8 GB.
What is the best GPU for Mistral Small 24B Instruct?
The NVIDIA Grace Blackwell Ultra GB300 delivers the best performance for Mistral Small 24B Instruct, achieving 150 tok/s at Q8_0 with an excellent rating.
What quantization should I use for Mistral Small 24B Instruct?
For the best quality, use Q6_K (20.5 GB VRAM). If your GPU has limited VRAM, Q2_K (8.8 GB) is the most efficient option with acceptable quality.
Is Mistral Small 24B Instruct good for coding?
Yes. Mistral Small 24B Instruct is used with Cursor, Continue, Aider, Open WebUI, LM Studio for local AI coding. For the best coding experience, pair it with an embedding model for local RAG.
All models

Data confidence: estimated. Source

VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.Mistral is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.