nomic-embed-text v1.5
Nomic · Apache 2.0
High-quality text embedding model for RAG pipelines. At 137M parameters it needs only about 410 MB of VRAM at Q8_0. Competitive with OpenAI's ada-002 on MTEB benchmarks. Well suited to builders running local RAG with Cursor or similar tools, and small enough to run concurrently with coding models without meaningful VRAM impact.
- Parameters: 137M
- Architecture: Dense
- Context: 8,192 tokens (8K)
- Released: 2024-02-02
- Engines: ollama, llama.cpp
- Builder Tools: Cursor, Continue, AnythingLLM, Open WebUI
- VRAM: 410 MB
- Formats: 2
- GPUs: 22
nomic-embed-text v1.5 (137M) requires 410 MB VRAM at recommended quality (Q8_0). On NVIDIA GeForce RTX 4070 Ti 12GB, expect approximately 6500 tok/s at Q8_0. For the best experience, Starter AI Desktop ($582) is recommended.
Source: OwnRig methodology
Q8_0 summary: 410 MB VRAM, 0.14 GB file size, 8K-token context, embeddings model.
VRAM Requirements
| Quality | Quantization | VRAM | File Size |
|---|---|---|---|
| full | FP16 | 512 MB | 0.27 GB |
| recommended | Q8_0 | 410 MB | 0.14 GB |
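The File Size column is close to what a plain parameter-count estimate gives. A minimal sketch, assuming about 2 bytes per weight for FP16 and about 1 byte per weight for Q8_0 (ignoring per-block scale factors and file metadata). The function name and byte counts are illustrative assumptions, not OwnRig's published method:

```python
# Rough weight-file size estimate from parameter count.
# Assumption: these bytes-per-weight values are approximations for illustration.
BYTES_PER_WEIGHT = {"FP16": 2.0, "Q8_0": 1.0}

PARAMS = 137_000_000  # 137M, from the spec list above


def estimate_file_size_gb(params: int, quant: str) -> float:
    """Return the approximate file size in decimal GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_WEIGHT[quant] / 1e9


for quant in ("FP16", "Q8_0"):
    print(f"{quant}: {estimate_file_size_gb(PARAMS, quant):.2f} GB")
```

The VRAM column is higher than the file size in both rows, which would be consistent with runtime buffers being counted on top of the weights, though the page does not break that down.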
Context Length Impact
KV cache VRAM at Q8_0 quality. Longer contexts use more memory.
| Context | KV Cache | Total VRAM |
|---|---|---|
| 2K | 0 MB | 410 MB |
| 4K | 0 MB | 410 MB |
| 8K | 102 MB | 512 MB |
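Each total in the table is the Q8_0 base footprint plus the context-dependent cache. A small sketch of that sum, and of how much of a given GPU it would occupy. The figures are copied from the table, the helper names are hypothetical, and 1 GB is taken as 1024 MB:

```python
# Figures copied from the Context Length Impact table (Q8_0).
BASE_VRAM_MB = 410
KV_CACHE_MB = {"2K": 0, "4K": 0, "8K": 102}


def total_vram_mb(context: str) -> int:
    """Base model footprint plus the cache listed for this context length."""
    return BASE_VRAM_MB + KV_CACHE_MB[context]


def vram_share(context: str, gpu_vram_gb: float) -> float:
    """Fraction of a GPU's VRAM used by the model at the given context."""
    return total_vram_mb(context) / (gpu_vram_gb * 1024)


for ctx in KV_CACHE_MB:
    share = vram_share(ctx, 12)  # 12 GB card, e.g. the RTX 4070 Ti listed on this page
    print(f"{ctx}: {total_vram_mb(ctx)} MB ({share:.1%} of 12 GB)")
```

Even at the full 8K context the model takes a small slice of a 12 GB card, which is why it can sit alongside a coding model.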
Compatible GPUs
22 devices
| GPU | Quantization | Speed | Rating |
|---|---|---|---|
| NVIDIA Grace Blackwell Ultra GB300 | FP16 | 2000 tok/s | Excellent |
| Apple M4 Max (64GB Unified) | FP16 | – | Excellent |
| Apple M4 Pro (48GB) | FP16 | – | Excellent |
| NVIDIA GeForce RTX 3060 12GB | FP16 | – | Excellent |
| NVIDIA GeForce RTX 3080 10GB | Q8_0 | 2500 tok/s | Excellent |
| NVIDIA GeForce RTX 4060 8GB | Q8_0 | 4200 tok/s | Excellent |
| NVIDIA RTX 4060 Laptop (40-60W) | Q8_0 | 2520 tok/s | Excellent |
| NVIDIA RTX 4070 Laptop (80-115W) | Q8_0 | 2940 tok/s | Excellent |
| NVIDIA GeForce RTX 4070 Super | FP16 | – | Excellent |
| NVIDIA GeForce RTX 4070 Ti 12GB | Q8_0 | 6500 tok/s | Excellent |
| NVIDIA RTX 4080 Laptop (120-150W) | Q8_0 | 4550 tok/s | Excellent |
| NVIDIA GeForce RTX 4090 | FP16 | – | Excellent |
| AMD Radeon Pro W7900 | FP16 | – | Excellent |
| NVIDIA RTX PRO 6000 Blackwell | FP16 | 2000 tok/s | Excellent |
| NVIDIA RTX PRO 6000 Blackwell Max-Q | FP16 | 1840 tok/s | Excellent |
| NVIDIA GeForce RTX 5060 8GB | Q8_0 | 4830 tok/s | Excellent |
| Apple M3 Pro (18GB Unified) | Q8_0 | 600 tok/s | Good |
| AMD Radeon RX 7600 | Q8_0 | 500 tok/s | Good |
| AMD Radeon RX 7900 XTX | FP16 | – | Good |
| AMD Radeon RX 9070 | FP16 | – | Acceptable |
| AMD Radeon RX 9060 XT 16GB | FP16 | – | Acceptable |
| AMD Radeon RX 9060 XT 8GB | FP16 | – | Acceptable |
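Throughput figures translate directly into indexing time for a RAG corpus. A back-of-the-envelope sketch, assuming the listed tok/s is sustained and ignoring tokenization, batching, and I/O overhead. The corpus size is a made-up example and the helper name is hypothetical:

```python
# Throughput values copied from Q8_0 rows of the Compatible GPUs table.
THROUGHPUT_TOK_S = {
    "NVIDIA GeForce RTX 4070 Ti 12GB": 6500,
    "NVIDIA GeForce RTX 4060 8GB": 4200,
    "Apple M3 Pro (18GB Unified)": 600,
}

CORPUS_TOKENS = 1_000_000  # hypothetical corpus size


def indexing_seconds(corpus_tokens: int, tok_per_s: float) -> float:
    """Time to embed a corpus once at a constant throughput."""
    return corpus_tokens / tok_per_s


for gpu, tps in THROUGHPUT_TOK_S.items():
    minutes = indexing_seconds(CORPUS_TOKENS, tps) / 60
    print(f"{gpu}: {minutes:.1f} min")
```

Real indexing runs will differ, since chunk sizes and batch settings affect throughput.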
Builder Context
nomic-embed-text v1.5 is commonly used with Cursor, Continue, AnythingLLM, Open WebUI.
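One way to call the model from your own RAG code is through a local Ollama server. A minimal sketch, assuming Ollama is running on its default port 11434, that the model was pulled under the tag `nomic-embed-text`, and that the installed version exposes the `/api/embed` endpoint. The `search_document` and `search_query` prefixes follow Nomic's guidance for this model family. The helper names are hypothetical, and this is not how any of the listed tools necessarily integrate the model:

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed default local endpoint
MODEL = "nomic-embed-text"  # assumed Ollama tag for this model


def build_payload(texts, task):
    """Prefix each text with its task type and build the request body."""
    return {"model": MODEL, "input": [f"{task}: {t}" for t in texts]}


def embed(texts, task):
    """POST to Ollama and return one embedding vector per input text."""
    data = json.dumps(build_payload(texts, task)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embeddings"]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def main():
    # Requires a running Ollama server; not called automatically.
    docs = ["Q8_0 needs 410 MB of VRAM.", "The context window is 8,192 tokens."]
    doc_vecs = embed(docs, "search_document")
    query_vec = embed(["How much VRAM does it need?"], "search_query")[0]
    print(rank(query_vec, doc_vecs))
```

With the server running, calling `main()` prints the document indices in ranked order. The ranking helpers are pure Python, so they can be swapped for a vector store without touching the embedding call.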
Recommended Builds
Complete PC builds that can run nomic-embed-text v1.5.
AI Builder Workstation
Run every AI tool you need. Nothing leaves your machine
Runs 10 models
Budget AI Desktop
Your own AI coding setup for under $800
Runs 7 models
Budget Home AI Server
Always-on AI assistant for the whole household
Runs 7 models
Compact SFF AI Build
Serious AI power in a compact, desk-friendly form factor
Runs 5 models
High-End Home AI Server
Your household's private AI: chatbots, code tools, and more
Runs 12 models
Mac Studio AI Builder
Plug in and run AI: silent, powerful, no assembly required
Runs 6 models
Mid-Range AI Workstation
The sweet spot for AI: handles most models without overspending
Runs 8 models
Mid-Range Home AI Server
Serve multiple AI models to every device at home
Runs 9 models
Silent Mini-ITX AI Box
Whisper-quiet AI processing for noise-sensitive environments
Runs 8 models
Frequently Asked Questions
- How much VRAM does nomic-embed-text v1.5 need?
- nomic-embed-text v1.5 requires 410 MB VRAM at recommended quality (Q8_0), which is also the smallest listed option. At full quality (FP16), it needs 512 MB.
- What is the best GPU for nomic-embed-text v1.5?
- Among GPUs with a listed throughput, the NVIDIA GeForce RTX 4070 Ti 12GB delivers the best performance for nomic-embed-text v1.5, achieving 6500 tok/s at Q8_0 with an excellent rating.
- What quantization should I use for nomic-embed-text v1.5?
- For the best quality, use FP16 (512 MB VRAM). If your GPU has limited VRAM, Q8_0 (410 MB) is the most efficient option and is the recommended setting.
Data confidence: verified.
VRAM requirements are calculated from model parameters and may vary by inference engine, context length, and batch size. Performance estimates are based on community benchmarks and should be verified for your specific configuration.
Nomic is a trademark of its respective owner. OwnRig is not affiliated with or endorsed by the model creator.