HP Z8 Fury G6i (4× RTX PRO 6000 Max-Q, 384 GB)
Windows · Linux
HP Z8 Fury G6i with four NVIDIA RTX PRO 6000 Blackwell Max-Q GPUs (384 GB GDDR7 total). Intel Xeon 698X 86-core, up to 2 TB DDR5-6400. DeepSeek R1 and V3 (671B) need about 1.34 TB for weights alone at FP16, which exceeds the 384 GB of VRAM; the compatibility table lists both as not viable even at Q2_K. Dual 1,700W PSUs. Enterprise-grade local AI inference server.
From $90,000 (estimated)
Enterprise pricing varies by configuration and region. Confirm quote and availability with HP.
Memory: 384 GB (4× 96 GB GDDR7)
GPUs: 4×
RAM: 2048 GB
CPU: Intel Xeon 698X (86-core Granite Rapids, 350W)
OS: Windows, Linux
Models: 63
Type: Desktop
Multi-GPU System
This system has 4 GPUs (384 GB total). Models that fit on a single GPU run at full speed. Larger models require cross-GPU inference — actual throughput depends on the inference engine and interconnect bandwidth.
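The single-GPU versus cross-GPU distinction above can be sketched as a rough capacity check. This is a minimal illustration, not the method this page uses for its ratings: it counts weights only (no KV cache, activations, or runtime overhead), and the names `GpuPool`, `weight_footprint_gb`, and `placement`, as well as the example model sizes, are hypothetical.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabytes, as used in the spec figures above


@dataclass(frozen=True)
class GpuPool:
    """A set of identical GPUs (hypothetical helper for this sketch)."""
    gpu_count: int
    gb_per_gpu: float

    @property
    def total_gb(self) -> float:
        return self.gpu_count * self.gb_per_gpu


def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / GB


def placement(params_billions: float, bits_per_weight: float, pool: GpuPool) -> str:
    """Classify where the weights could live, ignoring all runtime overhead."""
    need = weight_footprint_gb(params_billions, bits_per_weight)
    if need <= pool.gb_per_gpu:
        return "single-gpu"    # whole model fits on one card
    if need <= pool.total_gb:
        return "multi-gpu"     # must be split across cards
    return "exceeds-vram"      # does not fit in combined VRAM


# The configuration described on this page: 4 GPUs with 96 GB each.
z8 = GpuPool(gpu_count=4, gb_per_gpu=96)

# Illustrative parameter counts at 16 bits per weight (FP16).
for size_b in (8, 123, 500):
    need = weight_footprint_gb(size_b, 16)
    print(f"{size_b}B @ FP16: {need:.0f} GB -> {placement(size_b, 16, z8)}")
```

Real deployments need headroom beyond the weight footprint, so a model that lands just under a limit in this estimate may still not fit in practice.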
What it can run
63 models

| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | FP16 | 2760 tok/s | Excellent |
| Arcee Trinity Mini 26B | Q8_0 | 75 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 318 tok/s | Excellent |
| Code Llama 34B Instruct | Q5_K_M | 44 tok/s | Good |
| Codestral 22B | Q5_K_M | 67 tok/s | Good |
| Command R 35B | Q8_0 | 26 tok/s | Acceptable |
| DeepSeek Coder V2 Lite 16B | Q8_0 | 58 tok/s | Good |
| DeepSeek R1 | Q2_K | – | Not viable |
| DeepSeek R1 Distill Qwen 32B | Q8_0 | 28 tok/s | Acceptable |
| DeepSeek R1 Distill Qwen 7B | Q8_0 | 119 tok/s | Excellent |
| DeepSeek V3 | Q2_K | – | Not viable |
| FLUX.1 Dev | FP16 | – | Excellent |
| Gemma 2 27B Instruct | Q5_K_M | 54 tok/s | Good |
| Gemma 2 9B Instruct | Q8_0 | 98 tok/s | Excellent |
| Gemma 3 12B | Q8_0 | 74 tok/s | Good |
| Gemma 3 27B | Q8_0 | 33 tok/s | Good |
| Gemma 3 4B | Q8_0 | 210 tok/s | Excellent |
| Gemma 4 26B-A4B | Q8_0 | 279 tok/s | Excellent |
| Gemma 4 31B | Q8_0 | 41 tok/s | Good |
| Gemma 4 E2B | Q8_0 | 271 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 168 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 299 tok/s | Excellent |
| InternLM 2.5 7B Chat | Q8_0 | 117 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q5_K_M | 21 tok/s | Acceptable |
| Llama 3.1 8B Instruct | Q8_0 | 112 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 82 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 433 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 239 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q8_0 | 13 tok/s | Acceptable |
| Llama 4 Scout | Q5_K_M | 87 tok/s | Excellent |
| LLaVA 1.6 13B | Q5_K_M | 114 tok/s | Excellent |
| Mistral 7B Instruct v0.3 | Q8_0 | 125 tok/s | Excellent |
| Mistral Large 2 123B | Q5_K_M | 12 tok/s | Acceptable |
| Mistral Small 24B Instruct | Q8_0 | 38 tok/s | Good |
| Mixtral 8x7B Instruct | Q5_K_M | 115 tok/s | Excellent |
| nomic-embed-text v1.5 | FP16 | 1840 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q4_K_M | 145 tok/s | Excellent |
| Phi-3 Medium 14B Instruct | Q8_0 | 64 tok/s | Good |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 236 tok/s | Excellent |
| Phi-4 14B | Q8_0 | 62 tok/s | Good |
| Phi-4 Mini | Q8_0 | 236 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q8_0 | 61 tok/s | Good |
| Qwen 2.5 72B Instruct | Q4_K_M | 24 tok/s | Acceptable |
| Qwen 2.5 7B Instruct | Q8_0 | 119 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 46 tok/s | Good |
| Qwen 2.5 Coder 7B Instruct | Q8_0 | 119 tok/s | Excellent |
| Qwen3-14B Instruct | Q8_0 | 64 tok/s | Good |
| Qwen3-30B-A3B | Q8_0 | 256 tok/s | Excellent |
| Qwen3-32B Instruct | Q8_0 | 29 tok/s | Acceptable |
| Qwen3-8B Instruct | Q8_0 | 110 tok/s | Excellent |
| Qwen3.5-122B-A10B | Q8_0 | 90 tok/s | Excellent |
| Qwen3.5-27B | Q8_0 | 33 tok/s | Good |
| Qwen3.5-397B (MoE) | Q2_K | – | Not viable |
| Qwen3.6-27B | Q8_0 | 33 tok/s | Good |
| Qwen3.6-35B-A3B | Q5_K_M | 256 tok/s | Excellent |
| QwQ 32B Preview | Q5_K_M | 46 tok/s | Good |
| Stable Diffusion 3 Medium | FP16 | – | Excellent |
| Stable Diffusion 3.5 Large | FP16 | – | Excellent |
| Stable Diffusion XL 1.0 | FP16 | – | Excellent |
| StarCoder 2 15B | Q8_0 | 58 tok/s | Good |
| Whisper Large V3 | FP16 | – | Excellent |
| Whisper Large V3 Turbo | FP16 | – | Excellent |
| Yi 1.5 34B Chat | Q8_0 | 27 tok/s | Acceptable |
Who this machine makes sense for
This machine is aimed at team, lab, or enterprise buyers who want a vendor-supported system instead of assembling a tower themselves. Its 384 GB of total GPU memory makes it viable for large local inference workloads without a DIY build process.
What to verify first
The main question is not whether the machine works, but whether warranty, support, and deployment simplicity justify the price premium over an equivalent custom build.