HP Z8 Fury G6i (1× RTX PRO 6000, 96 GB)
Windows · Linux
HP Z8 Fury G6i workstation with a single NVIDIA RTX PRO 6000 Blackwell (96 GB GDDR7), an Intel Xeon 600-series CPU, and 256 GB of DDR5-6400 system RAM. Runs 70B models at high-quality quantization levels (Llama 3.3 70B at Q8_0 is listed at 14 tok/s). A professional single-GPU inference workstation for Windows or Linux.
From $17,500 (estimated; varies by configuration)
Enterprise pricing varies by configuration and region. Confirm quote and availability with HP.
View on HP
Memory: 96 GB
GPUs: 1×
RAM: 256 GB
Models: 63
Type: Desktop
GPU memory: 96 GB GDDR7
System RAM: 256 GB
CPU: Intel Xeon w5-3435X (16-core Granite Rapids)
OS: Windows, Linux
What it can run
63 models

| Model | Quantization | Speed | Rating |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | FP16 | 3000 tok/s | Excellent |
| Arcee Trinity Mini 26B | Q8_0 | 75 tok/s | Excellent |
| Arcee Trinity Nano 6B | Q8_0 | 318 tok/s | Excellent |
| Code Llama 34B Instruct | Q5_K_M | 48 tok/s | Good |
| Codestral 22B | Q5_K_M | 73 tok/s | Good |
| Command R 35B | Q8_0 | 28 tok/s | Acceptable |
| DeepSeek Coder V2 Lite 16B | Q8_0 | 63 tok/s | Good |
| DeepSeek R1 | Q2_K | – | Not viable |
| DeepSeek R1 Distill Qwen 32B | Q8_0 | 30 tok/s | Acceptable |
| DeepSeek R1 Distill Qwen 7B | Q8_0 | 129 tok/s | Excellent |
| DeepSeek V3 | Q2_K | – | Not viable |
| FLUX.1 Dev | FP16 | – | Excellent |
| Gemma 2 27B Instruct | Q5_K_M | 59 tok/s | Good |
| Gemma 2 9B Instruct | Q8_0 | 106 tok/s | Excellent |
| Gemma 3 12B | Q8_0 | 80 tok/s | Good |
| Gemma 3 27B | Q8_0 | 36 tok/s | Good |
| Gemma 3 4B | Q8_0 | 228 tok/s | Excellent |
| Gemma 4 26B-A4B | Q8_0 | 279 tok/s | Excellent |
| Gemma 4 31B | Q8_0 | 41 tok/s | Good |
| Gemma 4 E2B | Q8_0 | 271 tok/s | Excellent |
| Gemma 4 E4B | Q8_0 | 168 tok/s | Excellent |
| GigaChat Lightning 10B | Q8_0 | 325 tok/s | Excellent |
| InternLM 2.5 7B Chat | Q8_0 | 127 tok/s | Excellent |
| Llama 3.1 70B Instruct | Q5_K_M | 23 tok/s | Acceptable |
| Llama 3.1 8B Instruct | Q8_0 | 122 tok/s | Excellent |
| Llama 3.2 11B Vision | Q8_0 | 89 tok/s | Excellent |
| Llama 3.2 1B Instruct | Q8_0 | 471 tok/s | Excellent |
| Llama 3.2 3B Instruct | Q8_0 | 260 tok/s | Excellent |
| Llama 3.3 70B Instruct | Q8_0 | 14 tok/s | Acceptable |
| Llama 4 Scout | Q5_K_M | 95 tok/s | Excellent |
| LLaVA 1.6 13B | Q5_K_M | 124 tok/s | Excellent |
| Mistral 7B Instruct v0.3 | Q8_0 | 136 tok/s | Excellent |
| Mistral Large 2 123B | Q5_K_M | 13 tok/s | Acceptable |
| Mistral Small 24B Instruct | Q8_0 | 41 tok/s | Good |
| Mixtral 8x7B Instruct | Q5_K_M | 125 tok/s | Excellent |
| nomic-embed-text v1.5 | FP16 | 2000 tok/s | Excellent |
| NVIDIA Nemotron-3-super-120B-A12B | Q4_K_M | 158 tok/s | Excellent |
| Phi-3 Medium 14B Instruct | Q8_0 | 70 tok/s | Good |
| Phi-3 Mini 3.8B Instruct | Q8_0 | 257 tok/s | Excellent |
| Phi-4 14B | Q8_0 | 67 tok/s | Good |
| Phi-4 Mini | Q8_0 | 257 tok/s | Excellent |
| Qwen 2.5 14B Instruct | Q8_0 | 66 tok/s | Good |
| Qwen 2.5 72B Instruct | Q4_K_M | 26 tok/s | Acceptable |
| Qwen 2.5 7B Instruct | Q8_0 | 129 tok/s | Excellent |
| Qwen 2.5 Coder 32B Instruct | Q5_K_M | 50 tok/s | Good |
| Qwen 2.5 Coder 7B Instruct | Q8_0 | 129 tok/s | Excellent |
| Qwen3-14B Instruct | Q8_0 | 70 tok/s | Good |
| Qwen3-30B-A3B | Q8_0 | 278 tok/s | Excellent |
| Qwen3-32B Instruct | Q8_0 | 31 tok/s | Good |
| Qwen3-8B Instruct | Q8_0 | 120 tok/s | Excellent |
| Qwen3.5-122B-A10B | Q8_0 | 98 tok/s | Excellent |
| Qwen3.5-27B | Q8_0 | 36 tok/s | Good |
| Qwen3.5-397B (MoE) | Q2_K | – | Not viable |
| Qwen3.6-27B | Q8_0 | 36 tok/s | Good |
| Qwen3.6-35B-A3B | Q5_K_M | 278 tok/s | Excellent |
| QwQ 32B Preview | Q5_K_M | 50 tok/s | Good |
| Stable Diffusion 3 Medium | FP16 | – | Excellent |
| Stable Diffusion 3.5 Large | FP16 | – | Excellent |
| Stable Diffusion XL 1.0 | FP16 | – | Excellent |
| StarCoder 2 15B | Q8_0 | 63 tok/s | Good |
| Whisper Large V3 | FP16 | – | Excellent |
| Whisper Large V3 Turbo | FP16 | – | Excellent |
| Yi 1.5 34B Chat | Q8_0 | 29 tok/s | Acceptable |
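
As a rough sanity check on the table above, the size of a model's weights can be estimated from its parameter count and the bits per weight of the quantization format. The sketch below is only a back-of-the-envelope estimate, not the method used to produce the ratings: the K-quant bits-per-weight values are approximate averages, `overhead_gb` is a hypothetical allowance for KV cache and runtime buffers, and the function names are illustrative. It also ignores offloading to system RAM, so it will not reproduce the ratings for large mixture-of-experts models.

```python
# Approximate bits per weight for each format. Q8_0 is 8.5 by construction
# (32 int8 weights plus one fp16 scale per block). The K-quant values are
# rough averages that vary by model, so treat them as assumptions.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.8,
    "Q2_K": 2.6,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(params_billion: float, quant: str,
                 vram_gb: float = 96.0, overhead_gb: float = 8.0) -> bool:
    """True if weights plus a hypothetical overhead allowance fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    examples = [
        ("Llama 3.3 70B Instruct", 70, "Q8_0"),
        ("DeepSeek R1 (671B total parameters)", 671, "Q2_K"),
    ]
    for name, params, quant in examples:
        size = weights_gb(params, quant)
        verdict = "fits" if fits_in_vram(params, quant) else "does not fit"
        print(f"{name} at {quant}: ~{size:.0f} GB of weights, {verdict} in 96 GB")
```

Under these assumptions, a 70B model at Q8_0 needs roughly 74 GB for weights and fits on the single 96 GB GPU, while DeepSeek R1 stays above 200 GB even at Q2_K, which is consistent with its "Not viable" rating.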
Who this machine makes sense for
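This machine is aimed at team, lab, or enterprise buyers who want a supported system instead of assembling a tower. Its 96 GB of GPU memory makes it viable for serious local workloads, up to 70B-class models at Q8_0 and Mistral Large 2 123B at Q5_K_M in the table above, without a DIY build process.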
What to verify first
The main question is not whether the machine works, but whether the price premium is justified by warranty, support, and deployment simplicity versus an equivalent custom build.