HP Z8 Fury G6i (1× RTX PRO 6000, 96 GB)

Windows · Linux

HP Z8 Fury G6i workstation with a single NVIDIA RTX PRO 6000 Blackwell (96 GB GDDR7), an Intel Xeon 600-series CPU, and 256 GB of DDR5-6400 system RAM. Runs 70B models at high-quality quantization (Q5_K_M to Q8_0 in the list below). A professional single-GPU inference workstation for Windows or Linux.

From $17,500 (estimated)

Enterprise pricing varies by configuration and region. Confirm quote and availability with HP.

Memory

96 GB

GPUs

1×

RAM

256 GB

Models

63

Type

Desktop

Inference Memory

96 GB

Accelerator

96 GB GDDR7

System RAM

256 GB

CPU

Intel Xeon w5-3435X (16-core Granite Rapids)

OS

Windows, Linux

What it can run

63 models


Model	Quantization	Speed	Rating
all-MiniLM-L6-v2	FP16	3000 tok/s	Excellent
Arcee Trinity Mini 26B	Q8_0	75 tok/s	Excellent
Arcee Trinity Nano 6B	Q8_0	318 tok/s	Excellent
Code Llama 34B Instruct	Q5_K_M	48 tok/s	Good
Codestral 22B	Q5_K_M	73 tok/s	Good
Command R 35B	Q8_0	28 tok/s	Acceptable
DeepSeek Coder V2 Lite 16B	Q8_0	63 tok/s	Good
DeepSeek R1	Q2_K	–	Not viable
DeepSeek R1 Distill Qwen 32B	Q8_0	30 tok/s	Acceptable
DeepSeek R1 Distill Qwen 7B	Q8_0	129 tok/s	Excellent
DeepSeek V3	Q2_K	–	Not viable
FLUX.1 Dev	FP16	–	Excellent
Gemma 2 27B Instruct	Q5_K_M	59 tok/s	Good
Gemma 2 9B Instruct	Q8_0	106 tok/s	Excellent
Gemma 3 12B	Q8_0	80 tok/s	Good
Gemma 3 27B	Q8_0	36 tok/s	Good
Gemma 3 4B	Q8_0	228 tok/s	Excellent
Gemma 4 26B-A4B	Q8_0	279 tok/s	Excellent
Gemma 4 31B	Q8_0	41 tok/s	Good
Gemma 4 E2B	Q8_0	271 tok/s	Excellent
Gemma 4 E4B	Q8_0	168 tok/s	Excellent
GigaChat Lightning 10B	Q8_0	325 tok/s	Excellent
InternLM 2.5 7B Chat	Q8_0	127 tok/s	Excellent
Llama 3.1 70B Instruct	Q5_K_M	23 tok/s	Acceptable
Llama 3.1 8B Instruct	Q8_0	122 tok/s	Excellent
Llama 3.2 11B Vision	Q8_0	89 tok/s	Excellent
Llama 3.2 1B Instruct	Q8_0	471 tok/s	Excellent
Llama 3.2 3B Instruct	Q8_0	260 tok/s	Excellent
Llama 3.3 70B Instruct	Q8_0	14 tok/s	Acceptable
Llama 4 Scout	Q5_K_M	95 tok/s	Excellent
LLaVA 1.6 13B	Q5_K_M	124 tok/s	Excellent
Mistral 7B Instruct v0.3	Q8_0	136 tok/s	Excellent
Mistral Large 2 123B	Q5_K_M	13 tok/s	Acceptable
Mistral Small 24B Instruct	Q8_0	41 tok/s	Good
Mixtral 8x7B Instruct	Q5_K_M	125 tok/s	Excellent
nomic-embed-text v1.5	FP16	2000 tok/s	Excellent
NVIDIA Nemotron-3-super-120B-A12B	Q4_K_M	158 tok/s	Excellent
Phi-3 Medium 14B Instruct	Q8_0	70 tok/s	Good
Phi-3 Mini 3.8B Instruct	Q8_0	257 tok/s	Excellent
Phi-4 14B	Q8_0	67 tok/s	Good
Phi-4 Mini	Q8_0	257 tok/s	Excellent
Qwen 2.5 14B Instruct	Q8_0	66 tok/s	Good
Qwen 2.5 72B Instruct	Q4_K_M	26 tok/s	Acceptable
Qwen 2.5 7B Instruct	Q8_0	129 tok/s	Excellent
Qwen 2.5 Coder 32B Instruct	Q5_K_M	50 tok/s	Good
Qwen 2.5 Coder 7B Instruct	Q8_0	129 tok/s	Excellent
Qwen3-14B Instruct	Q8_0	70 tok/s	Good
Qwen3-30B-A3B	Q8_0	278 tok/s	Excellent
Qwen3-32B Instruct	Q8_0	31 tok/s	Good
Qwen3-8B Instruct	Q8_0	120 tok/s	Excellent
Qwen3.5-122B-A10B	Q8_0	98 tok/s	Excellent
Qwen3.5-27B	Q8_0	36 tok/s	Good
Qwen3.5-397B (MoE)	Q2_K	–	Not viable
Qwen3.6-27B	Q8_0	36 tok/s	Good
Qwen3.6-35B-A3B	Q5_K_M	278 tok/s	Excellent
QwQ 32B Preview	Q5_K_M	50 tok/s	Good
Stable Diffusion 3 Medium	FP16	–	Excellent
Stable Diffusion 3.5 Large	FP16	–	Excellent
Stable Diffusion XL 1.0	FP16	–	Excellent
StarCoder 2 15B	Q8_0	63 tok/s	Good
Whisper Large V3	FP16	–	Excellent
Whisper Large V3 Turbo	FP16	–	Excellent
Yi 1.5 34B Chat	Q8_0	29 tok/s	Acceptable
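
A rough way to sanity-check the list above is to estimate weight size from parameter count and quantization, then compare it with the 96 GB of accelerator memory. The sketch below is illustrative only and is not how the ratings on this page were produced. The bits-per-weight figures for the K-quants are approximate averages, the 8 GB headroom for KV cache and buffers is a hypothetical placeholder, and the function names are made up for this example.

```python
# Approximate bits per weight for common GGUF quantization types.
# Q8_0 is 8.5 by construction (32 int8 weights plus one fp16 scale per block).
# The K-quant values are rough averages and vary by model (assumption).
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
    "Q2_K": 2.6,
}

VRAM_GB = 96.0  # accelerator memory from the spec list, treated as 10^9-byte GB


def weight_gb(params_billion: float, quant: str) -> float:
    """Estimate weight storage in GB for a model at a given quantization."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * BITS_PER_WEIGHT[quant] / 8.0


def fits(params_billion: float, quant: str, headroom_gb: float = 8.0) -> bool:
    """True if weights plus a hypothetical headroom allowance fit in VRAM."""
    return weight_gb(params_billion, quant) + headroom_gb <= VRAM_GB


if __name__ == "__main__":
    # A 70B model at Q8_0 versus DeepSeek R1 (671B parameters) at Q2_K.
    for name, params, quant in [
        ("70B @ Q8_0", 70.0, "Q8_0"),
        ("671B @ Q2_K", 671.0, "Q2_K"),
    ]:
        size = weight_gb(params, quant)
        verdict = "fits" if fits(params, quant) else "does not fit"
        print(f"{name}: ~{size:.1f} GB of weights, {verdict} in {VRAM_GB:.0f} GB")
```

Under these assumptions a 70B model at Q8_0 needs about 74 GB for weights and fits, while a 671B model needs well over 200 GB even at Q2_K, which is consistent with the "Not viable" rows. Real usage also depends on context length and runtime overhead, so treat the result as a first filter rather than a guarantee.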

Best Fit

Who this machine makes sense for

This machine is aimed at team, lab, or enterprise buyers who want a vendor-supported system instead of assembling a tower themselves. Its 96 GB of GPU memory makes it viable for serious local workloads, including 70B-class models, without a DIY build process.

Before You Buy

What to verify first

The main question is not whether the machine works, but whether the price premium is justified by warranty, support, and deployment simplicity versus an equivalent custom build.
