Nvidia RTX 6000 Pro power efficiency testing (@themarkymark)

Number Six is my newer AI server, which I built about a month ago. I have done a lot of testing and tweaking in that time, but I finally got around to doing a more thorough power efficiency test.

The server mainly revolves around two Nvidia RTX 6000 Pro 600W cards. My initial testing showed only around a 4% loss in performance with a power limit of 300W per card. This cut the power ceiling of the cards by 50% and yielded around 43% power savings, a trade I was more than happy to make.

After some discussion on Twitter, I decided to spend some time on more thorough testing, now that I have had time to tweak performance and get my desired model running well.

My daily driver is GLM Air 4.5 FP8 until they get around to releasing the 4.6 version they promised months ago. I typically see around 95 tokens/sec when asking a simple question and as much as 195 tokens/sec on more complex, agentic tasks.

I tested at per-card limits of 250W, 300W, 360W, and 600W (stock).
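The post does not say how the limits were applied, so the following is only a sketch of one common way to do it: calling `nvidia-smi` with its `-i` (GPU index) and `-pl` (power limit in watts) options. The GPU indices 0 and 1 and the helper names `power_limit_cmd` and `apply_power_limit` are assumptions for illustration, and the default dry run only prints the commands rather than changing anything.

```python
import subprocess


def power_limit_cmd(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi command that caps one GPU's power limit."""
    if watts <= 0:
        raise ValueError("power limit must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(watts: int, gpu_indices=(0, 1), dry_run: bool = True) -> list[list[str]]:
    """Apply the same limit to every listed GPU.

    With dry_run=True (the default) the commands are only returned,
    which makes it easy to inspect them before touching the hardware.
    """
    cmds = [power_limit_cmd(i, watts) for i in gpu_indices]
    if not dry_run:
        for cmd in cmds:
            # Changing the power limit normally requires root privileges.
            subprocess.run(cmd, check=True)
    return cmds


# Print the commands for each limit tested in this post (assumed GPU indices 0 and 1).
for limit in (250, 300, 360, 600):
    for cmd in apply_power_limit(limit):
        print(" ".join(cmd))
```

Passing `dry_run=False` would actually run the commands. The limit set this way does not survive a reboot, so a real setup would likely reapply it at startup.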

250W

Input token throughput (tok/s): 1071.01
Output token throughput (tok/s): 525.69
Total token throughput (tok/s): 1596.71

300W

Input token throughput (tok/s): 1216.33
Output token throughput (tok/s): 597.02
Total token throughput (tok/s): 1813.35

360W

Request throughput (req/s): 2.46
Input token throughput (tok/s): 1263.23
Output token throughput (tok/s): 620.04
Total token throughput (tok/s): 1883.27

600W

Input token throughput (tok/s): 1274.46
Output token throughput (tok/s): 625.55
Total token throughput (tok/s): 1900.02

These tokens/sec figures seem high, but the benchmark simulates a multi-user workload, which performs considerably better in aggregate than a single user making one request.

Peak performance is of course at 600W, with 625.55 output tokens/second, and the lowest is at 250W, with 525.69 tokens/second. Looking at everything, 300W is the clear winner at 597.02 tokens/second.

If you look at the actual power draw, though, this gets really interesting.
Image from thread

The 250W test actually uses more energy overall: the test takes longer, and its peak spikes are higher than at 300W. If you look closely at the graph, you can see the 250W test hits as high as 862W, whereas the 300W test peaked at 821W. The average wattage is fairly similar between the two tests.

Performance & Efficiency Comparison

Per-card limit	System power (measured)	Total tok/s	% of max throughput	Output tok/s	Median TTFT	Median ITL	Tokens per Watt	Efficiency vs 600W
250 W	814 W	1 597	84.0 %	526	229.8 s	20.68 ms	1.963	+27 %
300 W	816 W	1 813	95.4 %	597	201.9 s	17.79 ms	2.223	+44 %
360 W	990 W	1 883	99.1 %	620	195.5 s	17.33 ms	1.902	+23 %
600 W (max)	1 229 W	1 900	100 %	626	196.6 s	17.27 ms	1.546	baseline
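The "Tokens per Watt" column appears to be total tok/s divided by measured system power, which is the same thing as tokens per joule. A minimal sketch that recomputes it and the efficiency gain from the rounded table values (the names `RESULTS`, `tokens_per_watt`, and `efficiency_vs_baseline` are mine, not from the benchmark tool):

```python
# (system watts, total tok/s) for each per-card limit, copied from the table above.
RESULTS = {
    250: (814, 1597),
    300: (816, 1813),
    360: (990, 1883),
    600: (1229, 1900),
}


def tokens_per_watt(system_watts: float, total_tok_s: float) -> float:
    """Throughput per watt of system power (tok/s per W, i.e. tokens per joule)."""
    return total_tok_s / system_watts


def efficiency_vs_baseline(results: dict, baseline: int = 600) -> dict:
    """Fractional efficiency gain of each limit relative to the baseline limit."""
    base = tokens_per_watt(*results[baseline])
    return {limit: tokens_per_watt(*row) / base - 1.0 for limit, row in results.items()}


gains = efficiency_vs_baseline(RESULTS)
for limit, gain in gains.items():
    tpw = tokens_per_watt(*RESULTS[limit])
    print(f"{limit} W: {tpw:.3f} tok/s per W, {gain:+.0%} vs 600 W")
```

Because the inputs here are the rounded table values, the third decimal place can differ slightly from the table, but the percentage gains come out the same.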

Summary – vs full 600 W mode

Per-card limit	System power	Power saved vs max	Throughput loss vs max
300 W per card	816 W	–34 %	–4.6 %
360 W per card	990 W	–19 %	–0.9 %
250 W per card	814 W	–34 %	–16 %
600 W per card (max)	1 229 W	0 %	0 %
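The summary percentages can be reproduced with a simple percent-change calculation against the 600W row. This is a sketch using the rounded figures from the tables. The dictionary and function names are illustrative only.

```python
# Measured system power (W) and total throughput (tok/s) per per-card limit.
SYSTEM_WATTS = {250: 814, 300: 816, 360: 990, 600: 1229}
TOTAL_TOK_S = {250: 1597, 300: 1813, 360: 1883, 600: 1900}


def pct_change(value: float, baseline: float) -> float:
    """Percent change of value relative to baseline (negative means lower)."""
    return (value - baseline) / baseline * 100.0


def summarize(baseline: int = 600) -> dict:
    """Return {limit: (power change %, throughput change %)} versus the baseline."""
    rows = {}
    for limit in SYSTEM_WATTS:
        power = pct_change(SYSTEM_WATTS[limit], SYSTEM_WATTS[baseline])
        tput = pct_change(TOTAL_TOK_S[limit], TOTAL_TOK_S[baseline])
        rows[limit] = (power, tput)
    return rows


for limit, (power, tput) in summarize().items():
    print(f"{limit} W per card: power {power:+.0f} %, throughput {tput:+.1f} %")
```

Read this way, 300W gives up under 5% of throughput for roughly a third less system power, while 250W saves about the same power but costs about 16% of throughput.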

In reality, though, the numbers are even more in favor of 300W, as I was cherry-picking the peak wattage specifically. It is interesting that 360W is where you get almost no loss in performance, with 99.1% throughput, but with minimal power savings.