Nvidia RTX 6000 Pro power efficiency testing

themarkymark

Number Six is the newer AI server I built about a month ago. I have done a lot of testing and tweaking in that time, but I finally got around to doing a more thorough power efficiency test.

The server revolves around two Nvidia RTX 6000 Pro 600W cards. My initial testing showed only around a 4% loss in performance with a power limit of 300W per card. This cut the power ceiling of the cards by 50% and yielded around 43% power savings, a trade I was more than happy to make.

After some discussion on Twitter, I decided to do more thorough testing now that I have had time to tweak performance and get my desired model running well.

My daily driver is GLM Air 4.5 FP8, at least until they get around to releasing the 4.6 version they promised months ago. I typically see around 95 tokens/sec when just asking a simple question and as much as 195 tokens/sec when doing more complex, agentic tasks.

I tested per-card power limits of 250W, 300W, 360W, and 600W (stock).
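The post does not say how the limits were applied, so the following is only a sketch of one common way to do it: calling `nvidia-smi -i <index> -pl <watts>` for each card. The GPU indices 0 and 1 and the helper names are assumptions for illustration. By default it only prints the commands rather than running them.

```python
import shutil
import subprocess

# Per-card limits tested in this post, in watts.
POWER_LIMITS_W = [250, 300, 360, 600]


def power_limit_command(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi call that sets one GPU's power limit."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(watts: int, gpu_indices=(0, 1), dry_run: bool = True) -> list[list[str]]:
    """Apply (or, by default, just print) the limit for each card.

    gpu_indices=(0, 1) is an assumption: two cards at indices 0 and 1.
    """
    commands = [power_limit_command(i, watts) for i in gpu_indices]
    for cmd in commands:
        if dry_run or shutil.which("nvidia-smi") is None:
            print("would run:", " ".join(cmd))
        else:
            # Changing the limit normally requires root privileges.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    apply_power_limit(300)
```

Passing `dry_run=False` would actually change the limit, which usually needs root and does not persist across reboots.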

250W

Input token throughput (tok/s): 1071.01
Output token throughput (tok/s): 525.69
Total token throughput (tok/s): 1596.71

300W

Input token throughput (tok/s): 1216.33
Output token throughput (tok/s): 597.02
Total token throughput (tok/s): 1813.35

360W

Request throughput (req/s): 2.46
Input token throughput (tok/s): 1263.23
Output token throughput (tok/s): 620.04
Total token throughput (tok/s): 1883.27

600W

Input token throughput (tok/s): 1274.46
Output token throughput (tok/s): 625.55
Total token throughput (tok/s): 1900.02

These tokens/sec figures seem high, but the benchmark simulates a multi-user workload, which achieves considerably higher total throughput than a single user making one request.

Peak output throughput is, of course, at 600W with 625.55 tokens/second, and the lowest is at 250W with 525.69 tokens/second. Looking at everything, 300W is the clear winner at 597.02 tokens/second.

If you look at the actual power draw, this gets really interesting though.
Image from thread

250W actually uses more power overall: the test takes longer, and its peak spikes are higher than at 300W. If you look closely at the graph, you can see the 250W test hits as high as 862W, whereas the 300W test peaked at 821W. The average wattage is fairly similar between these two tests.

Performance & Efficiency Comparison

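| Per-card limit | System power (measured) | Total tok/s | % of max throughput | Output tok/s | Median TTFT | Median ITL | Tokens per Watt | Efficiency vs 600W |
|---|---|---|---|---|---|---|---|---|
| 250 W | 814 W | 1 597 | 84.0 % | 526 | 229.8 s | 20.68 ms | 1.963 | +27 % |
| 300 W | 816 W | 1 813 | 95.4 % | 597 | 201.9 s | 17.79 ms | 2.223 | +44 % |
| 360 W | 990 W | 1 883 | 99.1 % | 620 | 195.5 s | 17.33 ms | 1.902 | +23 % |
| 600 W (max) | 1 229 W | 1 900 | 100 % | 626 | 196.6 s | 17.27 ms | 1.546 | baseline |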
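The "Tokens per Watt" and "Efficiency vs 600W" columns can be reproduced from the measured system power and total throughput. This is a minimal sketch using the numbers reported above. The function names are mine, and the results may differ from the table in the last digit because of rounding in the inputs.

```python
# Per-card limit (W) -> (measured system power in W, total tok/s), from the results above.
RUNS = {
    250: (814, 1596.71),
    300: (816, 1813.35),
    360: (990, 1883.27),
    600: (1229, 1900.02),
}


def tokens_per_watt(system_watts: float, total_tok_s: float) -> float:
    """Total throughput divided by system power draw."""
    return total_tok_s / system_watts


def efficiency_vs_baseline(limit: int, baseline: int = 600) -> float:
    """Relative gain in tokens per watt compared with the baseline run."""
    base = tokens_per_watt(*RUNS[baseline])
    return tokens_per_watt(*RUNS[limit]) / base - 1.0


if __name__ == "__main__":
    for limit in RUNS:
        tpw = tokens_per_watt(*RUNS[limit])
        gain = efficiency_vs_baseline(limit)
        print(f"{limit} W: {tpw:.3f} tok/s per W ({gain:+.0%} vs 600 W)")
```

Since a watt is a joule per second, tok/s per watt is the same thing as tokens per joule, so this column compares the energy cost of the same amount of work at each limit.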

Summary – vs full 600 W mode

| Per-card limit | System power | Power saved vs max | Throughput loss vs max |
|---|---|---|---|
| 300 W per card | 816 W | –34 % | –4.6 % |
| 360 W per card | 990 W | –19 % | –0.9 % |
| 250 W per card | 814 W | –34 % | –16 % |
| 600 W per card (max) | 1 229 W | 0 % | 0 % |
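The summary percentages are simple relative changes against the 600W run. A short sketch of the arithmetic, using the measured system power and total tok/s from this post (the helper name `pct_change` is mine):

```python
def pct_change(value: float, baseline: float) -> float:
    """Percentage change of value relative to baseline (negative means lower)."""
    return (value - baseline) / baseline * 100.0


# 600 W (stock) run used as the reference point.
BASE_POWER_W = 1229
BASE_TOK_S = 1900.02

# Per-card limit (W) -> (measured system power in W, total tok/s).
LIMITED_RUNS = {
    300: (816, 1813.35),
    360: (990, 1883.27),
    250: (814, 1596.71),
}

if __name__ == "__main__":
    for limit, (watts, tok_s) in LIMITED_RUNS.items():
        power = pct_change(watts, BASE_POWER_W)
        throughput = pct_change(tok_s, BASE_TOK_S)
        print(f"{limit} W per card: power {power:+.0f} %, throughput {throughput:+.1f} %")
```

Seen this way, 250W and 300W save about the same system power, but 250W gives up far more throughput for it.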

In reality, though, the numbers are even more in favor of 300W, as I was cherry-picking the peak wattage specifically. Interestingly, 360W is where you get almost no loss in performance, at 99.1% throughput, but with only modest power savings.
