Characterizing the Impact of GPU Power Management on an Exascale System

Nov 1, 2025

Mariana Toledo Costa

Antigoni Georgiadou

James B. White III

Bruno Villasenor Alvarez

Jordà Polo

Woong Shin

Philippe Olivier Alexandre Navaux

Bronson Messer

Arthur Francisco Lorenzon

Abstract

This paper examines GPU power and frequency capping techniques on an exascale system, the Frontier supercomputer. We conduct experiments on nodes where each Graphics Compute Die (GCD) includes 64 GB of high-bandwidth memory (HBM2E) and is treated as an independent GPU, yielding 8 GPUs per node. To assess how power and frequency capping behave under different scalability conditions, all applications were executed in two configurations: a single-node run using all 8 GPUs, and a multi-node run using 32 nodes for a total of 256 GPUs. Our results show that applications with certain workload characteristics can benefit from moderate frequency reductions, enabling energy-aware tuning with minimal performance degradation. A dynamic capping mechanism that reacts to workload characteristics can reduce power usage by 20% without a measurable runtime penalty.
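Energy is average power multiplied by runtime. A power reduction that leaves runtime unchanged therefore yields the same fractional energy saving. A frequency cap that also stretches runtime gives back part of that gain. The minimal Python sketch below illustrates this arithmetic. The function names and the per-GPU power and runtime figures are hypothetical and illustrative only. They are not measurements from the paper.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy consumed when running at a constant average power."""
    return avg_power_watts * runtime_seconds


def relative_change(baseline: float, capped: float) -> float:
    """Fractional change of the capped value versus baseline (negative = reduction)."""
    return (capped - baseline) / baseline


# Hypothetical, illustrative per-GPU numbers (not measured data).
baseline_power, baseline_time = 250.0, 100.0
e_base = energy_joules(baseline_power, baseline_time)

# Case 1: power drops 20% and runtime is unchanged.
e_cap = energy_joules(baseline_power * 0.8, baseline_time)
print(f"{relative_change(e_base, e_cap):+.0%}")  # -20%

# Case 2: a frequency cap cuts power 20% but stretches runtime 10%.
e_slow = energy_joules(baseline_power * 0.8, baseline_time * 1.1)
print(f"{relative_change(e_base, e_slow):+.0%}")  # -12%
```

Case 2 shows why a cap is only worthwhile when the runtime penalty stays small relative to the power reduction.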

Type

Workshop

Publication

Proceedings of the 16th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems (PMBS), held in conjunction with SC25

Last updated on Nov 1, 2025

HPC Energy Efficiency GPU Power Management Exascale Computing Frontier Supercomputer Performance Modeling

Authors

Woong Shin

Research Scientist | Scientific Computing × AI | HPC Infrastructure
