This paper examines GPU power- and frequency-capping techniques on exascale systems. We conduct experiments on a system in which each Graphics Compute Die (GCD) has 64 GB of high-bandwidth memory (HBM3E) and is treated as an independent GPU, yielding 8 GPUs per node. To assess how power and frequency capping behave under different scaling conditions, we execute every application in two configurations: a single-node run using all 8 GPUs, and a multi-node run using 32 nodes (256 GPUs in total). Our results show that applications exhibiting certain behaviors can benefit from moderate frequency reductions, enabling energy-aware tuning with minimal performance degradation. A dynamic capping mechanism that reacts to workload characteristics can reduce power usage by 20% without a measurable runtime penalty.
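The link between a power reduction and the resulting energy savings follows from energy being average power multiplied by runtime. As a worked example, assume that "power usage" refers to the average power draw $\bar{P}$ over a run of duration $t$. This reading is our assumption and is not a definition stated above. Under it, a 20% power reduction with unchanged runtime implies a 20% energy reduction:

$$
E = \bar{P}\,t, \qquad \frac{E'}{E} = \frac{\bar{P}'}{\bar{P}} \cdot \frac{t'}{t} = 0.8 \times 1 = 0.8
$$

More generally, lowering the frequency saves energy only when the relative drop in power outweighs the relative increase in runtime, that is, when $(\bar{P}'/\bar{P})(t'/t) < 1$.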
Proceedings of the 16th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems (PMBS), held in conjunction with SC25