HPC Application Energy Efficiency (2022 - Present)
Research Staff, July 2022 - Present
Overview
This initiative increases energy awareness by providing tools and frameworks for scientific users.
- Analyzed and reported energy projections based on system hardware and software capabilities.
- Developed a prototype on-demand job-level performance and energy monitoring tool for user-driven analysis.
- Collaborated with the vendor (AMD) to enhance energy monitoring and workflow integration.
- Led a division-wide collaboration to engage end users and bridge system operations with energy efficiency.
- Served as a subject matter expert, driving collaboration through presentations and publications on energy efficiency in leadership computing.
Technologies
- Python-based HPC monitoring tools for users, built on the BYOM (bring your own monitoring) concept
- AMD OmniStat, AMD MI250X power and energy sensors and tools
- Python-based UI backends for a user-driven HPC power and energy data viewer
Publications
- Shin et al., “Towards Sustainable Post-Exascale Leadership Computing”, SC24-W SusSup24, 2024
- Karimi et al., “Exploring the Frontiers of Energy Efficiency using Power Management at System Scale”, SC24-W SusSup24, 2024
Invited Talks
- ADAC13, “Software for sustainability – Green IT and Sustainable Computing”
- PASC23-MiniSymposia, “User Facility Support for Post-Exascale HPC Energy Efficiency”
- SCSP AIExpo24 - DOE Booth, “HPC Energy Efficiency @ OLCF”
- SC24 DOE Booth Featured Talk, “HPC Energy Efficiency @ Oak Ridge Leadership Computing Facility”
