HPC Application Energy Efficiency (2022 - Present)

Jul 8, 2022
Analytics and AI Methods at Scale Group (AAIMS), Oak Ridge National Laboratory, USA
Research Staff, July 2022 - Present

Overview

This initiative raises energy awareness among scientific users of HPC systems by giving them tools and frameworks to monitor and analyze the energy use of their applications.

  • Analyzed and reported energy projections based on system hardware and software capabilities.
  • Developed a prototype on-demand job-level performance and energy monitoring tool for user-driven analysis.
  • Collaborated with the vendor (AMD) to enhance energy monitoring and workflow integration.
  • Led a division-wide collaboration to engage end users and bridge system operations with energy efficiency.
  • Served as a subject matter expert, driving collaboration through presentations and publications that shape energy efficiency in leadership computing.
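
As an illustration of what job-level energy accounting involves, the sketch below integrates sampled power readings over a job's runtime to estimate energy. It is a minimal example under stated assumptions, not the tool described above: the function name `job_energy_joules` and the sample values are hypothetical, and the trapezoidal rule is one common choice for turning power samples into energy.

```python
from typing import Sequence


def job_energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Estimate energy (J) from power samples (W) taken at the given times (s).

    Hypothetical helper: uses the trapezoidal rule between consecutive samples.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    if len(timestamps_s) < 2:
        return 0.0  # a single sample spans no time, so no energy can be attributed

    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Average power over the interval times the interval length
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


# Made-up samples for one device over a short job (not measured data)
timestamps = [0.0, 1.0, 2.0, 3.0]
power = [300.0, 400.0, 400.0, 300.0]
print(f"{job_energy_joules(timestamps, power):.1f} J")
```

In practice the samples would come from whatever sensors or collectors the system exposes, and the sampling interval bounds how well short power spikes are captured.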

Technologies

  • Python-based HPC monitoring tools for users, built on the BYOM (bring your own monitoring) concept
  • AMD OmniStat, AMD MI250X power & energy sensors and tools
  • Python-based UI backends for a user-driven HPC power & energy data viewer

Publications

Invited Talks