Operational Data Analytics for HPC Energy Efficiency - The Cost of Staying Afloat


Operational Data Analytics (ODA) provides unique opportunities to analyze, understand, and optimize the operation of HPC systems. Readily available open-source frameworks make it increasingly easy to collect monitoring data from the different domains of an HPC system (infrastructure, system hardware, software, applications). However, making the data work for HPC operations is not straightforward. AI-based methods seem interesting, but it is not obvious which tools and methods are suitable for this type of data. This BoF aims to bring together practitioners in HPC operations to share use cases for ODA, discuss problems, and provide feedback.
As a panel speaker in this BoF, I presented the ODA system architecting efforts at OLCF: the challenges we face due to the amount of data we collect, how we have solved or aim to solve them, and the lessons we learned for moving forward.
Room C144-145, Kay Bailey Hutchison Convention Center
650 S Griffin St, Dallas, TX 75202