Abstract
Operational data analytics (ODA) provides unique opportunities to analyze,
understand, and optimize operations of HPC systems. Readily available
open-source frameworks make the collection of monitoring data from different
domains of the HPC system increasingly easy. However, making the data work
for HPC operations is not straightforward, and HPC sites are duplicating
efforts to develop methods and tools to analyze and leverage the data.
AI-based analysis methods are appealing, but certainly not the only option.
This BoF aims to bring together practitioners in HPC operations to share use
cases for ODA, discuss problems and provide feedback.
To support this BoF, I discuss the data journey of OLCF over the past two
generations of systems and share lessons learned in an interactive way.
Event
Location
Georgia World Congress Center
285 Andrew Young International Blvd NW, Atlanta, Georgia 30303