Building on these robust data foundations, I develop advanced analytics and visualization tools that help HPC operators detect anomalies, predict failures, and optimize resource usage. Whether it’s diagnosing GPU memory corruption at scale or pinpointing overcooling in data centers, the overarching goal is to enhance system reliability, efficiency, and user productivity. By bridging data engineering with domain knowledge, I help HPC sites adopt a proactive, data-centric approach to managing next-generation supercomputers.
Analytics and AI Methods at Scale Group (AAIMS), Oak Ridge National Laboratory, USA
Jul 1, 2022
Analytics and AI Methods at Scale Group (AAIMS), Oak Ridge National Laboratory, USA
Jun 1, 2021
Analytics and AI Methods at Scale Group (AAIMS), Oak Ridge National Laboratory, USA
Oct 1, 2020
Technology Integration Group, Oak Ridge National Laboratory, USA
Jun 1, 2018
The 6th ISC HPC International Workshop on “Monitoring & Operational Data Analytics”
Jun 13, 2025
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC24) - Birds of a Feather on Operational Data Analytics
Nov 20, 2024
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22) - Birds of a Feather on Operational Data Analytics
Nov 17, 2022