ExaDigiT: Digital Twin for Exascale Supercomputers (2022 - Present)
Research Staff, November 2022 - Present
Overview
A digital twin is a virtual replica of an HPC system, enabling scenario modeling and optimization. This project integrates real-time telemetry with physics-based models to improve energy efficiency and system stability.
- Co-led the design of ExaDigiT, a digital twin for Frontier’s liquid cooling system, enhancing predictive control strategies.
- Developed a real-time data integration layer linking system telemetry with digital twins.
- Published initial findings at SC24 and contributed to a global digital twin initiative for HPC centers.
Technology
- Python REST API for data access to Apache Druid
- Kafka-based simulation task dispatch and management
- Modelica-based cooling models
- Python-based scheduler simulation
- Unreal Engine for AR/VR (Microsoft HoloLens)
Publication
