ExaDigiT: Digital Twin for Exascale Supercomputers (2022 - Present)

Jul 1, 2022 · 1 min read
Analytics and AI Methods at Scale Group (AAIMS), Oak Ridge National Laboratory, USA
Research Staff, November 2022 - Present

ExaDigiT

Overview

In this project, a digital twin is a virtual replica of an HPC system that supports what-if scenario modeling and optimization. ExaDigiT integrates real-time telemetry with physics-based models to improve energy efficiency and system stability.

  • Co-led the design of ExaDigiT, a digital twin for Frontier’s liquid cooling system, enhancing predictive control strategies.
  • Developed a real-time data integration layer that links system telemetry to the digital twin.
  • Published initial findings at SC24 and contributed to a global digital twin initiative for HPC centers.
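As a rough illustration of what a physics-based model contributes next to telemetry, the sketch below applies the textbook steady-state heat balance Q = ṁ · c_p · ΔT to a single coolant loop. It is not the ExaDigiT cooling model; the function name, the parameter names, and the constant specific heat of water are assumptions made for this example only.

```python
# Illustrative sketch only: a steady-state heat balance for one coolant loop.
# This is NOT the ExaDigiT cooling model; all names here are hypothetical.

WATER_CP_J_PER_KG_K = 4186.0  # approximate specific heat of liquid water


def heat_removed_w(mass_flow_kg_s: float, t_in_c: float, t_out_c: float) -> float:
    """Return the heat carried away by the coolant, in watts.

    mass_flow_kg_s: coolant mass flow rate (kg/s), e.g. from a flow sensor
    t_in_c:         coolant temperature entering the racks (deg C)
    t_out_c:        coolant temperature leaving the racks (deg C)
    """
    if mass_flow_kg_s < 0:
        raise ValueError("mass flow rate must be non-negative")
    # Q = m_dot * c_p * (T_out - T_in)
    return mass_flow_kg_s * WATER_CP_J_PER_KG_K * (t_out_c - t_in_c)


if __name__ == "__main__":
    # Hypothetical telemetry sample: 10 kg/s of water warmed from 30 to 35 deg C.
    print(f"{heat_removed_w(10.0, 30.0, 35.0) / 1000:.1f} kW")
```

Comparing a value like this against measured IT power is one simple way a twin could flag a sensor or model mismatch.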

ExaDigiT Architecture

Technology

  • Python REST API for data access to Apache Druid
  • Kafka-based simulation task dispatch and management
  • Modelica-based cooling models
  • Python-based scheduler simulation
  • Unreal Engine for AR/VR (Microsoft HoloLens)
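To make the data-access item concrete, here is a minimal sketch of how a Python client might build a time-range query against Druid's SQL-over-HTTP endpoint (`/druid/v2/sql`). It is not the project's actual API: the function name, the datasource `cooling_telemetry`, the `"value"` column, and the host URL are hypothetical. The sketch only constructs the request and does not send it.

```python
# Illustrative sketch only: build (but do not send) a Druid SQL request.
# Datasource, column, and host names are hypothetical, not ExaDigiT's schema.
import json
import urllib.request

DRUID_SQL_PATH = "/druid/v2/sql"  # Druid's SQL-over-HTTP endpoint


def build_druid_sql_request(base_url: str, datasource: str,
                            start_iso: str, end_iso: str) -> urllib.request.Request:
    """Return a POST request selecting rows with start <= __time < end."""
    # The datasource is an identifier and cannot be bound as a parameter,
    # so it should come from trusted configuration, never from user input.
    sql = (
        f'SELECT __time, "value" FROM "{datasource}" '
        "WHERE __time >= TIME_PARSE(?) AND __time < TIME_PARSE(?)"
    )
    payload = {
        "query": sql,
        # Time bounds are passed as bound parameters (ISO 8601 strings).
        "parameters": [
            {"type": "VARCHAR", "value": start_iso},
            {"type": "VARCHAR", "value": end_iso},
        ],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + DRUID_SQL_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_druid_sql_request(
        "http://localhost:8888", "cooling_telemetry",
        "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request with `urllib.request.urlopen(req)` would return the query result as JSON. A REST layer in front of Druid would typically wrap a call like this so that simulation components do not need to know the datasource layout.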

Publication