HPC Analytics for Extreme Scale Computing
Although monitoring tools can collect detailed measurements of system components and applications,
extracting knowledge from these data on petascale and larger systems is
a daunting problem. The goal of this project is twofold: (1) to provide multilevel analysis of fault models,
workload characteristics, and performance-reliability-power tradeoffs from various system data
by exploring advanced data mining and statistical learning techniques, and (2) to develop runtime strategies
that improve the power-performance efficiency of scientific applications on Argonne's leadership systems.
Members:
Zhiling Lan (faculty)
Sean Wallace (Ph.D. student)
Collaborators:
Mike Papka (Argonne National Laboratory)
Susan Coghlan (Argonne National Laboratory)
Venkatram Vishwanath (Argonne National Laboratory)
Publications:
S. Wallace, X. Yang, V. Vishwanath, W. Allcock, S. Coghlan, M. Papka, and Z. Lan,
"A Data Driven Scheduling Approach for Power Management on HPC Systems",
Proc. of SC16 (acceptance rate: 18%), 2016. [PDF]
S. Wallace, Z. Zhou, V. Vishwanath, S. Coghlan, J. Tramm, Z. Lan, and M.E. Papka,
"Application Power Profiling on IBM Blue Gene/Q",
Journal of Parallel Computing (ParCo), 2016. [PDF]
S. Wallace, V. Vishwanath, S. Coghlan, Z. Lan, and M. Papka,
"Comparison of Vendor Supplied Environmental Data Collection Mechanisms",
Workshop on Monitoring and Analysis for High Performance Computing Systems Plus
Applications (HPCMASPA), in conjunction with IEEE Cluster'15,
2015.
S. Wallace, V. Vishwanath, S. Coghlan, J. Tramm, Z. Lan, and M. Papka,
"Application Power Profiling on IBM Blue Gene/Q",
Proc. of Cluster'13, 2013.
S. Wallace, V. Vishwanath, S. Coghlan, Z. Lan, and M. Papka,
"Measuring Power Consumption on IBM Blue Gene/Q",
The 9th Workshop on High-Performance, Power-Aware Computing (HPPAC),
in conjunction with IPDPS'13, 2013.
Contact:
Dr. Zhiling Lan (lan AT iit DOT edu)
Acknowledgement:
This project is supported by the US DOE/Argonne.