HPC Analytics for Extreme Scale Computing

Although monitoring tools can collect detailed measurements of system components and applications, extracting knowledge from this data on petascale and larger systems is a daunting problem. The goal of this project is twofold: (1) to provide multilevel analysis of fault models, workload characteristics, and performance-reliability-power tradeoffs from various system data by exploring advanced data mining and statistical learning technologies, and (2) to develop runtime strategies that improve the power-performance efficiency of scientific applications on Argonne's leadership systems.

Members:
  • Zhiling Lan (faculty)
  • Sean Wallace (Ph.D. student)

Collaborators:
  • Mike Papka (Argonne Lab)
  • Susan Coghlan (Argonne Lab)
  • Venkatram Vishwanath (Argonne Lab)

Publications:
  • S. Wallace, X. Yang, V. Vishwanath, W. Allcock, S. Coghlan, M. Papka, and Z. Lan, "A Data Driven Scheduling Approach for Power Management on HPC Systems", Proc. of SC16 (acceptance rate is 18%), 2016. [PDF]
  • S. Wallace, Z. Zhou, V. Vishwanath, S. Coghlan, J. Tramm, Z. Lan, and M.E. Papka, "Application Power Profiling on IBM Blue Gene/Q", Journal of Parallel Computing (ParCo), 2016. [PDF]
  • S. Wallace, V. Vishwanath, S. Coghlan, Z. Lan, and M. Papka, "Comparison of Vendor Supplied Environmental Data Collection Mechanisms", Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications (HPCMASPA), in conjunction with IEEE Cluster'15, 2015.
  • S. Wallace, V. Vishwanath, S. Coghlan, J. Tramm, Z. Lan, and M. Papka, "Application Power Profiling on IBM Blue Gene/Q", Proc. of Cluster'13, 2013.
  • S. Wallace, V. Vishwanath, S. Coghlan, Z. Lan, and M. Papka, "Measuring Power Consumption on IBM Blue Gene/Q", The 9th Workshop on High-Performance, Power-Aware Computing (HPPAC) (in conjunction with IPDPS'13), 2013.


    Contact:
    Dr. Zhiling Lan (lan AT iit DOT edu)


    Acknowledgement:
    This project is supported by the US DOE/Argonne.