Current Projects

ChronoLog: A High-Performance Storage Infrastructure for Activity and Log Workloads

HPC applications generate more data than storage systems can handle, and it is becoming increasingly important to store activity (log) data generated by people and applications. ChronoLog is a hierarchical, distributed log store that leverages physical time to achieve log ordering and reduce contention while utilizing storage tiers to elastically scale the log capacity.
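The idea of using physical time to order a distributed log can be illustrated with a small sketch. This is a hypothetical illustration, not ChronoLog's actual API: each client tags events with its local physical clock, so a total order can be derived by (timestamp, client id) without funneling every append through a central sequencer.

```python
import heapq
from dataclasses import dataclass, field

# Illustrative sketch (names are invented, not ChronoLog's API): events
# carry a physical timestamp; ties across clients break by client_id.
@dataclass(order=True)
class LogEvent:
    timestamp: float
    client_id: int
    payload: str = field(compare=False)  # payload never affects ordering

def merge_client_logs(per_client_events):
    """Merge per-client event streams (each already time-ordered)
    into one totally ordered log via an n-way timestamp merge."""
    return list(heapq.merge(*per_client_events))

# Two clients append concurrently; the global order falls out of time.
a = [LogEvent(1.0, 0, "a1"), LogEvent(3.0, 0, "a2")]
b = [LogEvent(2.0, 1, "b1"), LogEvent(3.0, 1, "b2")]  # tie at t=3.0
log = merge_client_logs([a, b])
```

Because each client orders only its own stream locally, contention on a shared tail pointer is avoided; the merge step imposes the global order after the fact.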

Hermes: Extending the HDF Library to Support Intelligent I/O Buffering for Deep Memory and Storage Hierarchy System

To reduce the I/O bottleneck, complex storage hierarchies have been introduced. However, managing this complexity should not be left to application developers. Hermes is a middleware library that automatically manages buffering in heterogeneous storage environments.
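The core placement idea behind multi-tier buffering can be sketched briefly. This is a minimal, hypothetical model (not Hermes internals): incoming data lands in the fastest tier with free capacity and spills downward when a tier fills.

```python
# Hedged sketch of tiered buffering; all names are illustrative.
class Tier:
    def __init__(self, name, capacity):
        self.name, self.capacity, self.used = name, capacity, 0
        self.data = {}

    def has_room(self, size):
        return self.used + size <= self.capacity

    def put(self, key, blob):
        self.data[key] = blob
        self.used += len(blob)

class TieredBuffer:
    """Place each buffer in the fastest tier that still has room."""
    def __init__(self, tiers):
        self.tiers = tiers  # ordered fastest -> slowest

    def put(self, key, blob):
        for tier in self.tiers:
            if tier.has_room(len(blob)):
                tier.put(key, blob)
                return tier.name
        raise IOError("all tiers full")

buf = TieredBuffer([Tier("DRAM", 4), Tier("NVMe", 64)])
buf.put("a", b"1234")      # fits in DRAM
buf.put("b", b"xxxxxxxx")  # DRAM is full, so this spills to NVMe
```

A real buffering system would also evict and demote data between tiers; the sketch shows only the placement decision the text describes.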

IRIS: I/O Redirection Via Integrated Storage

Various storage solutions exist, each requiring specialized APIs and data models, which binds developers, applications, and entire computing facilities to particular interfaces. Each storage system is designed and optimized for certain applications but performs poorly for others. IRIS is a unified storage access system that bridges the semantic gap between file systems and object stores.
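What a unified access layer looks like can be sketched in a few lines. This is an illustrative model only (the class and method names are invented, not the IRIS API): callers use one put/get interface, and a mapping layer translates POSIX-style paths into object-store keys behind the scenes.

```python
# Hypothetical sketch of bridging a filesystem and an object store.
class FileBackend:
    def __init__(self):
        self._files = {}          # stands in for a POSIX filesystem
    def write(self, path, data): self._files[path] = data
    def read(self, path): return self._files[path]

class ObjectBackend:
    def __init__(self):
        self._objects = {}        # stands in for an object store
    def put(self, key, value): self._objects[key] = value
    def get(self, key): return self._objects[key]

class UnifiedStore:
    """One API for callers; the backend choice is hidden, and file
    paths are translated to flat object keys when needed."""
    def __init__(self, backend):
        self.backend = backend

    def _key(self, path):
        return path.replace("/", ".")  # trivial path-to-key mapping

    def put(self, path, data):
        if isinstance(self.backend, ObjectBackend):
            self.backend.put(self._key(path), data)
        else:
            self.backend.write(path, data)

    def get(self, path):
        if isinstance(self.backend, ObjectBackend):
            return self.backend.get(self._key(path))
        return self.backend.read(path)

store = UnifiedStore(ObjectBackend())
store.put("/exp/run1/out.dat", b"results")
```

Swapping `ObjectBackend` for `FileBackend` changes nothing for the caller, which is the point of a semantic bridge.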

Empowering Data-Intensive Computing: The Integrated Data Management Approach

From the system point of view, there are two types of data: observational data, collected by electronic devices such as sensors, monitors, and cameras; and simulation data, generated by computation. In general, the latter is used in traditional scientific high-performance computing (HPC) and requires strong consistency for correctness. The former is popular in newly emerged big data applications and does not require strong consistency. This difference in consistency requirements has led to two kinds of file systems: data-intensive distributed file systems, represented by the MapReduce-oriented Hadoop Distributed File System (HDFS), developed at Yahoo and modeled on Google's GFS...

Utilizing Memory Parallelism for High Performance Data Processing

While advances in microprocessor design continue to increase computing speed, improvements in the data access speed of computing systems lag far behind. At the same time, data-intensive large-scale applications, such as information retrieval, computer animation, and big data analytics, are emerging. Data access delay has become a critical performance bottleneck of modern high performance computing (HPC). Memory concurrency exists at each layer of modern memory hierarchies; however, conventional computing systems are primarily designed to improve CPU utilization and have inherent limitations in addressing ...

DEP: A Decoupled Execution Paradigm for Data-intensive High-End Computing

Large-scale applications in critical areas of science and technology have become increasingly data-intensive. I/O has become a critical performance bottleneck of modern high-end computing (HEC) practices. Conventional HEC execution paradigms, however, are computing-centric: they are designed to maximize CPU performance for computation-intensive applications and have inherent limitations in addressing the newly emerged data access and data management issues of HEC. In this project, we propose an innovative decoupled execution paradigm (DEP) and the notion of separating computing-intensive and data-intensive operations....

Application-Specific Optimization via Server Push I/O Architecture 

As modern multicore architectures put ever more pressure on sluggish memory systems, computer applications become more and more data-intensive. Advanced memory hierarchies and parallel file systems have been developed in recent years; however, they perform well only for well-formed data streams and fail to meet more general demands. I/O has become a crucial performance bottleneck of high-end computing (HEC), especially for data-intensive applications. New mechanisms and new I/O architectures need to be developed to solve the 'I/O-wall' problem. We propose a new I/O architecture for HEC. Unlike traditional I/O designs ...

Empowering Data Management, Diagnosis, and Visualization of Cloud-Resolving Models by Cloud Library upon Spark and Hadoop

In the age of big data, scientific applications are generating large volumes of data, leading to an explosion in the requirements and complexity of processing these data. In high performance computing (HPC), data management is traditionally supported by parallel file systems (PFS) such as Lustre, PVFS2, and GPFS. In big data environments, general-purpose analysis frameworks like MapReduce and Spark are popular and widely available, with data storage supported by distributed file systems such as the Hadoop Distributed File System...

Past Projects

Multicore Scheduling and Data Access

Multicore microprocessors have fundamentally changed the landscape of what we know as (single machine) computing, bringing task-level parallel processing into a single processor. On one hand, this further enlarges the performance gap between data processing and data access. On the other hand, it calls for a rethinking of system design to utilize the potential of multicore architectures. We believe the key to utilizing multicore microprocessors is to reduce data access delay. We rethink system scheduling and support from the viewpoint of data access in multicore environments. ...

Grid Harvest Service (GHS) 

Rapid advancement of communication technology has changed the landscape of computing. New models of computing such as business-on-demand, Web services, peer-to-peer networks, Grid computing, and Cloud computing have emerged to harness distributed computing and network resources and provide powerful services. In such computing platforms, resources are shared and are likely to be remote, and thereby outside the user's control. Consequently, the resource availability seen by each user varies significantly over time due to resource sharing, system configuration changes, potential...

Workflow Research Project

Workflow management is a new area of distributed computing. It shares many common characteristics with business workflow. However, with thousands of processes running in coordination across a widely distributed, shared network environment, workflow in distributed computing is much more complex than conventional business workflow. Workflow supports task scheduling but is more than task scheduling. From the viewpoint of computing services, any user request is a service, which can be decomposed into a series of known basic services. These basic services may have inherent control...

Pervasive Computing

Computing aims to make human life easier, yet in today's computing users must adapt to computers rather than the other way around. With advances in mobile computing and wireless communication technologies, pervasive computing has emerged as a feasible technology for human-centered computing. Pervasive computing creates a ubiquitous environment that combines processors and sensors with network technologies (wireless and otherwise) and intelligent software to create an immersive environment that improves life, in which computers have become an ...

Dynamic Virtual Machine Project

The DVM system is a prototype middleware that provides applications with a secure, stable, and specialized computing environment in cyberspace. It encapsulates computational resources in a secure, isolated, and customized virtual machine environment, and enables transparent service mobility and service provisioning. In this research, a computing environment is modeled as a DVM, an abstract virtual machine, and is incarnated automatically on various virtualization platforms. To migrate a virtual machine, the DVM system needs to collect the runtime states of a VM. The communication should be kept alive...

Highly Accurate PArallel Numerical Simulations (HAPANS)

In this interdisciplinary research, we study scalable parallel algorithms and simulations based on new mathematical developments in adaptive wavelet methods. The driving force of this research is the need for next-generation large-scale simulation capability, as required by the intrinsic physical properties of industrial applications. Recent advances in both parallel wavelet methods and scalable parallel algorithms on new computer architectures have made such an endeavor a more realistic task. These ongoing research activities combine the...

High Performance Computing Mobility (HPCM) middleware

Mobility is a primary functionality of next-generation computing. Intensive research has been conducted in recent years on mobile agents and mobile computing...

Virtual Collaboratory for Numerical Simulation (VCNS) 

A suite of software systems, communication protocols, and tools that enables computer-based cooperative work. It is a sister project of SNOW.


Stuart Building
Room 112i and 010
10 W. 31st Street
Chicago, Illinois 60616


Phone: +1 312 567 6885