HPC applications generate more data than storage systems can handle, and it is becoming increasingly important to store activity (log) data generated by people and applications. ChronoLog is a hierarchical, distributed log store that leverages physical time to achieve log ordering and reduce contention while utilizing storage tiers to elastically scale the log capacity.
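To illustrate the ordering idea, the sketch below (a hypothetical C++ snippet; the Chronicle struct and append() call are invented and are not ChronoLog's actual API) tags each appended event with the writer's physical clock, so entries from many writers can be merged by timestamp rather than through a central sequencer.
```cpp
// Minimal sketch of physical-time-based log ordering (illustrative only).
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>

struct Chronicle {
    // Events ordered by nanoseconds since epoch; ties broken by writer id.
    std::multimap<std::pair<uint64_t, int>, std::string> events;

    void append(int writer_id, const std::string& payload) {
        uint64_t ts = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::system_clock::now().time_since_epoch()).count();
        events.emplace(std::make_pair(ts, writer_id), payload);
    }
};

int main() {
    Chronicle c;
    c.append(0, "job started");
    c.append(1, "checkpoint written");
    // Readers see a single timeline without any writer coordination.
    for (const auto& [key, payload] : c.events)
        std::cout << key.first << " w" << key.second << " " << payload << "\n";
}
```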
To reduce the I/O bottleneck, complex storage hierarchies have been introduced. However, managing this complexity should not be left to application developers. Hermes is a middleware library that automatically manages buffering in heterogeneous storage environments.
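The sketch below conveys the general idea of tier-aware buffering; the Tier struct and place() function are hypothetical and do not reflect the real Hermes API. A put lands in the fastest tier with free capacity, and overflow spills to slower tiers.
```cpp
// Illustrative sketch of placing buffered data across storage tiers.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct Tier { std::string name; size_t capacity; size_t used = 0; };

// Choose the fastest tier (tiers ordered fastest-to-slowest) that can hold the buffer.
int place(std::vector<Tier>& tiers, size_t bytes) {
    for (size_t i = 0; i < tiers.size(); ++i) {
        if (tiers[i].used + bytes <= tiers[i].capacity) {
            tiers[i].used += bytes;
            return static_cast<int>(i);
        }
    }
    return -1;  // no buffering capacity left: fall through to the parallel file system
}

int main() {
    std::vector<Tier> tiers = {{"DRAM", 1 << 20}, {"NVMe", 8 << 20}, {"BurstBuffer", 64 << 20}};
    std::cout << "placed in tier " << place(tiers, 512 << 10) << "\n";  // fits in DRAM
    std::cout << "placed in tier " << place(tiers, 2 << 20) << "\n";    // spills to NVMe
}
```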
Various storage solutions exist, each requiring specialized APIs and data models, which binds developers, applications, and entire computing facilities to particular interfaces. Each storage system is designed and optimized for certain workloads but performs poorly for others. IRIS is a unified storage access system that bridges the semantic gap between file systems and object stores.
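As a rough illustration of bridging the two semantics (not the actual IRIS interface; file_write() and the object key scheme are invented), the sketch below maps byte ranges of a "file" onto fixed-size objects so that a POSIX-style write and an object-store lookup see the same data.
```cpp
// Illustrative mapping of a byte-addressed file view onto an object store.
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

constexpr size_t kObjectSize = 4;  // tiny objects to keep the example visible

std::map<std::string, std::string> object_store;  // key -> object payload

// Write bytes at a file offset by splitting them across object-store keys.
void file_write(const std::string& file, size_t offset, const std::string& data) {
    for (size_t i = 0; i < data.size(); ++i) {
        size_t pos = offset + i;
        std::string key = file + "#" + std::to_string(pos / kObjectSize);
        std::string& obj = object_store[key];
        if (obj.size() < kObjectSize) obj.resize(kObjectSize, ' ');
        obj[pos % kObjectSize] = data[i];
    }
}

int main() {
    file_write("results.dat", 0, "tempdata");
    // The same bytes are now reachable through the object interface.
    std::cout << object_store["results.dat#0"] << object_store["results.dat#1"] << "\n";
}
```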
From the system point of view, there are two types of data: observational data, collected by electronic devices such as sensors, monitors, and cameras; and simulation data, generated by computation. In general, the latter is used in traditional scientific high-performance computing (HPC) and requires strong consistency for correctness. The former is popular in newly emerged big data applications and does not require strong consistency. This difference in consistency requirements leads to two kinds of file systems: data-intensive distributed file systems, represented by the Hadoop Distributed File System (HDFS), which grew out of the MapReduce-style frameworks developed at Google and Yahoo...
While advances in microprocessor design continue to increase computing speed, improvements in the data access speed of computing systems lag far behind. At the same time, data-intensive large-scale applications, such as information retrieval, computer animation, and big data analytics, are emerging. Data access delay has become a critical performance bottleneck of modern high-performance computing (HPC). Memory concurrency exists at each layer of modern memory hierarchies; however, conventional computing systems are primarily designed to improve CPU utilization and have inherent limitations in addressing ...
Large-scale applications in critical areas of science and technology have become more and more data intensive. I/O has become a vital performance bottleneck of modern high-end computing (HEC) practice. Conventional HEC execution paradigms, however, are computing-centric: they are designed to maximize CPU performance for computation-intensive applications and have inherent limitations in addressing the newly emerged data access and data management issues of HEC. In this project, we propose an innovative decoupled execution paradigm (DEP) built on the separation of computing-intensive and data-intensive operations...
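The toy sketch below is only meant to convey the separation idea under simplifying assumptions; it is not the DEP runtime. A data-intensive reduction and a compute-intensive kernel are issued as independent tasks, standing in for work that a decoupled paradigm would route to data-side and compute-side resources, respectively.
```cpp
// Toy illustration of decoupling data-intensive and compute-intensive operations.
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> big(1 << 20, 1.0);

    // Data-intensive operation: scan/reduce a large buffer (imagine it running near storage).
    auto data_task = std::async(std::launch::async, [&] {
        return std::accumulate(big.begin(), big.end(), 0.0);
    });

    // Compute-intensive operation: a numeric kernel that moves no bulk data.
    auto compute_task = std::async(std::launch::async, [] {
        double x = 0.0;
        for (int i = 1; i <= 1000000; ++i) x += 1.0 / (static_cast<double>(i) * i);
        return x;  // converges toward pi^2 / 6
    });

    // Neither class of work serializes the other.
    std::cout << "sum = " << data_task.get() << ", series = " << compute_task.get() << "\n";
}
```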
As modern multicore architectures put ever more pressure on sluggish memory systems, computer applications become more and more data intensive. Advanced memory hierarchies and parallel file systems have been developed in recent years. However, they only deliver high performance for well-formed data streams and fail to meet more general demands. I/O has become a crucial performance bottleneck of high-end computing (HEC), especially for data-intensive applications. New mechanisms and new I/O architectures need to be developed to solve the 'I/O-wall' problem. We propose a new I/O architecture for HEC. Unlike traditional I/O designs ...
In the age of big data, scientific applications are generating large volumes of data, leading to an explosion of requirements and complexity in processing these data. In high-performance computing (HPC), data management is traditionally supported by parallel file systems (PFS) such as Lustre, PVFS2, and GPFS. In big data environments, general-purpose analysis frameworks such as MapReduce and Spark are popular and widely available, with data storage supported by distributed file systems such as the Hadoop Distributed File System...
The multicore microprocessor has fundamentally changed the landscape of what we know as (single-machine) computing. It brings parallel processing into a single processor at the task level. On one hand, it further enlarges the performance gap between data processing and data access. On the other hand, it calls for a rethinking of system design to exploit the potential of multicore architectures. We believe the key to utilizing multicore microprocessors is to reduce data access delay. We rethink system scheduling and support from the viewpoint of data access in multicore environments. ...
The rapid advancement of communication technology has changed the landscape of computing. New models of computing, such as business-on-demand, Web services, peer-to-peer networks, Grid computing, and Cloud computing, have emerged to harness distributed computing and network resources and provide powerful services. In such computing platforms, resources are shared and are likely to be remote, and thus out of the user's control. Consequently, the resource availability seen by each user varies considerably over time due to resource sharing, system configuration changes, potential...
Workflow management is a new area of distributed computing. It shares many characteristics with business workflow. However, with thousands of processes running in coordination across a widely distributed, shared network environment, the workflow of distributed computing is much more complex than conventional business workflow. Workflow supports task scheduling but is more than task scheduling. From the viewpoint of computing services, any user request is a service, which can be decomposed into a series of known basic services. These basic services may have inherent control...
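A minimal sketch of the decomposition idea follows; the service names and dependencies are invented for illustration only. A user request becomes a small DAG of basic services with control dependencies, executed here in a dependency-respecting order via Kahn's topological sort.
```cpp
// Illustrative execution of basic services with control dependencies.
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

int main() {
    // edges: service -> services that must finish first (hypothetical names)
    std::map<std::string, std::vector<std::string>> deps = {
        {"fetch-data", {}},
        {"transform", {"fetch-data"}},
        {"analyze", {"transform"}},
        {"report", {"analyze", "transform"}},
    };

    // Kahn's algorithm: repeatedly run any service whose prerequisites are all done.
    std::map<std::string, int> remaining;
    std::map<std::string, std::vector<std::string>> dependents;
    for (const auto& [svc, pre] : deps) {
        remaining[svc] = static_cast<int>(pre.size());
        for (const auto& p : pre) dependents[p].push_back(svc);
    }
    std::queue<std::string> ready;
    for (const auto& [svc, n] : remaining) if (n == 0) ready.push(svc);
    while (!ready.empty()) {
        std::string svc = ready.front(); ready.pop();
        std::cout << "run " << svc << "\n";
        for (const auto& d : dependents[svc]) if (--remaining[d] == 0) ready.push(d);
    }
}
```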
Computing should make human life easier, but in today's computing it is, in truth, the users who must adapt to the computer rather than the other way around. With the advance of mobile computing and wireless communication technologies, pervasive computing has emerged as a feasible technology for human-centered computing. Pervasive computing creates a ubiquitous environment that combines processors and sensors with network technologies (wireless and otherwise) and intelligent software to create an immersive environment that improves life, in which computers have become an ...
The DVM system is a prototype middleware that provides applications with a secure, stable, and specialized computing environment in cyberspace. It encapsulates computational resources in a secure, isolated, and customized virtual machine environment, and it enables transparent service mobility and service provisioning. In this research, a computing environment is modeled as a DVM, an abstract virtual machine, and is incarnated automatically on various virtualization platforms. To migrate a virtual machine, the DVM system needs to collect the runtime state of the VM. The communication should be kept alive...
In this interdisciplinary research, we study scalable parallel algorithms and simulations based on new mathematical developments in adaptive wavelet methods. The driving force of this research is the need for next-generation, large-scale simulation capability demanded by the intrinsic physical properties of industrial applications. Recent advances in both parallel wavelet methods and scalable parallel algorithms on new computer architectures have made such an endeavor a more realistic task. This ongoing research combines the...
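For readers unfamiliar with wavelets, the toy example below shows one level of the Haar transform, the simplest member of the wavelet family; the project's adaptive parallel methods are far more sophisticated, so this is only a conceptual illustration.
```cpp
// One level of the Haar wavelet transform: each pair of samples is replaced by its
// average (coarse approximation) and half-difference (detail coefficient).
#include <iostream>
#include <vector>

int main() {
    std::vector<double> signal = {4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0};
    std::vector<double> approx, detail;
    for (size_t i = 0; i + 1 < signal.size(); i += 2) {
        approx.push_back((signal[i] + signal[i + 1]) / 2.0);
        detail.push_back((signal[i] - signal[i + 1]) / 2.0);
    }
    // Small detail coefficients can be dropped adaptively where the signal is smooth.
    for (double a : approx) std::cout << a << " ";
    std::cout << "| ";
    for (double d : detail) std::cout << d << " ";
    std::cout << "\n";
}
```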
Mobility is a primary functionality of next-generation computing. Intensive research has been done in recent years on mobile agents and mobile computing...
A suite of software systems, communication protocols, and tools that enable computer-based cooperative work. It is a sister project of SNOW.