I/O Redirection Via Integrated Storage
(CCF-1744317, CNS-1526887, CNS-0751200)
1) Clients can access files concurrently.
2) Scalability and capability.
3) Distribute large files across multiple nodes.
4) Hierarchical global namespace.
5) High bandwidth via parallel data transfer.
1) Poor performance for small accesses, unaligned requests, and heavy metadata operations.
2) Performance depends on files, directories, and tree structures.
3) Maintaining data consistency generates overhead and can create issues such as fragmentation, journaling, and simultaneous operations on the same file system structures.
4) The storage subsystem may impose limitations because of RAID, disk sizes, and other hardware or software constraints.
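To illustrate how distributing a file across nodes interacts with small and unaligned requests, here is a minimal sketch that assumes simple round-robin striping. The function name `stripe_layout`, the stripe size, and the server count are hypothetical; real parallel file systems use their own layout policies.

```python
def stripe_layout(offset, length, stripe_size, num_servers):
    """Split a byte-range request into the per-server pieces it touches.

    Assumes round-robin striping: stripe i lives on server i % num_servers.
    Returns a list of (server, stripe_index, byte_count) tuples.
    """
    pieces = []
    end = offset + length
    while offset < end:
        stripe_index = offset // stripe_size
        server = stripe_index % num_servers
        stripe_end = (stripe_index + 1) * stripe_size
        # Take bytes up to the end of this stripe or the end of the request.
        chunk = min(end, stripe_end) - offset
        pieces.append((server, stripe_index, chunk))
        offset += chunk
    return pieces


# A large aligned request spreads evenly over all servers (parallel transfer).
aligned = stripe_layout(0, 4 * 65536, 65536, 4)

# A tiny request that straddles a stripe boundary still involves two servers.
unaligned = stripe_layout(65536 - 10, 20, 65536, 4)
```

In this sketch the 20-byte unaligned request contacts two servers to move very little data, which is one way to picture the small-access penalty listed above.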
1) Encapsulate data, metadata, a globally unique identifier, and data attributes into a single immutable entity termed an object.
2) Scalability, flexibility, rapid data retrieval, and distributed access.
3) Easily expandable and well suited for applications requesting noncontiguous data accesses and/or heavy metadata operations.
4) Offer consistent data access throughput and extensible metadata.
5) NoSQL schemes come in a wide variety of implementations, each with its own strengths and weaknesses.
1) They are ill-suited for access patterns with frequently changing data.
2) Slow for accesses involving complex operations, since each update leads to the creation of a new object and the destruction of the previous one, followed by an update to the metadata.
3) Object Stores are not POSIX-compliant, which blocks their wide adoption by the HPC community.
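The update cost described above can be pictured with a toy in-memory store. This is only a sketch under stated assumptions: `StoredObject` and `ToyObjectStore` are hypothetical names, not the API of any real object store, and the metadata is reduced to a name-to-identifier index.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    """Immutable bundle of identifier, data, and attributes."""
    object_id: str
    data: bytes
    attributes: tuple  # sorted (key, value) pairs


class ToyObjectStore:
    def __init__(self):
        self._objects = {}  # object_id -> StoredObject
        self._index = {}    # name -> object_id (stands in for metadata)

    def put(self, name, data, **attributes):
        # 1. Create a brand-new object with a fresh unique identifier.
        obj = StoredObject(uuid.uuid4().hex, bytes(data),
                           tuple(sorted(attributes.items())))
        self._objects[obj.object_id] = obj
        # 2. Destroy the previous object stored under this name, if any.
        old_id = self._index.get(name)
        if old_id is not None:
            del self._objects[old_id]
        # 3. Update the metadata to point at the new object.
        self._index[name] = obj.object_id
        return obj.object_id

    def get(self, name):
        return self._objects[self._index[name]]

    def update(self, name, offset, patch):
        # Objects are immutable, so even a small in-place change
        # rewrites the whole payload into a new object.
        old = self.get(name)
        new_data = old.data[:offset] + patch + old.data[offset + len(patch):]
        return self.put(name, new_data, **dict(old.attributes))


store = ToyObjectStore()
first_id = store.put("results", b"hello world", owner="alice")
second_id = store.update("results", 0, b"J")
```

Changing a single byte here copies the full payload and touches the index, which is why frequently changing data is a poor fit for this model.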
There is an ocean of available storage solutions in modern high-performance and distributed systems. These solutions consist of Parallel File Systems (PFS) for the more traditional high-performance computing (HPC) systems and of Object Stores for emerging cloud environments. More often than not, these storage solutions are tied to specific APIs and data models and thus bind developers, applications, and entire computing facilities to certain interfaces. Each storage system is designed and optimized for certain applications but does not perform well for others. Furthermore, modern applications have become increasingly complex, consisting of a collection of phases with different computation and I/O requirements. In this paper, we propose a unified storage access system called IRIS (I/O Redirection via Integrated Storage). IRIS enables unified data access and seamlessly bridges the semantic gap between file systems and object stores. With IRIS, emerging High-Performance Data Analytics software has capable and diverse I/O support. IRIS can bring us closer to the convergence of HPC and Cloud environments by combining the best storage subsystems from both worlds. Experimental results show that IRIS can deliver more than a 7x performance improvement over existing solutions.
[Figures: evaluation results, annotated with 7x, 6x, 7x, 40-60%, and 30-50% improvement]