Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Ming Zhao
First Advisor's Committee Title
Committee chair
Second Advisor's Name
Raju Rangaswami
Second Advisor's Committee Title
Committee member
Third Advisor's Name
Jason Liu
Third Advisor's Committee Title
Committee member
Fourth Advisor's Name
Gang Quan
Fourth Advisor's Committee Title
Committee member
Fifth Advisor's Name
Deng Pan
Fifth Advisor's Committee Title
Committee member
Sixth Advisor's Name
Seetharami Seelam
Sixth Advisor's Committee Title
Committee member
Keywords
Big data, Storage management, I/O scheduler, Performance
Date of Defense
3-18-2016
Abstract
Computing systems are becoming increasingly data-intensive because of the explosion of data and the need to process it, and storage management is critical to application performance in such data-intensive computing systems. However, existing resource management frameworks in these systems lack support for storage management, which causes unpredictable performance degradation when applications are under I/O contention. Storage management of data-intensive systems is a challenging problem because I/O resources cannot be easily partitioned and distributed storage systems require scalable management. This dissertation presents solutions that address these challenges for typical data-intensive systems, including high-performance computing (HPC) systems and big-data systems.
For HPC systems, the dissertation presents vPFS, a performance virtualization layer for parallel file system (PFS) based storage systems. It employs user-level PFS proxies to interpose and schedule parallel I/Os on a per-application basis. Based on this framework, it enables SFQ(D)+, a new proportional-share scheduling algorithm that supports diverse applications with good performance isolation and resource utilization. To manage an HPC system’s total I/O service, it also provides two complementary synchronization schemes to coordinate the scheduling of large numbers of storage nodes in a scalable manner.
For big-data systems, the dissertation presents IBIS, an interposition-based big-data I/O scheduler. By interposing the different I/O phases of big-data applications, it schedules the I/Os transparently to the applications. It enables a new proportional-share scheduling algorithm, SFQ(D2), to address the dynamics of the underlying storage by adaptively adjusting the I/O concurrency. Moreover, it employs a scalable broker to coordinate the distributed I/O schedulers and provide proportional sharing of a big-data system’s total I/O service.
Experimental evaluations show that these solutions have low overhead and provide strong I/O performance isolation. For example, vPFS’ overhead is less than 3% in throughput and it delivers proportional sharing within 96% of the target for diverse workloads; and IBIS provides up to 99% better performance isolation for WordCount and 30% better proportional slowdown for TeraSort and TeraGen than native YARN.
Identifier
FIDC000251
Recommended Citation
Xu, Yiqi, "Storage Management of Data-intensive Computing Systems" (2016). FIU Electronic Theses and Dissertations. 2474.
https://digitalcommons.fiu.edu/etd/2474
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).