Doctor of Philosophy (PhD)
Big data, Storage management, I/O scheduler, Performance
Computing systems are becoming increasingly data-intensive as data volumes explode and the need to process that data grows, and storage management is critical to application performance in such systems. However, existing resource management frameworks in these systems lack support for storage management, which causes unpredictable performance degradation when applications contend for I/O. Storage management in data-intensive systems is challenging because I/O resources cannot be easily partitioned and distributed storage systems require scalable management. This dissertation presents solutions that address these challenges for typical data-intensive systems, including high-performance computing (HPC) systems and big-data systems.
For HPC systems, the dissertation presents vPFS, a performance virtualization layer for parallel file system (PFS) based storage systems. It employs user-level PFS proxies to interpose on and schedule parallel I/Os on a per-application basis. Based on this framework, it enables SFQ(D)+, a new proportional-share scheduling algorithm that serves diverse applications with good performance isolation and resource utilization. To manage an HPC system's total I/O service, it also provides two complementary synchronization schemes that coordinate scheduling across large numbers of storage nodes in a scalable manner.
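To illustrate the kind of proportional-share scheduling that SFQ(D)+ builds on, the following is a minimal sketch of the basic depth-controlled start-time fair queueing idea, SFQ(D). It is not the dissertation's SFQ(D)+ algorithm, whose details are not given here. The class name `SFQD`, its method names, the flow names, and the weights are all assumptions made for illustration.

```python
import heapq
from itertools import count


class SFQD:
    """Illustrative sketch of basic SFQ(D); names here are hypothetical."""

    def __init__(self, weights, depth):
        self.weights = dict(weights)      # flow -> share weight
        self.depth = depth                # D: max outstanding requests
        self.vtime = 0.0                  # virtual time
        self.last_finish = {f: 0.0 for f in self.weights}
        self.queue = []                   # min-heap ordered by start tag
        self.outstanding = 0
        self._seq = count()               # tie-breaker: arrival order

    def submit(self, flow, cost=1.0):
        # Start tag: the later of the current virtual time and the finish
        # tag of this flow's previous request.
        start = max(self.vtime, self.last_finish[flow])
        # Finish tag advances more slowly for flows with larger weights.
        self.last_finish[flow] = start + cost / self.weights[flow]
        heapq.heappush(self.queue, (start, next(self._seq), flow, cost))

    def dispatch(self):
        # Issue requests in start-tag order until D are outstanding.
        issued = []
        while self.queue and self.outstanding < self.depth:
            start, _, flow, _cost = heapq.heappop(self.queue)
            self.vtime = start            # virtual time follows dispatches
            self.outstanding += 1
            issued.append(flow)
        return issued

    def complete(self):
        # A finished request frees one slot; try to dispatch again.
        self.outstanding -= 1
        return self.dispatch()


# Two backlogged flows with a 2:1 weight ratio and depth D = 1.
sched = SFQD({"A": 2, "B": 1}, depth=1)
for _ in range(6):
    sched.submit("A")
for _ in range(6):
    sched.submit("B")

order = sched.dispatch()
for _ in range(8):
    order += sched.complete()
```

In this toy run, the first nine dispatched requests split 6 to 3 between flows A and B, matching their 2:1 weights. A larger depth would allow more concurrent I/Os at the cost of coarser control over ordering.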
For big-data systems, the dissertation presents IBIS, an interposition-based big-data I/O scheduler. By interposing on the different I/O phases of big-data applications, it schedules their I/Os transparently to the applications. It enables a new proportional-share scheduling algorithm, SFQ(D2), which addresses the dynamics of the underlying storage by adaptively adjusting the I/O concurrency. Moreover, it employs a scalable broker to coordinate the distributed I/O schedulers and provide proportional sharing of a big-data system's total I/O service.
Experimental evaluations show that these solutions have low overhead and provide strong I/O performance isolation. For example, vPFS's overhead is less than 3% in throughput, and it delivers proportional sharing within 96% of the target for diverse workloads; IBIS provides up to 99% better performance isolation for WordCount and 30% better proportional slowdown for TeraSort and TeraGen than native YARN.
Xu, Yiqi, "Storage Management of Data-intensive Computing Systems" (2016). FIU Electronic Theses and Dissertations. 2474.
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).