Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Giri Narasimhan
First Advisor's Committee Title
Committee Chair
Second Advisor's Name
Ruogu Fang
Second Advisor's Committee Title
Committee Member
Third Advisor's Name
Jennifer Clarke
Third Advisor's Committee Title
Committee Member
Fourth Advisor's Name
Kalai Mathee
Fourth Advisor's Committee Title
Committee Member
Fifth Advisor's Name
Leonardo Bobadilla
Fifth Advisor's Committee Title
Committee Member
Keywords
Cloud Computing, MapReduce, Hilbert Curve, Deep Learning, Metagenomics, Microbiome, DNA Sequencing, Image Analysis, Neural Networks, Genomics
Date of Defense
3-20-2020
Abstract
Metagenomics is the study of the combined genetic material found in microbiome samples, and it serves as an instrument for studying microbial communities, their biodiversity, and their relationships to their host environments. Creating, interpreting, and understanding microbial community profiles produced from microbiome samples is a challenging task, as it requires substantial computational resources along with innovative techniques to process and analyze datasets that can contain terabytes of information.
The community profiles are critical because they provide information about what microorganisms are present in the sample, and in what proportions. This is particularly important as many human diseases and environmental disasters are linked to changes in microbiome compositions.
In this work, we propose novel approaches for the creation and interpretation of microbial community profiles. These include: (a) a cloud-based, distributed computational system that generates detailed community profiles by processing large DNA sequencing datasets against large reference genome collections, (b) the creation of Microbiome Maps: interpretable, high-resolution visualizations of community profiles, and (c) a machine learning framework for characterizing microbiomes from the Microbiome Maps that delivers deep insights into microbial communities.
The proposed approaches have been implemented in three software solutions: Flint, a large-scale profiling framework for commercial cloud systems that can process millions of DNA sequencing fragments and produce microbial community profiles at a very low cost; Jasper, a novel method for creating Microbiome Maps, which visualize abundance profiles using the Hilbert curve; and Amber, a machine learning framework that characterizes microbiomes with high accuracy using the Microbiome Maps generated by Jasper.
Results show that Flint scales well for reference genome collections that are an order of magnitude larger than those used by competing tools, while taking less than a minute to profile a million reads on the cloud with 65 commodity processors. Microbiome Maps produced by Jasper are compact, scalable representations of extremely complex microbial community profiles with numerous demonstrable advantages, including the ability to display latent relationships that are hard to elicit. Finally, experiments show that by using images as input instead of unstructured tabular input, the carefully engineered software, Amber, can outperform other sophisticated machine learning tools available for classification of microbiomes.
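To illustrate the idea behind visualizing an abundance profile along a Hilbert curve, the following is a minimal sketch and not Jasper's actual implementation. The function names `hilbert_d2xy` and `abundance_map`, the choice to place genomes in list order, and the use of a plain nested list as the image are all assumptions made for illustration. It uses the standard index-to-coordinate conversion for the Hilbert curve, which keeps entries that are adjacent in the profile adjacent in the image.

```python
def hilbert_d2xy(side, d):
    """Convert index d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def abundance_map(abundances, order):
    """Lay out a 1-D abundance profile on a 2**order x 2**order grid.

    Hypothetical helper: the i-th abundance is written to the i-th cell
    visited by the Hilbert curve; unused cells stay at 0.0.
    """
    side = 2 ** order
    if len(abundances) > side * side:
        raise ValueError("profile does not fit on a grid of this order")
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = value
    return grid
```

For example, `abundance_map([0.5, 0.3, 0.2], 1)` fills three cells of a 2 x 2 grid in curve order and leaves the fourth at zero. How genomes are ordered before being laid on the curve (for instance, by taxonomy) is a separate design choice that this sketch does not address.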
Identifier
FIDC008938
ORCID
https://orcid.org/0000-0003-4662-8269
Previously Published In
Valdes, C., Stebliankin, V., & Narasimhan, G. (2019). Large scale microbiome profiling in the cloud. Bioinformatics (Oxford, England), 35(14), i13–i22. http://doi.org/10.1093/bioinformatics/btz356
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
Valdes, Camilo, "Scalable Profiling and Visualization for Characterizing Microbiomes" (2020). FIU Electronic Theses and Dissertations. 4411.
https://digitalcommons.fiu.edu/etd/4411
Included in
Bacteria Commons, Bioimaging and Biomedical Optics Commons, Bioinformatics Commons, Computer and Systems Architecture Commons, Digital Communications and Networking Commons, Disease Modeling Commons, Hardware Systems Commons, Other Computer Engineering Commons
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).