Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Computer Science
First Advisor's Name
Ananda Mohan Mondal
First Advisor's Committee Title
Co-Committee Chair
Second Advisor's Name
Giri Narasimhan
Second Advisor's Committee Title
Co-Committee Chair
Third Advisor's Name
Fahad Saeed
Third Advisor's Committee Title
Committee Member
Fourth Advisor's Name
Leonardo Bobadilla
Fourth Advisor's Committee Title
Committee Member
Fifth Advisor's Name
Kalai Mathee
Fifth Advisor's Committee Title
Committee Member
Keywords
Cancer, Biomarker, Machine Learning, Bioinformatics, Computational Biology
Date of Defense
7-26-2023
Abstract
Biomarkers are central to cancer research, diagnosis, and treatment because they help explain the biological response mechanisms that follow an internal or external intervention. Next-generation sequencing technologies have greatly increased the volume of genomic, epigenomic, and transcriptomic data, enabling data-driven biomarker discovery from both single-omics and multi-omics data. Existing statistical approaches to identifying biomarker genes through comparative transcriptomics between cancer patients and healthy controls have two major shortcomings. Shortcoming 1: They overlook critical biological phenomena such as inter-gene association, groups of genes working together to trigger a particular ailment, and crosstalk among these groups of genes. Shortcoming 2: They fail to consider individual genetic and epigenetic variations within a tumor, termed intratumor heterogeneity (ITH). State-of-the-art multi-omics analyses in cancer research use graph neural network-based (GNN-based) approaches for biomarker discovery and cancer subtype prediction. However, these approaches have major limitations: they neither determine the relative significance of neighboring nodes (in this case, patients) in the graph nor identify the most influential omics data for downstream analyses such as cancer subtype classification and patient stratification. We propose graph-theoretic, feature selection-based machine learning, and graph attention network-based approaches to overcome Shortcoming 1, Shortcoming 2, and the limitations of GNN-based multi-omics analyses, respectively. Graph-Theoretic Approaches in Biomarker Discovery: We hypothesize that genes in a group work together by forming a clique-like structure, and that a bipartite graph can represent the crosstalk between two groups of genes that form clique-like structures.
To test this hypothesis, gene expression data of three cancer types were analyzed separately. The biomarkers identified using the proposed graph-theoretic approaches were prognostically significant. Feature Selection-Based Approach in Evaluating ITH: ITH is defined by the diversity of tumor cell subpopulations and is the biggest obstacle in precision medicine. The major limitation of the state-of-the-art method for estimating ITH level from transcription profiles is that it uses the expression values of all genes. We hypothesize that a reduced set of important genes (biomarkers) is sufficient to estimate the level of ITH. Our proposed deep learning-based feature selection approach identified a reduced set of genes that effectively estimates ITH levels in different patients. Multi-Omics Integration Using Graph Attention Networks (MOGAT): We propose MOGAT, a novel multi-omics integration approach that leverages a graph attention network model combining graph-based learning with an attention mechanism. It performs better than other GNN-based approaches in cancer subtype prediction and patient characterization. Overall, this dissertation is a significant step forward in cancer biomarker discovery and cancer subtype prediction, which could help physicians select appropriate treatment strategies and thus reduce patients’ suffering.
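As an illustration of the graph-theoretic hypothesis in the abstract, the following minimal sketch finds gene groups as maximal cliques in a small gene association graph and lists the edges that run between two groups, which together form a bipartite crosstalk graph. It is not the dissertation's method. The gene names, the edge list, the use of strict maximal cliques (rather than the "clique-like" structures the abstract mentions), and the helper names `build_graph`, `maximal_cliques`, and `crosstalk_edges` are all assumptions made for this example.

```python
def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


def crosstalk_edges(adj, group_a, group_b):
    """Edges with one endpoint in each group (a bipartite subgraph)."""
    return sorted((a, b) for a in group_a for b in group_b if b in adj[a])


# Hypothetical toy association graph: two tightly connected gene groups
# plus two edges linking them. Real edges would come from some
# gene-gene association measure chosen by the analyst.
toy_edges = [
    ("G1", "G2"), ("G1", "G3"), ("G2", "G3"),   # group A
    ("G4", "G5"), ("G4", "G6"), ("G5", "G6"),   # group B
    ("G3", "G4"), ("G2", "G5"),                 # crosstalk
]
adj = build_graph(toy_edges)

# Keep cliques of at least three genes as candidate gene groups.
groups = sorted(sorted(c) for c in maximal_cliques(adj) if len(c) >= 3)
print(groups)
print(crosstalk_edges(adj, groups[0], groups[1]))
```

In this toy graph, the size filter keeps the two three-gene groups, and the crosstalk step returns only the edges that join them. Relaxing the strict clique requirement to denser "clique-like" subgraphs would need a different detection step.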
Identifier
FIDC011171
ORCID
0000-0002-9494-9756
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Tanvir, Raihanul Bari, "Graph-Theoretic and Machine Learning-Based Frameworks for Cancer Biomarker Discovery" (2023). FIU Electronic Theses and Dissertations. 5396.
https://digitalcommons.fiu.edu/etd/5396
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).