Document Type

Dissertation

Degree

Doctor of Philosophy (PhD)

Major/Program

Computer Science

First Advisor's Name

Ananda Mohan Mondal

First Advisor's Committee Title

Co-Committee Chair

Second Advisor's Name

Giri Narasimhan

Second Advisor's Committee Title

Co-Committee Chair

Third Advisor's Name

Fahad Saeed

Third Advisor's Committee Title

Committee Member

Fourth Advisor's Name

Leonardo Bobadilla

Fourth Advisor's Committee Title

Committee Member

Fifth Advisor's Name

Kalai Mathee

Fifth Advisor's Committee Title

Committee Member

Keywords

Cancer, Biomarker, Machine Learning, Bioinformatics, Computational Biology

Date of Defense

7-26-2023

Abstract

Biomarkers are central to cancer research, diagnosis, and treatment because they help explain the biological response mechanisms that follow an internal or external intervention. Next-generation sequencing technologies have greatly increased the volume of genomic, epigenomic, and transcriptomic data, enabling data-driven biomarker discovery from both single-omics and multi-omics data. Existing statistical approaches that identify biomarker genes through comparative transcriptomics between cancer patients and healthy controls have two major shortcomings. Shortcoming 1: they overlook critical biological phenomena such as inter-gene association, groups of genes working together to trigger a particular ailment, and crosstalk among these groups of genes. Shortcoming 2: they fail to consider individual genetic and epigenetic variations within a tumor, termed intratumor heterogeneity (ITH). State-of-the-art multi-omics analyses in cancer research use graph neural network-based (GNN-based) approaches for biomarker discovery and cancer subtype prediction. These approaches have two major limitations: they do not determine the relative significance of neighboring nodes (in this case, patients) in the graph, and they do not identify the most influential omics data for downstream analyses such as cancer subtype classification and patient stratification. We propose graph-theoretic, feature selection-based machine learning, and graph attention network-based approaches to overcome Shortcoming 1, Shortcoming 2, and the limitations of GNN-based multi-omics analyses, respectively.

Graph-Theoretic Approaches in Biomarker Discovery: We hypothesize that a group of genes works together by forming a clique-like structure, and that a bipartite graph can represent the crosstalk between two groups of genes that form clique-like structures. To test this hypothesis, gene expression data from three cancer types were analyzed separately. The biomarkers identified using the proposed graph-theoretic approaches were prognostically significant.

Feature Selection-Based Approach in Evaluating ITH: ITH is defined by the diversity of tumor cell subpopulations, which is the biggest obstacle in precision medicine. The major limitation of the state-of-the-art method for estimating ITH level from transcription profiles is that it uses the expression values of all genes. We hypothesize that a reduced set of important genes (biomarkers) is sufficient to estimate the level of ITH. Our proposed deep learning-based feature selection approach identified a reduced set of genes that effectively estimates ITH levels in different patients.

Multi-Omics Integration Using Graph Attention Networks (MOGAT): We propose MOGAT, a novel multi-omics integration approach that leverages a graph attention network model combining graph-based learning with an attention mechanism. It performs better than other GNN-based approaches in cancer subtype prediction and patient characterization.

Overall, this dissertation is a significant step forward in discovering cancer biomarkers and predicting cancer subtypes, which could help physicians select appropriate treatment strategies and thus reduce patients’ suffering.
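
The clique-and-crosstalk hypothesis stated in the abstract can be illustrated with a small sketch. It is not the dissertation's method: it assumes, purely for illustration, that genes are linked when the absolute Pearson correlation of their expression profiles meets a threshold, that gene groups are the maximal cliques of that graph (found here with the basic Bron-Kerbosch algorithm), and that crosstalk is the set of edges running between two groups. The function names, the threshold, and the toy expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_graph(expr, threshold):
    """Link two genes when |correlation| >= threshold (an assumed rule)."""
    adj = {gene: set() for gene in expr}
    for g, h in combinations(expr, 2):
        if abs(pearson(expr[g], expr[h])) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def crosstalk_edges(adj, group_a, group_b):
    """Edges between two gene groups: the bipartite 'crosstalk' view."""
    return {(g, h) for g in group_a for h in group_b if h in adj[g]}


if __name__ == "__main__":
    # Hypothetical expression values: one row per gene, one column per sample.
    toy_expr = {
        "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
        "GENE_C": [1.0, 2.0, 3.0, 4.0, 5.0, 7.0],
        "GENE_D": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
        "GENE_E": [10.0, 2.0, 8.0, 4.0, 12.0, 6.0],
    }
    graph = coexpression_graph(toy_expr, threshold=0.9)
    groups = [c for c in maximal_cliques(graph) if len(c) >= 2]
    for group in groups:
        print("gene group:", sorted(group))
    for a, b in combinations(groups, 2):
        print("crosstalk:", sorted(crosstalk_edges(graph, a - b, b - a)))
```

Exhaustive clique enumeration is exponential in the worst case, so a sketch like this only suits small graphs; "clique-like" structures in real expression data would likely call for a relaxed definition and a scalable algorithm.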

Identifier

FIDC011171

ORCID

0000-0002-9494-9756

Creative Commons License

Creative Commons Attribution 4.0 License

Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).