Document Type
Dissertation
Degree
Doctor of Philosophy (PhD)
Major/Program
Biology
First Advisor's Name
Jessica Siltberg-Liberles
First Advisor's Committee Title
Committee chair
Second Advisor's Name
Timothy Collins
Second Advisor's Committee Title
Committee member
Third Advisor's Name
Heather Bracken-Grissom
Third Advisor's Committee Title
Committee member
Fourth Advisor's Name
Prem Chapagain
Fourth Advisor's Committee Title
Committee member
Fifth Advisor's Name
Giri Narasimhan
Fifth Advisor's Committee Title
Committee member
Keywords
protein, sequence, evolution, eukaryotes, rates, disorder, structure, orthology, paralogy
Date of Defense
3-28-2019
Abstract
The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events spanning more than one billion years of evolution. As these proteins evolved, the residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. There is also much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to evaluate and characterize several broad trends in eukaryote protein evolution. To this end, I use sequence-based computational predictors of protein structure (intrinsic disorder and protein secondary structure) and protein function (predicted functional domains), together with Bayesian phylogenetic inference methods, to analyze thousands of homologous protein sequence clusters from four eukaryotic lineages: animals, plants, fungi and protists. Using these data, I perform large-scale factorial analyses that test the correlation between protein structure/function and rates of sequence evolution. The combined results partially corroborate previous findings in the field, but they also reveal a subtle interaction among multiple drivers of protein sequence evolution, one that is consistently observed across multiple eukaryote groups. Furthermore, using the results of Bayesian phylogenetic analysis on real and simulated protein sequence alignments, I show that orthologous and paralogous proteins exhibit significantly different overall patterns of sequence divergence, indicating that paralogs tend to evolve under relaxed selective pressure.
The acquisition of homologous biological sequence clusters is a prominent component of computational biological research. To assist in the identification of protein families within large sequence databases, I implement a simple, graph-based single-linkage clustering procedure, and I demonstrate its capacity to recover homologous subunits of the Rpt regulatory ring in the 26S proteasome complex.
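To illustrate the general idea behind graph-based single-linkage clustering, the following minimal Python sketch treats sequences as nodes and joins two nodes whenever a pairwise similarity hit passes a threshold; each connected component of the resulting graph is one cluster. This is not the dissertation's implementation: the input format (query, subject, E-value tuples), the function name `single_linkage_clusters`, the default cutoff `max_evalue=1e-5`, and the example sequence IDs are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def single_linkage_clusters(nodes, hits, max_evalue=1e-5):
    """Return clusters as the connected components of a similarity graph.

    Hypothetical inputs (not from the dissertation):
      nodes: iterable of sequence IDs
      hits:  iterable of (query, subject, evalue) tuples, e.g. parsed
             from an all-vs-all similarity search
    An edge is added when evalue <= max_evalue (an assumed cutoff).
    """
    # Union-find forest: every sequence starts in its own cluster.
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Tolerate IDs that appear in hits but not in `nodes`.
        parent.setdefault(query, query)
        parent.setdefault(subject, subject)
        if evalue <= max_evalue:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                # Single linkage: one qualifying hit merges two clusters.
                parent[root_s] = root_q

    groups = defaultdict(list)
    for n in parent:
        groups[find(n)].append(n)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical IDs loosely echoing Rpt subunits from two species.
    ids = ["Rpt1_human", "Rpt1_yeast", "Rpt2_human", "Rpt2_yeast", "other"]
    pairs = [
        ("Rpt1_human", "Rpt1_yeast", 1e-80),
        ("Rpt2_human", "Rpt2_yeast", 1e-75),
        ("Rpt1_human", "other", 0.5),  # too weak; no edge is added
    ]
    print(single_linkage_clusters(ids, pairs))
    # [['Rpt1_human', 'Rpt1_yeast'], ['Rpt2_human', 'Rpt2_yeast'], ['other']]
```

Because single linkage is transitive, chains of hits merge clusters: with qualifying hits a-b and b-c, sequences a and c end up in the same cluster even without a direct hit between them. This makes the procedure simple and fast, but the choice of threshold strongly affects how readily distinct families are merged.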
Identifier
FIDC007082
ORCID
https://orcid.org/0000-0003-0716-1117
Recommended Citation
Ahrens, Joseph Boehm, "Computational Analysis of Large-Scale Trends and Dynamics in Eukaryotic Protein Family Evolution" (2019). FIU Electronic Theses and Dissertations. 4039.
https://digitalcommons.fiu.edu/etd/4039
Included in
Applied Statistics Commons, Bioinformatics Commons, Biostatistics Commons, Computational Biology Commons, Evolution Commons, Genomics Commons, Molecular Biology Commons, Molecular Genetics Commons, Statistical Models Commons, Theory and Algorithms Commons
Rights Statement
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).