Doctor of Philosophy (PhD)
protein, sequence, evolution, eukaryotes, rates, disorder, structure, orthology, paralogy
The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events, spanning more than one billion years of evolution. Notably, as these proteins evolved, the individual residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. Additionally, there is much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to evaluate and characterize several broad trends in eukaryote protein evolution. To this end, I use sequence-based computational predictors of protein structure (intrinsic disorder and protein secondary structure) and protein function (predicted functional domains), in addition to Bayesian phylogenetic inference methods, to analyze thousands of homologous protein sequence clusters from four eukaryotic lineages: animals, plants, fungi, and protists. Using these data, I perform large-scale factorial analyses that test the correlation between protein structure/function and rates of sequence evolution. The combined results of these analyses partially corroborate the findings of previous research in the field, but they also illuminate a subtle interaction among multiple drivers of protein sequence evolution, which is consistently observed across multiple eukaryote groups. Furthermore, using the results of Bayesian phylogenetic analysis of real and simulated protein sequence alignments, I show that orthologous and paralogous proteins exhibit significantly different overall patterns of sequence divergence, indicating that paralogs tend to evolve under relaxed selective pressure.
The acquisition of homologous biological sequence clusters is a prominent component of computational biological research. To assist in the identification of protein families within large sequence databases, I implement a simple, graph-based single-linkage clustering procedure, and I demonstrate its capacity to recover homologous subunits of the Rpt regulatory ring in the 26S proteasome complex.
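As an illustration of the general idea behind graph-based single-linkage clustering, the following minimal sketch treats sequences as nodes, connects any two whose pairwise similarity score meets a cutoff, and reports each connected component as a cluster. It is not the dissertation's implementation: the function name `single_linkage_clusters`, the sequence identifiers, the scores, and the threshold are all hypothetical and chosen only for demonstration.

```python
from collections import defaultdict


def single_linkage_clusters(nodes, scored_pairs, threshold):
    """Return the connected components of the similarity graph.

    nodes        -- iterable of sequence identifiers (hypothetical names)
    scored_pairs -- iterable of (id_a, id_b, score) tuples
    threshold    -- pairs scoring at or above this value become edges
    """
    nodes = list(nodes)
    parent = {n: n for n in nodes}  # union-find forest

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two clusters

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)

    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Toy input: invented identifiers and invented similarity scores.
toy_nodes = ["seqA", "seqB", "seqC", "seqD", "seqE"]
toy_pairs = [
    ("seqA", "seqB", 0.9),
    ("seqB", "seqC", 0.7),
    ("seqA", "seqC", 0.2),  # weak direct link, still joined through seqB
    ("seqD", "seqE", 0.4),
    ("seqC", "seqD", 0.1),
]
toy_clusters = single_linkage_clusters(toy_nodes, toy_pairs, threshold=0.5)
print(toy_clusters)
```

Under single linkage, one sufficiently strong edge is enough to merge two groups, so seqA and seqC end up together even though their direct score falls below the cutoff. This chaining behavior helps recover divergent family members but can also merge unrelated groups if the threshold is set too low.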
Ahrens, Joseph Boehm, "Computational Analysis of Large-Scale Trends and Dynamics in Eukaryotic Protein Family Evolution" (2019). FIU Electronic Theses and Dissertations. 4039.
Available for download on Sunday, February 28, 2021
Applied Statistics Commons, Bioinformatics Commons, Biostatistics Commons, Computational Biology Commons, Evolution Commons, Genomics Commons, Molecular Biology Commons, Molecular Genetics Commons, Statistical Models Commons, Theory and Algorithms Commons
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).