Date of this Version

3-2019

Document Type

Report

Abstract

The most commonly employed method for peptide identification in mass-spectrometry based proteomics involves comparing experimentally obtained tandem MS/MS spectra against a set of theoretical MS/MS spectra. The theoretical MS/MS spectra data are predicted using protein sequence database. Most state-of-the-art peptide search algorithms index theoretical spectra data to quickly filter-in the relevant (similar) indexed spectra when searching an experimental MS/MS spectrum. Data filtration substantially reduces the required number of computationally expensive spectrum-to-spectrum comparison operations. However, the number of predicted (and indexed) theoretical spectra grows exponentially with increase in posttranslational modifications creating a memory and I/O bottleneck. In this paper, we present a parallel algorithm, called LBE, for efficient partitioning of theoretical spectra data on a distributed-memory architecture. Our proposed algorithm first groups the similar theoretical spectra. The groups are then finely split across the system allowing machines to perform almost equal amount of work when querying a MS/MS spectrum. Our results show that the compute load imbalance using LBE based data distribution is 20% allowing speedups of order of magnitudes over existing methods. The proposed algorithm has been implemented on a compute cluster using MPI library. Experimental results for increasing index sizes are reported in terms of execution time, speedups and memory footprint. To the best of our knowledge, LBE is the first load-balancing technique for MS/MS proteomics data on memory-distributed clusters that incorporates proteomics domain knowledge for efficient load-balancing. Source code is made available at: https://github.com/pcdslab/lbdslim/tree/mpi

Share

COinS
 

Rights Statement

Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).