Date of this Version
7-5-2019
Document Type
Article
Rights
by-nc
Abstract
Motivation
Bacterial metagenomics profiling for metagenomic whole sequencing (mWGS) usually starts by aligning sequencing reads to a collection of reference genomes. Current profiling tools are designed to work against a small representative collection of genomes, and do not scale very well to larger reference genome collections. However, large reference genome collections are capable of providing a more complete and accurate profile of the bacterial population in a metagenomics dataset. In this paper, we discuss a scalable, efficient and affordable approach to this problem, bringing big data solutions within the reach of laboratories with modest resources. Results
We developed FLINT, a metagenomics profiling pipeline that is built on top of the Apache Spark framework, and is designed for fast real-time profiling of metagenomic samples against a large collection of reference genomes. FLINT takes advantage of Spark’s built-in parallelism and streaming engine architecture to quickly map reads against a large (170 GB) reference collection of 43 552 bacterial genomes from Ensembl. FLINT runs on Amazon’s Elastic MapReduce service, and is able to profile 1 million Illumina paired-end reads against over 40 K genomes on 64 machines in 67 s—an order of magnitude faster than the state of the art, while using a much larger reference collection. Streaming the sequencing reads allows this approach to sustain mapping rates of 55 million reads per hour, at an hourly cluster cost of $8.00 USD, while avoiding the necessity of storing large quantities of intermediate alignments.
DOI
10.1093/bioinformatics/btz356
Identifier
FIDC008156
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Recommended Citation
Valdes, Camilo; Stebliankin, Vitalii; and Narasimhan, Giri, "Large scale microbiome profiling in the cloud" (2019). Biomolecular Sciences Institute: Faculty Publications. 32.
https://digitalcommons.fiu.edu/biomolecular_fac/32
Rights Statement
In Copyright - Non-Commmercial Use Permitted. URI: http://rightsstatements.org/vocab/InC-NC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. In addition, no permission is required from the rights-holder(s) for non-commercial uses. For other uses you need to obtain permission from the rights-holder(s).
Comments
Originally published in Bioinformatics.