What is it about?
Sorted merging is very slow when the number of objects is large. Merging VCF files is a prime example, and one commonly faced in large-scale sequencing-based genetics studies. We show ways to do this in the cloud using distributed systems, including MapReduce, HBase, and Spark, which makes the merge much faster.
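To make "sorted merge" concrete, here is a minimal single-machine sketch in Python. It is not the distributed method from the paper. It only shows the basic operation of combining several inputs that are each already sorted into one sorted stream. The simplified four-column records, the function names, and the plain string ordering of chromosome names are all assumptions made for illustration. A real VCF merge would also combine the per-sample genotype columns for each site.

```python
import heapq


def record_key(line):
    """Sort key for a simplified VCF-like data line: (chromosome, position)."""
    chrom, pos = line.split("\t", 2)[:2]
    # Assumption: chromosome names are compared as plain strings.
    return (chrom, int(pos))


def sorted_merge(*sources):
    """Lazily merge iterables that are each already sorted by record_key."""
    return heapq.merge(*sources, key=record_key)


# Hypothetical records: chromosome, position, reference allele, alternate allele.
sample_a = ["1\t100\tA\tG", "1\t300\tC\tT", "2\t50\tG\tA"]
sample_b = ["1\t200\tT\tC", "2\t10\tA\tC"]

merged = list(sorted_merge(sample_a, sample_b))
for line in merged:
    print(line)
```

Because heapq.merge keeps only one pending record per input in memory, the sketch streams through its inputs, but it still runs on a single machine and reads every input sequentially. That sequential pass is the bottleneck that motivates spreading the work across a cluster.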
Why is it important?
We are solving a very practical problem in genetics using cloud computing.
Read the Original
This page is a summary of: Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files, GigaScience, May 2018, Oxford University Press (OUP), DOI: 10.1093/gigascience/giy052.