Changepoint analysis for efficient variant calling

Author(s): Bloniarz, A; Talwalkar, A; Terhorst, J; Jordan, MI; Patterson, D; Yu, B; Song, YS | Abstract: We present CAGe, a statistical algorithm that exploits high sequence identity between sampled genomes and a reference assembly to streamline the variant calling process. Using a combination of changepoint detection, classification, and online variant detection, CAGe calls simple variants quickly and accurately on the 90-95% of a sampled genome that differs little from the reference, while correctly learning the remaining 5-10% that must be processed using more computationally expensive methods. CAGe runs on a deeply sequenced human whole genome sample in approximately 20 minutes, potentially reducing the burden of variant calling by an order of magnitude after one memory-efficient pass over the data. © 2014 Springer International Publishing Switzerland.