Technical Report CS-2013-05

Title: Iterative Referencing for Improving the Interpretation of DNA Sequence Data
Authors: Alaa Ghanayim and Dan Geiger
Abstract: Next-Generation Sequencing (NGS) facilitates genetic studies to discover SNP and INDEL variations associated with Mendelian and complex diseases. The measurement process, which generates millions of short DNA reads, creates various data processing and interpretation challenges for which a multitude of software tools are being developed. A key parameter in this process is mapping accuracy, because reads mapped incorrectly to a reference genome dramatically increase the false positive rate of discovered variations and lower the detection rate of true positive variations. We present a revised approach, Iterative Referencing (IR), that increases the accuracy of mapping the sequenced data by iteratively improving the reference genome via the Expectation Maximization (EM) algorithm. The idea is that if a sufficient number of reads in the data set contain a specific homozygous variation that is not seen in the reference genome, then the reference genome should be altered to contain that variation to better represent the data under study. The results demonstrate that IR improves the alignment process by up to 6.8%, increases the detection rate of true variations by up to 3.5%, and decreases the rate of false variations by up to 1.5%, all measured with respect to the original reference genome.
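
Illustration: the loop described in the abstract (map the reads, alter the reference where the reads consistently disagree with it, then map again) can be sketched roughly as follows. This is only a toy sketch and not the report's method: a fixed depth and majority threshold stands in for the EM step, only substitutions are handled (no INDELs), and every name here (naive_map, update_reference, iterative_referencing, min_depth, min_fraction, max_mismatches) is hypothetical and chosen for illustration.

```python
from collections import Counter


def naive_map(reference, reads, max_mismatches=1):
    """Toy gapless mapper (an assumption, not a real aligner): place each read
    at the leftmost position with the fewest mismatches, if within the limit."""
    aligned = []
    for read in reads:
        best = None  # (mismatches, start)
        for start in range(len(reference) - len(read) + 1):
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[0]):
                best = (mismatches, start)
        if best is not None:
            aligned.append((best[1], read))
    return aligned


def update_reference(reference, aligned_reads, min_depth=4, min_fraction=0.9):
    """Return a copy of the reference in which a base is replaced whenever
    nearly all reads covering that position agree on a different base."""
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    updated = list(reference)
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to justify altering the reference
        base, support = counts.most_common(1)[0]
        # A high agreement threshold is a crude stand-in for "homozygous".
        if base != reference[pos] and support / depth >= min_fraction:
            updated[pos] = base
    return "".join(updated)


def iterative_referencing(reference, reads, map_reads, max_rounds=5):
    """Alternate mapping and reference updates until the reference is stable."""
    for _ in range(max_rounds):
        aligned = map_reads(reference, reads)
        new_reference = update_reference(reference, aligned)
        if new_reference == reference:
            break
        reference = new_reference
    return reference
```

Calling iterative_referencing(reference, reads, naive_map) on a short reference and a handful of overlapping reads that all carry the same substitution returns the reference with that substitution applied. A variation supported by only about half of the covering reads, as a heterozygous site would be, falls below min_fraction and leaves the reference unchanged.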
Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion