Technical Report MSC-2009-02

Title: Preparing SNP Data For Genetic Linkage Analysis
Authors: Anna Tzemach
Supervisor: Dan Geiger
Abstract: Single nucleotide polymorphisms (SNPs) are stably inherited, highly abundant, and distributed throughout the genome. Current estimates are that SNPs occur as frequently as once every 100 to 300 bases, which implies that the human genome contains approximately 10 to 30 million potential SNPs. More than 4 million SNPs have been identified and made publicly available. Their large number makes SNPs good candidates for linkage analysis, but it also introduces new problems that were not significant for highly polymorphic markers. Some of these problems originate in the genotyping process; others result from limitations of current linkage software. In this thesis we propose an algorithm for preprocessing SNP data and implement a tool, SNPdistiller, that handles the complete process of preparing SNPs for linkage analysis, from the genotyped data to the creation of an input file suitable for currently available linkage analysis tools. The tool first removes erroneous and unlikely SNPs from the data and then organizes the remaining SNPs into clusters that simulate the behavior of highly polymorphic, informative markers. The algorithm takes into account both the genetic data and the capabilities of the linkage analysis software. Experimental results demonstrate the performance of SNPdistiller on simulated and real datasets. The thesis concludes by proposing further enhancements to the algorithms implemented in SNPdistiller.
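The two-stage pipeline described in the abstract (filter SNPs, then cluster them into marker-like units) can be illustrated with a minimal sketch. This is NOT SNPdistiller's actual algorithm: the filtering criteria (call rate and minor allele frequency), the thresholds, the greedy clustering rule, and all names (`Snp`, `filter_snps`, `cluster_snps`, `cluster_heterozygosity`, `max_span_cm`, `max_snps`) are hypothetical. Here `max_snps` merely stands in for a limit imposed by the linkage software, and the heterozygosity estimate assumes Hardy-Weinberg and linkage equilibrium between SNPs in a cluster.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Snp:
    name: str
    position_cm: float  # genetic map position in centimorgans
    call_rate: float    # fraction of individuals successfully genotyped
    maf: float          # minor allele frequency


def filter_snps(snps: List[Snp], min_call_rate: float = 0.95,
                min_maf: float = 0.05) -> List[Snp]:
    """Hypothetical filter: drop SNPs that are likely erroneous (low call
    rate) or nearly uninformative (low minor allele frequency)."""
    return [s for s in snps if s.call_rate >= min_call_rate and s.maf >= min_maf]


def cluster_snps(snps: List[Snp], max_span_cm: float = 0.5,
                 max_snps: int = 3) -> List[List[Snp]]:
    """Greedily group map-ordered SNPs into clusters spanning at most
    max_span_cm and holding at most max_snps SNPs (a stand-in for a
    linkage-software limit, e.g. on the number of alleles per marker)."""
    clusters: List[List[Snp]] = []
    current: List[Snp] = []
    for s in sorted(snps, key=lambda x: x.position_cm):
        if current and (len(current) == max_snps
                        or s.position_cm - current[0].position_cm > max_span_cm):
            clusters.append(current)
            current = []
        current.append(s)
    if current:
        clusters.append(current)
    return clusters


def cluster_heterozygosity(cluster: List[Snp]) -> float:
    """Expected heterozygosity of the multi-SNP haplotype 'marker'.
    Under linkage equilibrium the sum of squared haplotype frequencies
    equals the product of per-SNP homozygosities (p^2 + q^2)."""
    homozygosity = math.prod(s.maf ** 2 + (1 - s.maf) ** 2 for s in cluster)
    return 1.0 - homozygosity


if __name__ == "__main__":
    data = [
        Snp("rs1", 0.0, 0.99, 0.40),
        Snp("rs2", 0.1, 0.80, 0.30),  # low call rate, filtered out
        Snp("rs3", 0.2, 0.98, 0.01),  # low MAF, filtered out
        Snp("rs4", 0.3, 0.97, 0.50),
        Snp("rs5", 0.4, 0.99, 0.20),
        Snp("rs6", 1.2, 0.99, 0.35),
    ]
    for c in cluster_snps(filter_snps(data)):
        print([s.name for s in c], round(cluster_heterozygosity(c), 3))
```

In this toy example a single SNP can have heterozygosity of at most 0.5, while the three-SNP cluster `rs1, rs4, rs5` reaches about 0.82, which is the sense in which clustering makes SNPs behave more like highly polymorphic markers.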
Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Computer science department, Technion