Technical Report CS-2006-18

Title: RepeatsHunter: a tool for identifying and visualizing tandem repeats
Authors: Ydo Wexler, Amir Rix, Ron Lange, Avi Nehama, Yael Danin-Poleg, Michael Shmoish, Yechezkel Kashi and Dan Geiger
Abstract: Tandem Repeats (TRs) are head-to-tail perfect or approximate duplications that occur abundantly in genomic sequences. They tend to be highly polymorphic due to large variation in the number of repeats. These repeats are known to cause several diseases and to serve as useful markers for genetic studies. In recent years, several algorithms for detecting approximate tandem repeats have been proposed.

However, often due to technological limitations, sequencers designate a base as 'N', meaning "no call", when they are unable to call a base at a specific position, so many genomic sequences contain the letter 'N' in addition to the four letters of DNA. Current algorithms for detecting approximate tandem repeats are designed to process sequences with known symbols and therefore do not correctly detect TRs in sequences that contain the symbol 'N'. Here, we present an efficient algorithm for detecting approximate tandem repeats in genomic sequences that may contain the symbol 'N'. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated. This algorithm is incorporated in a new tool called RepeatsHunter, which enables searching for perfect as well as approximate tandem repeats of different kinds and then visualizing them via the UCSC Genome Browser.
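
To illustrate why 'N' complicates detection, the following is a minimal sketch and not the RepeatsHunter algorithm, whose details are not given in this abstract. It assumes the simplest setting: a fixed period, no substitutions or indels, and 'N' treated as an unknown base compatible with any letter. The function name `extend_repeat` and its parameters are hypothetical. A consensus unit is kept so that every copy agrees with a single motif; comparing only adjacent copies would wrongly accept a run such as "ANC" as a repeat of period 1, since 'N' matches both 'A' and 'C'.

```python
def extend_repeat(seq, start, period):
    """Count consecutive copies of a unit of length `period` starting at `start`.

    Illustrative assumption: 'N' is an unknown base that is compatible with
    any letter. Only exact copies (up to 'N') are accepted; approximate
    repeats with substitutions or indels are not handled here.
    Returns (number_of_copies, consensus_unit).
    """
    consensus = ["N"] * period  # nothing is known about the unit yet
    copies = 0
    pos = start
    while pos + period <= len(seq):
        unit = seq[pos:pos + period]
        merged = []
        for known, base in zip(consensus, unit):
            if known == "N":
                merged.append(base)       # adopt whatever this copy shows
            elif base == "N" or base == known:
                merged.append(known)      # compatible with the consensus
            else:
                merged = None             # a real mismatch ends the run
                break
        if merged is None:
            break
        consensus = merged
        copies += 1
        pos += period
    return copies, "".join(consensus)


if __name__ == "__main__":
    print(extend_repeat("ACGACNACGTT", 0, 3))
    print(extend_repeat("ANC", 0, 1))
```

In this sketch the middle copy "ACN" of the first example is accepted because its 'N' is compatible with the consensus "ACG", while in the second example the run stops at 'C' because the consensus has already been fixed to 'A'.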

