Abstract:
We describe a novel method for efficient reconstruction of phylogenetic
trees, based on sequences of whole genomes or proteomes. The core of our
proposal is an algorithm for efficiently computing pairwise distances.
This is a simple string algorithm, grounded in information-theoretic
measures.
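
The abstract does not spell out the distance computation, so the following
is only an illustrative sketch of one string-based, information-theoretic
measure of this general kind: an average-common-substring style distance.
The choice of measure, the function names, and the normalization are all
assumptions made for illustration, and the naive matching loop here is far
slower than anything suitable for whole genomes.

```python
import math


def longest_match_lengths(a: str, b: str) -> list[int]:
    """For each start i in a, length of the longest prefix of a[i:] found in b."""
    lengths = []
    for i in range(len(a)):
        k = 0
        # Extend the match one character at a time (naive; illustration only).
        while i + k < len(a) and a[i:i + k + 1] in b:
            k += 1
        lengths.append(k)
    return lengths


def average_common_substring(a: str, b: str) -> float:
    """Mean, over all start positions in a, of the longest match length in b."""
    lengths = longest_match_lengths(a, b)
    return sum(lengths) / len(lengths)


def directed_distance(a: str, b: str) -> float:
    """Assumed normalization: log|b| / L(a, b) minus the same term for a against itself."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    l_ab = average_common_substring(a, b)
    if l_ab == 0:
        return math.inf  # the sequences share no characters at all
    # L(a, a) equals (len(a) + 1) / 2, so this term makes directed_distance(a, a) zero.
    self_term = math.log(len(a)) / ((len(a) + 1) / 2)
    return math.log(len(b)) / l_ab - self_term


def acs_distance(a: str, b: str) -> float:
    """Symmetrized distance: the average of the two directed distances."""
    return (directed_distance(a, b) + directed_distance(b, a)) / 2
```

Long shared substrings give a large average match length and hence a small
distance. A matrix of such pairwise values could then be passed to a
standard distance-based tree-building method.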
The algorithm is fast enough to enable construction of a tree of two
hundred species and a forest of almost two thousand viruses. An
initial analysis of the results exhibits a remarkable agreement with the
"acceptable phylogenetic truth". We will compare it to other known
whole-genome reconstruction techniques. This comparison shows that our
method significantly improves on the existing approaches, in terms of
both performance and accuracy.
Joint work with David Burstein, Igor Ulitsky, and Tamir Tuller.