Abstract:
Phylogenetic trees represent evolutionary history, by showing the course of
evolution from some extinct common ancestor to today's species. These trees
can reveal connections between different species, teach us about extinct
species, and help us understand the development of viruses, and other
organisms with high mutation rate.
The most common way of reconstructing phylogenetic trees is based on
sequencing DNA from different species, and using similarities between
sequences to infer the tree. This requires some model of how DNA evolves
between an ancestor species and its descendants. The simplest model assumes
that there are i.i.d mutations, and that each mutation is a substitution,
and under this model, there are provably good reconstruction algorithm.
However, recent studies show that insertions and deletions mutations are a
serious cause of reconstruction errors, and can not be ignored.
We present the first efficient algorithm for tree reconstruction when the
mutations can be substitutions, insertions and deletions. The algorithm uses
a clustering based approach, which builds small parts of the tree, and glues
them together. In addition, the algorithm outputs a global alignment of the
DNA sequences, which respects the evolutionary history.
Based on joint works with Alex Andoni, Mark Braverman, Costis Daskalakis and
Sebasiten Roch