Sagi Snir (Institute of Evolution, Haifa University)
Thursday, 12.3.2009, 13:30
Bioinformatics Forum: The reconstruction of evolutionary trees (also known as “phylogenies”) is central to many problems in Biology. With the massive amounts of molecular data being produced, attention has increasingly turned to phylogenetic analyses of larger datasets, containing several hundreds to several thousand nucleotide sequences. Accordingly, a new program, “Assembling the Tree of Life”, has set the goal of producing a highly accurate estimate of the evolutionary history of all life on earth. Key to this goal is the ability to estimate very accurate trees on different groups of organisms, and then combine these different trees into a tree on the full dataset. Although much advancement has been achieved in the first task, including analytical accurate maximum likelihood solutions for very small trees, little has been done in the latter step. This task, combining small trees into a big tree, is the supertree task– and no really accurate supertree method yet exists.
Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet based reconstruction. This problem lies at the root of many tree reconstruction methods and theoretical as well as experimental results have been reported. Nevertheless, fundamental problems such as dealing with conflicting quartet trees or even with arbitrary congruent quartet trees remains problematic.
In a series of works we have developed a graph theoretically based approach for the supertree task. Our approach is based on a divide and conquer algorithm where our divide step uses a semi-definite programming (SDP) formulation of MaxCut in a graph representing relationships between the organisms. We devised an extremely fast SDP-like heuristic that allows us to extend the input data from several thousands of quartet trees over few dozens of species to tens of millions of quartet trees over thousands of species, while handling satisfactorily conflicts in the inputs. These results are promising in the realm of large scale phylogenetic reconstruction. We applied our algorithms in both quartet based and supertree reconstruction and showed improvements over widely used existing methods.
We also show theoretical guarantees for our method over a large family of inputs where the best known result is a random tree.
In the talk I will review open problems and future research directions. The talk is self-contained and requires no prior knowledge in Biology.
Based on works with Satish Rao, Tandy Warnow and Raphy Yuster.
Host: Ron Pinter