Technical Report CS-2006-06

Title: Evaluation of scoring functions for protein multiple sequence alignment
Authors: Sivan Yogev and Shlomo Moran
Abstract: The process of aligning a group of protein sequences to obtain a meaningful Multiple Sequence Alignment (MSA) is a basic tool in current bioinformatic research. The development of new MSA algorithms raises the need for an efficient way to evaluate the quality of an alignment, in order to select the best alignment among the ones produced by the available algorithms. A natural way to evaluate the quality of alignments is by the use of scoring functions, which assign to each alignment a number reflecting its quality. Different scoring functions for MSA have been proposed over the years, which raised the need for methodological ways to assess the quality of such functions.
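To make the notion concrete, the following is a minimal sketch of one classical scoring function, the sum-of-pairs (SP) score, which assigns a number to an MSA by summing pairwise column scores. It is an illustration only, not one of the functions evaluated in the report, and the match/mismatch/gap values below are hypothetical.

```python
# Illustrative sum-of-pairs (SP) scoring function for an MSA.
# The scoring values are hypothetical; real SP scores typically use a
# substitution matrix such as BLOSUM62.
from itertools import combinations

MATCH, MISMATCH, GAP_PAIR = 2, -1, -2  # hypothetical scores

def sp_score(msa):
    """Sum-of-pairs score: sum pairwise scores over every column.

    `msa` is a list of equal-length aligned sequences, '-' denoting a gap.
    """
    total = 0
    for col in zip(*msa):              # iterate over alignment columns
        for a, b in combinations(col, 2):
            if a == "-" or b == "-":
                total += GAP_PAIR
            elif a == b:
                total += MATCH
            else:
                total += MISMATCH
    return total

print(sp_score(["AC-GT", "ACGGT", "A--GT"]))  # prints 10
```

A higher SP score is taken to indicate a better alignment; comparing two candidate alignments then reduces to comparing two numbers.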

A few methods for assessing the quality of scoring functions for pairwise alignments have been proposed. These methods are based on comparing alignments which are optimal for a given scoring function to structural alignments (alignments obtained through analysis of the three-dimensional structures of related proteins). A main obstacle in using the above methods for evaluating scoring functions for alignments of k > 2 sequences is the unavailability of efficient algorithms for computing optimal alignments (for a given scoring function) of even a moderate number of sequences. We propose a framework for bypassing this difficulty, which is based on computing the correlation between suboptimal alignments.
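The correlation idea above can be sketched as follows: score a sample of (sub)optimal alignments with the candidate function, measure each alignment's agreement with an independent reference (e.g. a structural alignment), and correlate the two series. The sketch below uses Spearman rank correlation; the sample data are made up, and this is an assumption about the general shape of such a test, not the report's exact procedure.

```python
# Hedged sketch of a correlation test for a candidate scoring function.
# Data and quality measure are hypothetical.

def ranks(xs):
    """Rank values from 1 (smallest), averaging the ranks of ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical sample: candidate-function scores of five suboptimal
# alignments vs. their agreement with a structural reference.
scores  = [41.0, 37.5, 44.2, 30.1, 39.8]
quality = [0.82, 0.75, 0.90, 0.55, 0.80]
print(round(spearman(scores, quality), 3))  # prints 1.0
```

A strong positive correlation suggests the scoring function ranks alignments consistently with the reference, without ever having to compute an optimal multiple alignment.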

An inherent issue that needs to be addressed in our method is the identification of an appropriate sample set of alignments to be used in the correlation test. We describe this problem, suggest a solution, and report results obtained using this solution.

Our results indicate that for most scoring functions, the addition of appropriate gap penalties improves the quality of the function. One notable exception is COFFEE, for which the average improvement after adding gap penalties was negligible in all of our experiments. COFFEE was also the function with the best average quality over the entire benchmark tested.
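The kind of augmentation referred to above can be sketched as subtracting an affine gap penalty (a fixed cost to open a gap plus a smaller cost per extension) from any base score. The penalty values below are hypothetical, and this is a generic illustration rather than the specific penalty scheme used in the experiments.

```python
# Sketch of adding affine gap penalties to an arbitrary base scoring
# function. GAP_OPEN and GAP_EXTEND are hypothetical values.
GAP_OPEN, GAP_EXTEND = 4.0, 0.5

def gap_penalty(msa):
    """Total affine penalty: each maximal gap run costs open + extend*(len-1)."""
    total = 0.0
    for row in msa:
        in_gap = False
        for ch in row:
            if ch == "-":
                total += GAP_EXTEND if in_gap else GAP_OPEN
                in_gap = True
            else:
                in_gap = False
    return total

def penalized_score(base_score, msa):
    """Combine any base scoring function's value with the gap penalty."""
    return base_score - gap_penalty(msa)

print(gap_penalty(["AC--GT", "ACG-GT"]))  # prints 8.5
```

Because the penalty is simply subtracted, the same augmentation can be applied uniformly to different base functions, which is what makes a per-function comparison of "with vs. without gap penalties" meaningful.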

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URLs of the PDF files directly. The latter URLs may change without notice.


Computer science department, Technion