Dalia Alperovich, M.Sc. Thesis Seminar
Thursday, 30.6.2016, 10:00
Biological measurements often yield results as a ranked list of quantities, for example
differential expression (DE) of genes as inferred from microarrays or RNA-seq. Recent years have brought considerable progress
in statistical tools for enrichment analysis in ranked lists. Several tools are now available that allow users to move beyond
the fixed-set paradigm when assessing statistical enrichment of sets of genes. Continuing with the example, these tools
identify factors that may be associated with the measured differential expression. To further improve these tools, we would
like to address relationships between factors. For example, genes targeted by multiple miRNAs may play a central role in
the measured DE signal even when the effect of each single miRNA is too subtle to be detected on its own.
We propose statistical and algorithmic approaches for selecting, from an input collection of factors, a sub-collection that
can be aggregated into one ranked list that is heuristically most associated with an input ranked list (the pivot). A naive
approach that evaluates every possible sub-collection is exponential in the number of factors under consideration. We examine performance on simulated data
and apply our approach to cancer datasets. We find small sub-collections of miRNAs that are statistically associated with
gene DE in several types of cancer, suggesting miRNA cooperativity in driving disease-related processes. Many of our
findings are consistent with known roles of miRNAs in cancer, while others suggest previously unknown roles for certain miRNAs.
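
To make the cost of the naive approach concrete, the following is a minimal sketch of an exhaustive search over sub-collections. It is not the method proposed in this work. It assumes, purely for illustration, that factors are aggregated by summing per-gene scores and that association with the pivot is measured by Spearman rank correlation; the function names, the factor names (miR-a, miR-b, miR-c) and the toy numbers are all hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Return the 0-based rank of each value (ties broken by position, a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean = (n - 1) / 2  # both rank vectors are permutations of 0..n-1
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


def naive_best_subcollection(pivot, factors):
    """Score every non-empty sub-collection of factors against the pivot.

    Assumption: a sub-collection is aggregated by summing its per-gene scores.
    Returns (best_names, best_score, number_of_sub_collections_evaluated).
    """
    names = list(factors)
    n_genes = len(pivot)
    best_names, best_score, evaluated = None, float("-inf"), 0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            aggregated = [sum(factors[name][g] for name in subset)
                          for g in range(n_genes)]
            score = spearman(pivot, aggregated)
            evaluated += 1
            if score > best_score:
                best_names, best_score = subset, score
    return best_names, best_score, evaluated


# Hypothetical toy data: DE scores for five genes and three made-up factors.
pivot = [5.0, 4.0, 3.0, 2.0, 1.0]
factors = {
    "miR-a": [6, 1, 5, 0, 2],
    "miR-b": [4, 7, 1, 3, 0],
    "miR-c": [0, 1, 2, 3, 4],
}

best, score, evaluated = naive_best_subcollection(pivot, factors)
print(best, score, evaluated)
```

In this toy input neither miR-a nor miR-b alone orders the genes exactly as the pivot does, but their sum does, which mirrors the cooperative effect described above. The loop visits all 2^n - 1 non-empty sub-collections of n factors (7 for the three factors here), so it becomes impractical well before n reaches the hundreds of miRNAs one would consider in practice.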