Abstract:
The cooperative binding of transcription factors to specific
regulatory sequence elements is a primary mechanism for controlling
gene expression. In this talk I will describe a framework for finding
recurrent regulatory sequence patterns in a selected set of genes and
scoring their statistical significance. Proceeding from a database of
identified binding site motifs and their genomic locations, we seek
motifs whose frequency in the selected set differs from that in a
background set. I will present a statistical test designed for this
purpose. I will then provide a hashing algorithm for detecting
combinations of these motifs that co-occur in modules within the
selected genes. The significance of such co-occurrences is evaluated
using novel statistical scores. Our methods are combined in CREME,
a software suite that includes a browser for viewing the pattern
of occurrence of selected modules. We applied CREME to find modules
within human-mouse conserved promoter segments, focusing on
cell-cycle-regulated genes and stress-response-related genes.
To validate the biological significance of the identified modules,
we tested whether the associated genes tended to be co-expressed
or to share similar function. In the cell cycle set, five of the seven
identified sets of genes were coherently expressed. In the
stress response set, four of the six detected sets fell predominantly
into well-defined functional sub-categories.
Joint work with Ivan Ovcharenko, Asa Ben-Hur and Richard Karp.