Technical Report PHD-2006-07

Title: Spotting Regulatory Elements by Micro-Arrays
Authors: Chaya Ben-Zaken Zilberstein
Supervisors: Zohar Yakhini
Abstract: The recent availability of the human and mouse genome sequences has shown that the vast majority of their genes are almost identical. The differences between humans and mice most likely stem from variations in their gene regulatory networks. Gene regulatory networks determine the expression levels of the different genes in the genome. Though most cells in our bodies contain the same genes, not all of the genes are used in each cell. Some genes are turned on, or "expressed", only when needed. Many genes are used to specify features unique to a particular cell type or condition.

To date, much about regulatory networks remains unknown. Despite the tremendous success of genome sequencing efforts and the many complete genome sequences now available, the regulatory networks of these sequenced genomes are still largely uncharacterized. The first step toward elucidating cellular regulatory networks is to identify the individual regulatory elements comprising them. In this thesis, we develop automatic methods to decipher some of these regulatory elements.

Microarrays are a novel technology that enables the measurement of the expression levels of thousands of genes for any cell type or condition. In order to identify novel regulatory elements, we combine microarray measurement results with genomic sequences. We seek elements that are statistically significantly correlated with the expression levels of a single microarray measurement (we provide two statistical approaches for assessing the strength of this correlation). For example, motifs enriched in the promoters of genes with high expression levels are good candidates for representing the binding sites of active transcription factors.

These transcription factors putatively enhance RNA transcription under the corresponding condition. Aside from transcription factor binding sites (TFBSs), our approach also allows the discovery of other regulatory elements, including motifs controlling mRNA stability and microRNAs.
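The motif-enrichment idea described above can be illustrated with a small sketch. The abstract does not specify the two statistical approaches used in the thesis, so the hypergeometric tail test below is only a commonly used stand-in, not the author's method. The function names (`hypergeom_tail`, `motif_enrichment`), the `top_n` cutoff, and the use of exact substring matching for motif occurrence are all assumptions made for illustration.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the motif."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def motif_enrichment(promoters, expression, motif, top_n):
    """Score enrichment of `motif` among the `top_n` most highly expressed genes.

    promoters:  dict mapping gene name -> promoter sequence (hypothetical input format)
    expression: dict mapping gene name -> expression level from one microarray
    Returns (number of top genes carrying the motif, hypergeometric p-value).
    """
    # Rank genes from highest to lowest expression.
    ranked = sorted(promoters, key=lambda g: expression[g], reverse=True)
    # Assumption: a motif "occurs" if it appears as an exact substring.
    has_motif = {g for g in ranked if motif in promoters[g]}
    top = ranked[:top_n]
    k = sum(1 for g in top if g in has_motif)
    return k, hypergeom_tail(len(ranked), len(has_motif), len(top), k)
```

A small p-value suggests that the motif is over-represented among the highly expressed genes. In a genome-wide setting, many candidate motifs would be tested, so a multiple-testing correction would also be needed; that step is omitted here.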

Since our approach involves the analysis of data on a genome-wide scale, severe computational efficiency problems arise. This thesis addresses these obstacles by developing novel algorithms that provide practical and efficient solutions to the challenges involved.

Copyright: The above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion