Abstract:
The human genome, the hereditary material we pass on to our
progeny, can be seen as a 3-billion-letter string over a
four-letter DNA alphabet. We currently understand only about 1.5%
of this sequence, mostly in the form of genes, DNA substrings that
explain how to build proteins, the quintessential constituents of
every living cell. The remaining 98.5% of our genome was long
dismissed as "junk".
This picture changed recently, when we first obtained the genome
sequences of other species. By comparing these genomes to ours, we
were able to pinpoint the locations of a staggering one million
additional human subsequences that have been conserved across
species and so must be important to the human cell, yet do not
encode proteins. The functions of these regions remain largely
unknown, and their sheer volume overwhelms any comprehensive
experimental approach.
Guided by experimental results for a few of these subsequences,
we can use computational approaches to tackle the tremendous
challenge of interpreting this data and deriving key biological
insights.
I will describe a graph-theoretic approach to understanding these
regions, analyze some of the most perplexing regions within the
human genome, and track down a phenomenon that turns genomic junk
into gold.
The talk will assume no prior knowledge of molecular biology.