Doron Lipson's Abstracts
| N. Peled, G. Palmer, F. Hirsch,
M.W. Wynes, M. Ilouze, M. Varella-Garcia, L.
Soussan-Gutman, G.A. Otto, P.J. Stephens, J. Ross, M.T.
Cronin, D. Lipson, and V.A. Miller
Next-Generation Sequencing Identifies and Immunohistochemistry Confirms a Novel Crizotinib-Sensitive ALK Rearrangement in a Patient with Metastatic Non–Small-Cell Lung Cancer
The novel rearrangement identified in this case is complex
and was not detected by the Break-Apart FISH assay. NGS
showed that the patient’s tumor harbored a complex
EML4-ALK rearrangement at the genomic level. Clinical and
radiographic evidence confirmed a rapid response to
crizotinib. NGS should be considered in NSCLC patients
with high likelihood of a driver kinase alteration when
none is identified by other methods.
Journal of Thoracic Oncology, 7(9):e14–e16, 2012. [pdf]
| D. Lipson*, M. Capelletti*, R.
Yelensky, G. Otto, A. Parker, M. Jarosz, J.A. Curran, S.
Balasubramanian, T. Bloom, K.W. Brennan, A. Donahue, S.R.
Downing, G.M. Frampton, L. Garcia, F. Juhn, K.C. Mitchell,
E. White, J. White, Z. Zwirko, T. Peretz, H. Nechushtan,
L. Soussan-Gutman, J. Kim, H. Sasaki, H.R. Kim, S. Park,
D. Ercan, C.E. Sheehan, J.S. Ross, M.T. Cronin, P.A.
Jänne and P.J. Stephens
Identification of new ALK and RET gene fusions from colorectal and lung cancer biopsies
Applying a next-generation sequencing assay targeting 145 cancer-relevant genes in 40 colorectal cancer and 24 non–small cell lung cancer formalin-fixed paraffin-embedded tissue specimens identified at least one clinically relevant genomic alteration in 59% of the samples and revealed two gene fusions, C2orf44-ALK in a colorectal cancer sample and KIF5B-RET in a lung adenocarcinoma. Further screening of 561 lung adenocarcinomas identified 11 additional tumors with KIF5B-RET gene fusions (2.0%; 95% CI 0.8–3.1%). Cells expressing oncogenic KIF5B-RET are sensitive to multi-kinase inhibitors that inhibit RET.
Nature Medicine 18, 382–384 (2012). [pdf]
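The reported frequency and interval for the 11 additional KIF5B-RET cases among 561 tumors can be checked with a short calculation. The abstract does not say which interval method was used; the sketch below assumes a normal-approximation (Wald) interval, and the function name `wald_interval` is only illustrative.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    Assumption: the abstract does not name its interval method; the Wald
    interval is used here only because it is the simplest choice.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 11 KIF5B-RET fusions among 561 screened lung adenocarcinomas
low, high = wald_interval(11, 561)
print(f"{11 / 561:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Under that assumption the calculation gives the same rounded figures as the abstract (2.0%, 0.8 to 3.1%).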
| J.F. Thompson*, J.G.
Reifenberger*, E. Giladi, K. Kerouac, J. Gill, E. Hansen,
A. Kahvejian, P. Kapranov, T. Knope, D. Lipson, K.E.
Steinmann and P.M. Milos
Single-Step Capture and Sequencing of Natural DNA for Detection of BRCA1 Mutations
Genetic testing for disease risk is an increasingly important component of medical care. However, testing can be expensive which can lead to patients and physicians having limited access to the genetic information needed for medical decisions. To simplify DNA sample preparation and lower costs, we have developed a system in which any gene can be captured and sequenced directly from human genomic DNA without amplification, using no proteins or enzymes prior to sequencing. Extracted whole-genome DNA is acoustically sheared and loaded in a flow cell channel for single-molecule sequencing. Gene isolation, amplification, or ligation is not necessary. Accurate and low cost detection of DNA sequence variants is demonstrated for the BRCA1 gene. Disease-causing mutations as well as common variants from well-characterized samples are identified. Single-molecule sequencing generates very reproducible coverage patterns and these can be used to detect any size insertion or deletion directly, unlike PCR-based methods which require additional assays. Because no gene isolation or amplification is required for sequencing, the exceptionally low costs of sample preparation and analysis could make genetic tests more accessible to those who wish to know their own disease susceptibility. Additionally, this approach has applications for sequencing integration sites for gene therapy vectors, transposons, retroviruses, and other mobile DNA elements in a more facile manner than possible with other methods.
Genome Research, 22:340-345, 2012. [pdf]
| L.T. Sam, D. Lipson, T. Raz, X.
Cao, J.F. Thompson, P.M. Milos, D. Robinson, A.M.
Chinnaiyan, C. Kumar-Sinha, C.A. Maher
A Comparison of Single Molecule and Amplification Based Sequencing of Cancer Transcriptomes
The second wave of next generation sequencing technologies, referred to as single-molecule sequencing (SMS), carries the promise of profiling samples directly without employing polymerase chain reaction steps used by amplification-based sequencing (AS) methods. To examine the merits of both technologies, we examine mRNA sequencing results from single-molecule and amplification-based sequencing in a set of human cancer cell lines and tissues. We observe a characteristic coverage bias towards high abundance transcripts in amplification-based sequencing. A larger fraction of AS reads cover highly expressed genes, such as those associated with translational processes and housekeeping genes, resulting in relatively lower coverage of genes at low and mid-level abundance. In contrast, the coverage of high abundance transcripts plateaus off using SMS. Consequently, SMS is able to sequence lower-abundance transcripts more thoroughly, including some that are undetected by AS methods; however, these include many more mapping artifacts. A better understanding of the technical and analytical factors introducing platform specific biases in high throughput transcriptome sequencing applications will be critical in cross platform meta-analytic studies.
PLoS One, 6(3):e17305, 2011. [pdf]
| T. Raz, P. Kapranov, D. Lipson, S.
Letovsky, P.M. Milos, J.F. Thompson
Protocol Dependence of Sequencing-based Gene Expression Measurements
RNA Seq provides unparalleled levels of information about the transcriptome including precise expression levels over a wide dynamic range. It is essential to understand how technical variation impacts the quality and interpretability of results, how potential errors could be introduced by the protocol, how the source of RNA affects transcript detection, and how all of these variations can impact the conclusions drawn. Multiple human RNA samples were used to assess RNA fragmentation, RNA fractionation, cDNA synthesis, and single versus multiple tag counting. Though protocols employing polyA RNA selection generate the highest number of non-ribosomal reads and the most precise measurements for coding transcripts, such protocols were found to detect only a fraction of the non-ribosomal RNA in human cells. PolyA RNA excludes thousands of annotated and even more unannotated transcripts, resulting in an incomplete view of the transcriptome. Ribosomal-depleted RNA provides a more cost-effective method for generating complete transcriptome coverage. Expression measurements using single tag counting provided advantages for assessing gene expression and for detecting short RNAs relative to multi-read protocols. Detection of short RNAs was also hampered by RNA fragmentation. Thus, this work will help researchers choose from among a range of options when analyzing gene expression, each with its own advantages and disadvantages.
PLoS One, 6(5):e19287, 2011. [pdf]
| D.T. Ting*, D. Lipson*, S. Paul,
B.W. Brannigan, S. Akhavanfard, E.J. Coffman, G. Contino,
V. Deshpande, A.J. Iafrate, S. Letovsky, M.N. Rivera, N.
Bardeesy, S. Maheswaran, D.A. Haber
Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers
Satellite repeats in heterochromatin are transcribed into noncoding RNAs that have been linked to gene silencing and maintenance of chromosomal integrity. Using digital gene expression analysis, we show that these transcripts are greatly overexpressed in mouse and human epithelial cancers. In 8 of 10 mouse pancreatic ductal adenocarcinomas (PDAC), pericentromeric satellites accounted for a mean 12% (range 1 to 50%) of all cellular transcripts, a mean 40-fold increase over normal tissue. In 15/15 human PDACs, alpha satellite transcripts were most abundant and HSATII transcripts were highly specific for cancer. Similar patterns were observed in cancers of lung, kidney, ovary, colon, and prostate. Derepression of satellite transcripts correlated with overexpression of the LINE-1 retrotransposon and with aberrant expression of neuroendocrine-associated genes proximal to LINE-1 insertions. The overexpression of satellite transcripts in cancer may reflect global alterations in heterochromatin silencing and could potentially be useful as a biomarker for cancer detection.
Science, 331(6017):593-6, 2011. [pdf]
| P. Kapranov, F. Ozsolak, S.W. Kim, S.
Foissac, D. Lipson, C. Hart, S. Roels, C. Borel, S.E.
Antonarakis, A.P. Monaghan, B. John, P.M. Milos
New class of gene-termini-associated human RNAs suggests a novel RNA copying mechanism
Small (< 200 nucleotide) RNA (sRNA) profiling of human cells using various technologies demonstrates unexpected complexity of sRNAs with hundreds of thousands of sRNA species present. Genetic and in vitro studies show that these RNAs are not merely degradation products of longer transcripts but could indeed have a function. Furthermore, profiling of RNAs, including the sRNAs, can reveal not only novel transcripts, but also make clear predictions about the existence and properties of novel biochemical pathways operating in a cell. For example, sRNA profiling in human cells indicated the existence of an unknown capping mechanism operating on cleaved RNA, a biochemical component of which was later identified. Here we show that human cells contain a novel type of sRNA that has non-genomically encoded 5' poly(U) tails. The presence of these RNAs at the termini of genes, specifically at the very 3' ends of known mRNAs, strongly argues for the presence of a yet uncharacterized endogenous biochemical pathway in cells that can copy RNA. We show that this pathway can operate on multiple genes, with specific enrichment towards transcript-encoding components of the translational machinery. Finally, we show that genes are also flanked by sense, 3' polyadenylated sRNAs that are likely to be capped.
Nature, 466(7306):642-6, 2010. [pdf]
| Eldar Giladi, John Healy, Gene
Myers, Chris Hart, Phillip Kapranov, Doron Lipson, Steven
Roels, Edward Thayer, Stan Letovsky
Error tolerant indexing and alignment of short reads with covering template families
The rapid adoption of high-throughput next generation sequence data in biological research is presenting a major challenge for sequence alignment tools - specifically, the efficient alignment of vast amounts of short reads to large references in the presence of differences arising from sequencing errors and biological sequence variations. To address this challenge, we developed a short read aligner for high-throughput sequencer data that is tolerant of errors or mutations of all types - namely, substitutions, deletions, and insertions. The aligner utilizes a multi-stage approach in which template-based indexing is used to identify candidate regions for alignment with dynamic programming. A template is a pair of gapped seeds, with one used with the read and one used with the reference. In this article, we focus on the development of template families that yield error-tolerant indexing up to a given error-budget. A general algorithm for finding those families is presented, and a recursive construction that creates families with higher error tolerance from ones with a lower error tolerance is developed.
J. Computational Biology, 17(10):1279-1293, 2010. [pdf]
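The idea of seed-based indexing with gapped seeds can be sketched in a few lines. This is a deliberately simplified illustration, not the aligner described in the article: it applies one and the same gapped seed (a mask of `1` = compare, `0` = ignore) to both read and reference, whereas the article uses template pairs and whole template families. The mask, the sequences, and all function names are assumptions made for the example.

```python
from collections import defaultdict


def masked_key(seq: str, start: int, mask: str) -> str:
    """Bases of seq at the mask's '1' positions, reading from offset start."""
    return "".join(seq[start + i] for i, m in enumerate(mask) if m == "1")


def build_index(reference: str, mask: str) -> dict:
    """Map every masked key in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[masked_key(reference, pos, mask)].append(pos)
    return index


def candidates(read: str, index: dict, mask: str) -> list:
    """Reference offsets where the read may align, collected over all read windows."""
    hits = set()
    for off in range(len(read) - len(mask) + 1):
        for pos in index.get(masked_key(read, off, mask), []):
            if pos - off >= 0:
                hits.add(pos - off)
    return sorted(hits)


reference = "ACGTTGCATGACCTAGGATC"
mask = "1101011"                      # hypothetical gapped seed
read = "GCTTGACCTA"                   # reference[5:15] with one substitution (A -> T)
index = build_index(reference, mask)
print(candidates(read, index, mask))  # candidate offsets to verify by dynamic programming
```

The first read window contains the substitution, but the mismatch falls on an ignored position of the mask, so that window still points to the true offset 5. Candidate offsets would then be passed to a dynamic programming alignment step for verification.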
| Y. Buganim, I. Goldstein, D.
Lipson, M. Milyavsky, S. Polak-Charcon, C. Mardoukh, H.
Solomon, E. Kalo, S. Madar, R. Brosh, M. Perelman, R.
Navon, N. Goldfinger, I. Barshack, Z. Yakhini, V. Rotter
A Novel Translocation Breakpoint within the BPTF Gene Is Associated with a Pre-Malignant Phenotype
Partial gain of chromosome arm 17q is an abundant aberrancy in various cancer types such as lung and prostate cancer with a prominent occurrence and prognostic significance in neuroblastoma - one of the most common embryonic tumors. The specific genetic element/s in 17q responsible for the cancer-promoting effect of these aberrancies is yet to be defined although many genes located in 17q have been proposed to play a role in malignancy. We report here the characterization of a naturally-occurring, non-reciprocal translocation der(X)t(X;17) in human lung embryonal-derived cells following continuous culturing. This aberrancy was strongly correlated with an increased proliferative capacity and with an acquired ability to form colonies in vitro. The breakpoint region was mapped by fluorescence in situ hybridization (FISH) to the 17q24.3 locus. Further characterization by a custom-made comparative genome hybridization array (CGH) localized the breakpoint within the Bromodomain PHD finger Transcription Factor gene (BPTF), a gene involved in transcriptional regulation and chromatin remodeling. Interestingly, this translocation led to elevation in the mRNA levels of the endogenous BPTF. Knock-down of BPTF restricted proliferation suggesting a role for BPTF in promoting cellular growth. Furthermore, the BPTF chromosomal region was found to be amplified in various human tumors, especially in neuroblastomas and lung cancers in which 55% and 27% of the samples showed gain of 17q24.3, respectively. Additionally, 42% of the cancer cell lines comprising the NCI-60 had an abnormal BPTF locus copy number. We suggest that deregulation of BPTF resulting from the translocation may confer the cells with the observed cancer-promoting phenotype and that our cellular model can serve to establish causality between 17q aberrations and carcinogenesis.
PLoS ONE, 5(3): e9657, 2010 [pdf]
| Doron Lipson*, Tal Raz*, Alix Kieu,
Daniel R. Jones, Eldar Giladi, Edward Thayer, John F.
Thompson, Stan Letovsky, Patrice Milos, Marie Causey
Quantification of the Yeast Transcriptome by Single-molecule Sequencing
We present single-molecule sequencing digital gene expression (smsDGE), a high-throughput, amplification-free method for accurate quantification of the full range of cellular polyadenylated RNA transcripts using a Helicos Genetic Analysis system. smsDGE involves a reverse-transcription and polyA-tailing sample preparation procedure followed by sequencing that generates a single read per transcript. We applied smsDGE to the transcriptome of Saccharomyces cerevisiae strain DBY746, using 6 of the available 50 channels in a single sequencing run, yielding on average 12 million aligned reads per channel. Using spiked-in RNA, accurate quantitative measurements were obtained over four orders of magnitude. High correlation was demonstrated across independent flow-cell channels, instrument runs and sample preparations. Transcript counting in smsDGE is highly efficient due to the representation of each transcript molecule by a single read. This efficiency, coupled with the high throughput enabled by the single-molecule sequencing platform, provides an alternative method for expression profiling.
Nature Biotechnology, 27(7):652-8, 2009 [pdf]
Jayson Bowers, Judith Mitchell, Eric Beer, Philip R Buzby,
Marie Causey, J William Efcavitch, Mirna Jarosz, Edyta
Krzymanska-Olejnik, Li Kung, Doron Lipson, Geoffrey M
Lowman, Subramanian Marappan, Peter McInerney, Adam Platt,
Atanu Roy, Suhaib M Siddiqi, Kathleen Steinmann and John F
Virtual terminator nucleotides for next-generation DNA sequencing
We synthesized reversible terminators with tethered inhibitors for next-generation sequencing. These were efficiently incorporated with high fidelity while preventing incorporation of additional nucleotides, and we used them to sequence canine bacterial artificial chromosomes in a single-molecule system that provided even coverage for over 99% of the region sequenced. This single-molecule approach generated high-quality sequence data without the need for target amplification and thus avoided concomitant biases.
Nature Methods, 6, 593-595, 2009
| Margaret Taub, Doron Lipson and
Terence P. Speed
Methods for Allocating Ambiguous Short-reads
With the rise in prominence of biological research using new short-read DNA sequencing technologies comes the need for new techniques for aligning and assigning these reads to their genomic location of origin. Until now, methods for allocating reads which align with equal or similar fidelity to multiple genomic locations have not been model-based, and have tended to ignore potentially informative data. Here, we demonstrate that existing methods for assigning ambiguous reads can produce biased results. We then present new methods for allocating ambiguous reads to the genome, developed within a framework of statistical modeling, which show promise in alleviating these biases, both in simulated and real data.
Communications in Information and Systems, 10(2):69-82, 2010 [pdf]
| Eran Eden, Roy Navon, Israel
Steinfeld, Doron Lipson and Zohar Yakhini
GOrilla: a Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists
Since the inception of the GO annotation project, a
variety of tools have been developed that support
exploring and searching the GO database. In particular, a
variety of tools that perform GO enrichment analysis are
currently available. Most of these tools require as input
a target set of genes and a background set and seek
enrichment in the target set compared to the background
set. A few tools also exist that support analyzing ranked
lists. The latter typically rely on simulations or on
union-bound correction for assigning statistical
significance to the results.
| S. Farkash-Amar, D. Lipson, A.
Polten, A. Goren, C. Helmstetter, Z. Yakhini and I. Simon
Global Organization of Replication Time Zones of the Mouse Genome
The division of genomes into distinct replication time zones has long been established. However, an in-depth understanding of their organization and their relationship to transcription is incomplete. Taking advantage of a novel synchronization method ("baby machine") and of genomic DNA microarrays, we have, for the first time, mapped replication times of the entire mouse genome at a high temporal resolution. Our data revealed that although most of the genome has a distinct time of replication either early, middle, or late S phase, a significant portion of the genome is replicated asynchronously. Analysis of the replication map revealed the genomic scale organization of the replication time zones. We found that the genomic regions between early and late replication time zones often consist of extremely large replicons. Analysis of the relationship between replication and transcription revealed that early replication is frequently correlated with the transcription potential of a gene and not necessarily with its actual transcriptional activity. These findings, along with the strong conservation found between replication timing in human and mouse genomes, emphasize the importance of replication timing in transcription regulation.
|Amir Ben-Dor, Doron Lipson, Anya
Tsalenko, Mark Reimers, Lars O. Baumbusch, Michael T.
Barrett, John N. Weinstein, Anne-Lise Borresen-Dale, Zohar
Framework for Identifying Common Aberrations in DNA Copy Number Data
High-resolution array comparative genomic hybridization
(aCGH) provides exon-level mapping of DNA aberrations in
cells or tissues. Such aberrations are central to
carcinogenesis and, in many cases, central to targeted
therapy of the cancers. Some of the aberrations are
sporadic, one-of-a-kind changes in particular tumor
samples; others occur frequently and reflect common themes
in cancer biology that have interpretable, causal
and Results. In this paper we present an efficient
computational framework for identification and statistical
characterization of genomic aberrations that are common to
multiple cancer samples in a CGH data set. We present and
compare three different algorithmic approaches within the
context of that framework. Finally, we apply our methods
to two datasets – a collection of 20 breast cancer samples
and a panel of 60 diverse human tumor cell lines (the
NCI-60). Those analyses identified both known and novel
common aberrations containing cancer related genes. The
potential impact of the analytical methods is well
demonstrated by new insights into the patterns of deletion
of CDKN2A (p16), a tumor suppressor gene crucial for the
genesis of many types of cancer.
|Eran Eden, Doron Lipson, Sivan Yogev,
Discovering Motifs in Ranked Lists of DNA Sequences
Computational methods for discovery of sequence elements that are enriched in a target set compared to a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (Chromatin Immuno-Precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (Discovery of Rank Imbalanced Motifs), which identifies sequence motifs in lists of ranked DNA sequences.
DRIM to ChIP-chip and CpG methylation data and
obtained the following results: (i) Identification
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP-chip to CpG methylation data. DRIM is publicly available at: http://bioinfo.cs.technion.ac.il/drim.
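Enrichment at the top of a ranked list can be scored without fixing a target/background cutoff in advance. The abstract above does not name its statistic; the sketch below assumes a minimum hypergeometric (mHG) style score, in which the hypergeometric tail is evaluated at every cutoff and the smallest value is kept. Function names and the toy input are illustrative only.

```python
from math import comb


def hypergeom_tail(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) when n items are drawn without replacement from N, B of them marked."""
    favorable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return favorable / comb(N, n)


def mhg_score(occurrence: list) -> tuple:
    """Smallest hypergeometric tail over all cutoffs of a ranked 0/1 vector.

    occurrence[i] is 1 if the i-th ranked sequence contains the motif.
    Returns (score, cutoff). The score is a minimum over many tests, so it
    is not itself a p-value and still needs a multiplicity correction.
    """
    N, B = len(occurrence), sum(occurrence)
    best, best_cut, b = 1.0, 0, 0
    for n, has_motif in enumerate(occurrence, start=1):
        b += has_motif
        tail = hypergeom_tail(N, B, n, b)
        if tail < best:
            best, best_cut = tail, n
    return best, best_cut


# Toy ranked list: the motif occurs only in the three top-ranked sequences.
score, cutoff = mhg_score([1, 1, 1, 0, 0, 0, 0, 0])
print(score, cutoff)
```

Because the cutoff is chosen by the data, the partition into target and background sets falls out of the statistic instead of being supplied by the user.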
|E. Dehan, A. Ben-Dor, W. Liao, D.
Lipson, H. Frimer, S. Rienstein, D. Simansky, M. Krupsky,
P. Yaron, E. Friedman, G. Rechavi, M. Perlman, A.
Aviram-Goldring, S. Izraeli, M. Bittner, Z. Yakhini, N.
Chromosomal aberrations and gene expression profiles in non-small cell lung cancer
Alterations in genomic content and changes in gene expression levels are central characteristics of tumors and pivotal to the tumorigenic process. We analyzed 23 non-small cell lung cancer (NSCLC) tumors by array comparative genomic hybridization (array CGH). Aberrant regions identified included well-characterized chromosomal aberrations such as amplifications of 3q and 8q and deletions of 3p21.31. Less frequently identified aberrations such as amplifications of 7q22.3-31.31 and 12p11.23-13.2, and previously unidentified aberrations such as deletion of 11q12.3-13.3 were also detected. To enhance our ability to identify key acting genes residing in these regions, we combined array CGH results with gene expression profiling performed on the same tumor samples. We identified a set of genes with concordant changes in DNA copy number and expression levels, i.e. overexpressed genes located in amplified regions and underexpressed genes located in deleted regions. This set included members of the Wnt/beta-catenin pathway, genes involved in DNA replication, and matrix metalloproteases (MMPs). Functional enrichment analysis of the genes both overexpressed and amplified revealed a significant enrichment for DNA replication and repair, and extracellular matrix component gene ontology annotations. We verified the changes in expressions of MCM2, MCM6, RUVBL1, MMP1, MMP12 by real-time quantitative PCR. Our results provide a high resolution map of copy number changes in non-small cell lung cancer. The joint analysis of array CGH and gene expression analysis highlights genes with concordant changes in expression and copy number that may be critical to lung cancer development and progression.
|Doron Lipson, Zohar Yakhini, Yonatan
Optimization of Probe Coverage for High-Resolution Oligonucleotide aCGH
Motivation. The resolution at which genomic alterations can be mapped by means of oligonucleotide aCGH (array-based Comparative Genomic Hybridization) is limited by two factors: the availability of high-quality probes for the target genomic sequence and the array real-estate. Optimization of the probe selection process is required for arrays that are designed to probe specific genomic regions in very high resolution without compromising probe quality constraints.
Results. In this paper we describe a well-defined optimization problem associated with the problem of probe selection for high-resolution aCGH arrays. We propose the whenever possible ε-cover as a formulation that faithfully captures the requirements of the probe selection problem, and provide a fast randomized algorithm that solves the optimization problem in O(n log n) time, as well as a deterministic algorithm with the same asymptotic performance. We apply the method in a typical high-definition array design scenario and demonstrate its superiority with respect to alternative approaches.
Availability. Address requests to
| Ilya Baskin,
Stav Zaitsev, Doron Lipson, Rachel Gilad, Kinneret
Keren, Gidi Ben-Yoseph, Uri Sivan
A Molecular Shift Register and its Utilization as an Autonomous DNA Synthesizer
We demonstrate a novel algorithmic approach to autonomous synthesis of fairly long DNA molecules with well-defined, non-recurring sequences. The scheme exploits chemical embodiment of shift registers to execute algorithms similar to those used to generate pseudo-random numbers on a computer. A collection of single stranded “rule” DNA molecules is added to a tube together with a single stranded “seed” molecule and polymerase. The “rule” molecules guide seed elongation according to the algorithm to produce the desired DNA molecule. The synthesis effort is exponentially small compared with all present strategies. The reduced effort is facilitated by the sliding reading frame of shift registers, namely, the utilization of a previously synthesized sequence for the synthesis of the next bases. A redundancy based error reduction scheme, similar to those used in communication, is utilized to systematically suppress synthesis errors.
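The sliding reading frame of a shift register can be illustrated in software. The sketch below is only an analogy under stated assumptions: the "rule" molecules are modeled as a lookup table from the last k bases to the next base, and the rule set, seed, and function name are invented for the example and not taken from the paper.

```python
def elongate(seed: str, rules: dict, k: int, max_len: int = 100) -> str:
    """Extend seed one base at a time; each step reads only the last k bases.

    rules maps a k-base window to the base appended next (a stand-in for
    the "rule" molecules). Elongation stops when no rule matches the
    current window or when max_len is reached.
    """
    strand = seed
    while len(strand) < max_len:
        next_base = rules.get(strand[-k:])
        if next_base is None:
            break
        strand += next_base
    return strand


# Hypothetical rule set for a register of width k = 2.
rules = {"AC": "G", "CG": "T", "GT": "T", "TT": "A", "TA": "G", "AG": "C"}
product = elongate("AC", rules, k=2)
print(product)
```

Each newly added base immediately becomes part of the window that selects the next rule, which is the reuse of previously synthesized sequence that the abstract describes.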
|Doron Lipson, Yonatan Aumann, Amir
Ben-Dor, Nathan Linial, Zohar Yakhini
Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis
DNA amplifications and deletions characterize cancer genome and are
often related to disease evolution. Microarray based
techniques for measuring these DNA copy-number changes
use fluorescence ratios at arrayed DNA elements (BACs,
cDNA or oligonucleotides) to provide signals at high
resolution, in terms of genomic locations. These data
are then further analyzed to map aberrations and
boundaries and identify biologically significant
Methods. We develop a
statistical framework that enables the casting of
several DNA copy number data analysis questions as
optimization problems over real valued vectors of
signals. The simplest form of the optimization problem
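seeks to maximize $\varphi(I) = \sum_{i \in I} v_i / \sqrt{|I|}$ over
all subintervals I in the input vector. We
present and prove a linear time approximation scheme for
this problem. Namely, a process with time complexity
$O(n\varepsilon^{-2})$ that outputs an interval for which $\varphi(I)$ is at least $\mathrm{Opt}/\alpha(\varepsilon)$, where Opt is the actual optimum and $\alpha(\varepsilon) \to 1$ as
$\varepsilon \to 0$. We further develop practical implementations that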
improve the performance of the naive quadratic approach
by orders of magnitude. We discuss properties of optimal
intervals and how they apply to the algorithm
Examples. We benchmark our
algorithms on synthetic as well as publicly available
DNA copy number data. We demonstrate the use of these
methods for identifying aberrations in single samples as
well as common alterations in fixed sets and subsets of
breast cancer samples.
Journal of Computational Biology, Vol. 13, No. 2:
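For orientation, the naive quadratic baseline mentioned in the abstract can be sketched as follows. The sketch takes the interval score to be the sum of the signals in the interval divided by the square root of its length; that formula, the function name, and the toy signal are assumptions of this example, and the code is the brute-force search, not the approximation scheme developed in the paper.

```python
import math
from itertools import accumulate


def best_interval(v: list) -> tuple:
    """Brute-force search for the half-open interval [i, j) maximizing
    sum(v[i:j]) / sqrt(j - i). Runs in O(n^2) time using prefix sums.
    """
    prefix = [0.0, *accumulate(v)]      # prefix[j] = v[0] + ... + v[j-1]
    best = (-math.inf, 0, 0)
    for i in range(len(v)):
        for j in range(i + 1, len(v) + 1):
            score = (prefix[j] - prefix[i]) / math.sqrt(j - i)
            if score > best[0]:
                best = (score, i, j)
    return best


# Toy log-ratio signal with an amplified stretch at positions 2..4.
signal = [0.1, -0.2, 1.0, 1.2, 0.9, -0.1, 0.0]
score, start, end = best_interval(signal)
print(start, end, round(score, 3))
```

Dividing by the square root of the length keeps long, weak intervals from outscoring short, strong ones, which is why the amplified stretch is selected over the full vector.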
|Michael T. Barrett, Alicia Scheffer,
Amir Ben-Dor, Nick Sampas, Doron Lipson, Robert Kincaid,
Peter Tsang, Bo Curry, Kristin Baird, Paul S. Meltzer,
Zohar Yakhini, Laurakay Bruhn, Stephen Laderman
Comparative Genomic Hybridization using Oligonucleotide Microarrays and Total Genomic DNA
Array-based comparative genomic hybridization (CGH) measures copy-number variations at multiple loci simultaneously, providing an important tool for studying cancer and developmental disorders and for developing diagnostic and therapeutic targets. Arrays for CGH based on PCR products representing assemblies of BAC or cDNA clones typically require maintenance, propagation, replication, and verification of large clone sets. Furthermore, it is difficult to control the specificity of the hybridization to the complex sequences that are present in each feature of such arrays. To develop a more robust and flexible platform, we created probe-design methods and assay protocols that make oligonucleotide microarrays synthesized in situ by inkjet technology compatible with array-based comparative genomic hybridization applications employing samples of total genomic DNA. Hybridization of a series of cell lines with variable numbers of X chromosomes to arrays designed for CGH measurements gave median ratios for X-chromosome probes within 6% of the theoretical values (0.5 for XY/XX, 1.0 for XX/XX, 1.4 for XXX/XX, 2.1 for XXXX/XX, and 2.6 for XXXXX/XX). Furthermore, these arrays detected and mapped regions of single-copy losses, homozygous deletions, and amplicons of various sizes in different model systems, including diploid cells with a chromosomal breakpoint that has been mapped and sequenced to a precise nucleotide and tumor cell lines with highly variable regions of gains and losses. Our results demonstrate that oligonucleotide arrays designed for CGH provide a robust and precise platform for detecting chromosomal alterations throughout a genome with high sensitivity even when using full-complexity genomic samples.
|Doron Lipson, Amir Ben-Dor, Elinor
Dehan, Zohar Yakhini
Joint Analysis of DNA Copy Numbers and Gene Expression Levels
Genomic instabilities, amplifications, deletions and translocations are often observed in tumor cells. In the process of cancer pathogenesis cells acquire multiple genomic alterations, some of which drive the process by triggering overexpression of oncogenes and by silencing tumor suppressors and DNA repair genes. We present data analysis methods designed to study the overall transcriptional effects of DNA copy number alterations. Alterations can be measured using several techniques including microarray based hybridization assays. The data have unique properties due to the strong dependence between measurement values in close genomic loci. To account for this dependence in studying the correlation of DNA copy number to expression levels we develop versions of standard correlation methods that apply to genomic regions and methods for assessing the statistical significance of the observed results. In joint DNA copy number and expression data we define significantly altered submatrices as submatrices where a statistically significant correlation of DNA copy number to expression is observed. We develop heuristic approaches to identify these structures in data matrices. We apply all methods to several datasets, highlighting results that cannot be obtained by direct approaches or without using the regional view.
|Doron Lipson, Peter Webb, Zohar
Designing Specific Oligonucleotide Probes for the Entire S. cerevisiae Transcriptome
plays a central role in designing accurate microarray
hybridization assays. Current literature on specific probe
design studies algorithmic approaches and their
relationship with hybridization thermodynamics. In this
work we address probe specificity properties under a
stochastic model assumption and compare the results to
actual behavior in genomic data. We develop efficient
specificity search algorithms. Our methods incorporate
existing transcript expression level data and handle a
variety of cross-hybridization models. We analyze the
performance of our methods. Applying our algorithm to the
entire S. cerevisiae transcriptome we provide
probe specificity maps for all yeast ORFs that may be used
as the basis for selection of sensitive probes.
| Brian F.
Volkman, Doron Lipson, David E. Wemmer, Dorothee Kern
Two-State Allosteric Behavior in a Single-Domain Signaling Protein
Protein actions are usually
discussed in terms of static structures, but function
requires motion. We find a strong correlation between
phosphorylation-driven activation of the signaling protein
NtrC and microsecond time-scale backbone dynamics. Using
nuclear magnetic resonance relaxation, we characterized
the motions of NtrC in three functional states:
unphosphorylated (inactive), phosphorylated (active), and
a partially active mutant. These dynamics are indicative
of exchange between inactive and active conformations.
Both states are populated in unphosphorylated NtrC, and
phosphorylation shifts the equilibrium toward the active
species. These results support a dynamic population shift
between two preexisting conformations as the underlying
mechanism of activation.
| Nicola J. Turton, David J. Judah,
Joan Riley, Reginald Davies, Doron Lipson, Jerry A.
Styles, Andrew G. Smith, Timothy W. Gant
Gene expression and amplification in breast carcinoma cells with intrinsic and acquired doxorubicin resistance
resistance (MDR) phenotype is a major cause of cancer
treatment failure. Here, the expression of 4224 genes was
analysed for association with intrinsic or acquired
doxorubicin (DOX) resistance. A cluster of overexpressed
genes related to DOX resistance was observed. Included in
this cluster were ABCB1, the P-glycoprotein transporter
protein gene; MMP1 (matrix metalloproteinase 1),
indicative of the invasive nature of resistant cells; and
the oxytocin receptor (OXTR), a potential new therapeutic
target. Overexpression of genes associated with xenobiotic
transformation, cell transformation, cell signalling and
lymphocyte activation was also associated with DOX
resistance as was estrogen receptor negativity. In all
carcinoma cells, compared with HBL100 a putatively normal
breast epithelial cell line, a cluster of overexpressed
genes was identified which included several keratins, in
particular keratins 8 and 18 which are regulated through
the ras signalling pathway. Analysis of genomic
amplifications and deletions revealed specific genetic
alterations common to both intrinsic and acquired DOX
resistance including ABCB1, PGY3 (ABCB4) and BAK. The
findings shown here indicate new possibilities for the
diagnosis of DOX resistance using gene expression, and
potential novel therapeutic targets for pharmacological
Soen, Netta Cohen, Doron Lipson, and Erez Braun
Emergence of spontaneous rhythm disorders in self-assembled networks of heart cells
A non-invasive optical recording technique is introduced to study the spontaneous contractile activity in self-assembled heart cell networks. Continuous monitoring throughout the lifetime of the network reveals spontaneous appearance of various rhythm disorders. Analysis of two typical patterns, namely subharmonic structures and sudden alternations between two dominant rates, indicates the emergence of intrinsic pacemakers. A model of one or two slightly variable nonlinear oscillators, acting on an excitable element is shown to reproduce the main experimental results.