Doron Lipson's Abstracts

N. Peled, G. Palmer, F. Hirsch, M.W. Wynes, M. Ilouze, M. Varella-Garcia, L. Soussan-Gutman, G.A. Otto, P.J. Stephens, J. Ross, M.T. Cronin, D. Lipson, and V.A. Miller
Next-Generation Sequencing Identifies and Immunohistochemistry Confirms a Novel Crizotinib-Sensitive ALK Rearrangement in a Patient with Metastatic Non–Small-Cell Lung Cancer

The novel rearrangement identified in this case is complex and was not detected by the Break-Apart FISH assay. NGS showed that the patient’s tumor harbored a complex EML4-ALK rearrangement at the genomic level. Clinical and radiographic evidence confirmed a rapid response to crizotinib. NGS should be considered in NSCLC patients with high likelihood of a driver kinase alteration when none is identified by other methods.

Journal of Thoracic Oncology, 7(9):e14–e16, 2012.   [pdf]   
D. Lipson*, M. Capelletti*, R. Yelensky, G. Otto, A. Parker, M. Jarosz, J.A. Curran, S. Balasubramanian, T. Bloom, K.W. Brennan, A. Donahue, S.R. Downing, G.M. Frampton, L. Garcia, F. Juhn, K.C. Mitchell, E. White, J. White, Z. Zwirko, T. Peretz, H. Nechushtan, L. Soussan-Gutman, J. Kim, H. Sasaki, H.R. Kim, S. Park, D. Ercan, C.E. Sheehan, J.S. Ross, M.T. Cronin, P.A. Jänne and P.J. Stephens
Identification of new ALK and RET gene fusions from colorectal and lung cancer biopsies

Applying a next-generation sequencing assay targeting 145 cancer-relevant genes in 40 colorectal cancer and 24 non–small cell lung cancer formalin-fixed paraffin-embedded tissue specimens identified at least one clinically relevant genomic alteration in 59% of the samples and revealed two gene fusions, C2orf44-ALK in a colorectal cancer sample and KIF5B-RET in a lung adenocarcinoma. Further screening of 561 lung adenocarcinomas identified 11 additional tumors with KIF5B-RET gene fusions (2.0%; 95% CI 0.8–3.1%). Cells expressing oncogenic KIF5B-RET are sensitive to multi-kinase inhibitors that inhibit RET.

Nature Medicine 18, 382–384 (2012).   [pdf] 
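
The KIF5B-RET frequency quoted in the abstract above (11 of 561 tumors, 2.0%; 95% CI 0.8–3.1%) can be checked with a few lines of arithmetic. The abstract does not say which interval method was used; the sketch below assumes a simple normal-approximation (Wald) interval, which happens to give the same rounded figures. The function name `wald_interval` is illustrative only.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the abstract does not name its interval method; Wald is
    used here only because it reproduces the published rounding.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# 11 KIF5B-RET-positive tumors among the 561 screened lung adenocarcinomas
p, lo, hi = wald_interval(11, 561)
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")
```

Other interval methods (for example Wilson or exact binomial) would give slightly different bounds for a proportion this small.
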
J.F. Thompson*, J.G. Reifenberger*, E. Giladi, K. Kerouac, J. Gill, E. Hansen, A. Kahvejian, P. Kapranov, T. Knope, D. Lipson, K.E. Steinmann and P.M. Milos
Single-Step Capture and Sequencing of Natural DNA for Detection of BRCA1 Mutations

Genetic testing for disease risk is an increasingly important component of medical care. However, testing can be expensive which can lead to patients and physicians having limited access to the genetic information needed for medical decisions. To simplify DNA sample preparation and lower costs, we have developed a system in which any gene can be captured and sequenced directly from human genomic DNA without amplification, using no proteins or enzymes prior to sequencing. Extracted whole-genome DNA is acoustically sheared and loaded in a flow cell channel for single-molecule sequencing. Gene isolation, amplification, or ligation is not necessary. Accurate and low cost detection of DNA sequence variants is demonstrated for the BRCA1 gene. Disease-causing mutations as well as common variants from well-characterized samples are identified. Single-molecule sequencing generates very reproducible coverage patterns and these can be used to detect any size insertion or deletion directly, unlike PCR-based methods which require additional assays. Because no gene isolation or amplification is required for sequencing, the exceptionally low costs of sample preparation and analysis could make genetic tests more accessible to those who wish to know their own disease susceptibility. Additionally, this approach has applications for sequencing integration sites for gene therapy vectors, transposons, retroviruses, and other mobile DNA elements in a more facile manner than possible with other methods.

Genome Research, 22:340-345, 2012.   [pdf]
L.T. Sam, D. Lipson, T. Raz, X. Cao, J.F. Thompson, P.M. Milos, D. Robinson, A.M. Chinnaiyan, C. Kumar-Sinha, C.A. Maher
A Comparison of Single Molecule and Amplification Based Sequencing of Cancer Transcriptomes

The second wave of next generation sequencing technologies, referred to as single-molecule sequencing (SMS), carries the promise of profiling samples directly without employing polymerase chain reaction steps used by amplification-based sequencing (AS) methods. To examine the merits of both technologies, we examine mRNA sequencing results from single-molecule and amplification-based sequencing in a set of human cancer cell lines and tissues. We observe a characteristic coverage bias towards high abundance transcripts in amplification-based sequencing. A larger fraction of AS reads cover highly expressed genes, such as those associated with translational processes and housekeeping genes, resulting in relatively lower coverage of genes at low and mid-level abundance. In contrast, the coverage of high abundance transcripts plateaus off using SMS. Consequently, SMS is able to sequence lower-abundance transcripts more thoroughly, including some that are undetected by AS methods; however, these include many more mapping artifacts. A better understanding of the technical and analytical factors introducing platform specific biases in high throughput transcriptome sequencing applications will be critical in cross platform meta-analytic studies.

PLoS One, 6(3):e17305, 2011.   [pdf]
T. Raz, P. Kapranov, D. Lipson, S. Letovsky, P.M. Milos, J.F. Thompson
Protocol Dependence of Sequencing-based Gene Expression Measurements

RNA Seq provides unparalleled levels of information about the transcriptome including precise expression levels over a wide dynamic range. It is essential to understand how technical variation impacts the quality and interpretability of results, how potential errors could be introduced by the protocol, how the source of RNA affects transcript detection, and how all of these variations can impact the conclusions drawn. Multiple human RNA samples were used to assess RNA fragmentation, RNA fractionation, cDNA synthesis, and single versus multiple tag counting. Though protocols employing polyA RNA selection generate the highest number of non-ribosomal reads and the most precise measurements for coding transcripts, such protocols were found to detect only a fraction of the non-ribosomal RNA in human cells. PolyA RNA excludes thousands of annotated and even more unannotated transcripts, resulting in an incomplete view of the transcriptome. Ribosomal-depleted RNA provides a more cost-effective method for generating complete transcriptome coverage. Expression measurements using single tag counting provided advantages for assessing gene expression and for detecting short RNAs relative to multi-read protocols. Detection of short RNAs was also hampered by RNA fragmentation. Thus, this work will help researchers choose from among a range of options when analyzing gene expression, each with its own advantages and disadvantages.

PLoS One, 6(5):e19287, 2011.   [pdf]
D.T. Ting*, D. Lipson*, S. Paul, B.W. Brannigan, S. Akhavanfard, E.J. Coffman, G. Contino, V. Deshpande, A.J. Iafrate, S. Letovsky, M.N. Rivera, N. Bardeesy, S. Maheswaran, D.A. Haber
Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers

Satellite repeats in heterochromatin are transcribed into noncoding RNAs that have been linked to gene silencing and maintenance of chromosomal integrity. Using digital gene expression analysis, we show that these transcripts are greatly overexpressed in mouse and human epithelial cancers. In 8 of 10 mouse pancreatic ductal adenocarcinomas (PDAC), pericentromeric satellites accounted for a mean 12% (range 1 to 50%) of all cellular transcripts, a mean 40-fold increase over normal tissue. In 15/15 human PDACs, alpha satellite transcripts were most abundant and HSATII transcripts were highly specific for cancer. Similar patterns were observed in cancers of lung, kidney, ovary, colon, and prostate. Derepression of satellite transcripts correlated with overexpression of the LINE-1 retrotransposon and with aberrant expression of neuroendocrine-associated genes proximal to LINE-1 insertions. The overexpression of satellite transcripts in cancer may reflect global alterations in heterochromatin silencing and could potentially be useful as a biomarker for cancer detection.

Science, 331(6017):593-6, 2011.   [pdf]
P. Kapranov, F. Ozsolak, S.W. Kim, S. Foissac, D. Lipson, C. Hart, S. Roels, C. Borel, S.E. Antonarakis, A.P. Monaghan, B. John, P.M. Milos
New class of gene-termini-associated human RNAs suggests a novel RNA copying mechanism

Small (< 200 nucleotide) RNA (sRNA) profiling of human cells using various technologies demonstrates unexpected complexity of sRNAs with hundreds of thousands of sRNA species present. Genetic and in vitro studies show that these RNAs are not merely degradation products of longer transcripts but could indeed have a function. Furthermore, profiling of RNAs, including the sRNAs, can reveal not only novel transcripts, but also make clear predictions about the existence and properties of novel biochemical pathways operating in a cell. For example, sRNA profiling in human cells indicated the existence of an unknown capping mechanism operating on cleaved RNA, a biochemical component of which was later identified. Here we show that human cells contain a novel type of sRNA that has non-genomically encoded 5' poly(U) tails. The presence of these RNAs at the termini of genes, specifically at the very 3' ends of known mRNAs, strongly argues for the presence of a yet uncharacterized endogenous biochemical pathway in cells that can copy RNA. We show that this pathway can operate on multiple genes, with specific enrichment towards transcript-encoding components of the translational machinery. Finally, we show that genes are also flanked by sense, 3' polyadenylated sRNAs that are likely to be capped.

Nature, 466(7306):642-6, 2010.   [pdf]
Eldar Giladi, John Healy, Gene Myers, Chris Hart, Phillip Kapranov, Doron Lipson, Steven Roels, Edward Thayer, Stan Letovsky
Error tolerant indexing and alignment of short reads with covering template families

The rapid adoption of high-throughput next generation sequence data in biological research is presenting a major challenge for sequence alignment tools - specifically, the efficient alignment of vast amounts of short reads to large references in the presence of differences arising from sequencing errors and biological sequence variations. To address this challenge, we developed a short read aligner for high-throughput sequencer data that is tolerant of errors or mutations of all types - namely, substitutions, deletions, and insertions. The aligner utilizes a multi-stage approach in which template-based indexing is used to identify candidate regions for alignment with dynamic programming. A template is a pair of gapped seeds, with one used with the read and one used with the reference. In this article, we focus on the development of template families that yield error-tolerant indexing up to a given error-budget. A general algorithm for finding those families is presented, and a recursive construction that creates families with higher error tolerance from ones with a lower error tolerance is developed.

J. Computational Biology, 17(10):1279-1293, 2010.   [pdf]
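
The template idea in the abstract above (a pair of gapped seeds, one applied to the read and one to the reference) can be illustrated with a toy index. This is a minimal sketch, not the aligner described in the paper: the mask notation (`'1'` = sampled position, `'0'` = skipped position) and the names `apply_seed`, `build_index` and `candidates` are assumptions made for illustration, and the construction of covering template families is not attempted.

```python
from collections import defaultdict

def apply_seed(seq, start, mask):
    """Sample seq at the '1' positions of mask, starting at start.

    Returns None when the mask would run past the end of seq.
    """
    if start + len(mask) > len(seq):
        return None
    return "".join(seq[start + i] for i, m in enumerate(mask) if m == "1")

def build_index(reference, ref_mask):
    """Map every gapped key of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(ref_mask) + 1):
        index[apply_seed(reference, pos, ref_mask)].append(pos)
    return index

def candidates(read, index, read_mask, offset=0):
    """Look up candidate reference positions for one gapped key of the read."""
    key = apply_seed(read, offset, read_mask)
    return index.get(key, []) if key is not None else []

reference = "TTACGTTGCAAGG"

# Template 1: the same mask on both sides tolerates a substitution
# at the skipped positions (hypothetical mask, chosen by hand).
sub_index = build_index(reference, "1101101")
print(candidates("ACCTTGCA", sub_index, "1101101"))  # read has G->C at its 3rd base

# Template 2: different masks on read and reference tolerate a deleted
# reference base, because the reference mask skips one extra position.
del_index = build_index(reference, "11011")
print(candidates("ACTTGCA", del_index, "1111"))      # read lacks the reference G
```

Both lookups return reference position 2 as the only candidate region, which a real aligner would then verify with dynamic programming as the abstract describes.
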
Y. Buganim, I. Goldstein, D. Lipson, M. Milyavsky, S. Polak-Charcon, C. Mardoukh, H. Solomon, E. Kalo, S. Madar, R. Brosh, M. Perelman, R. Navon, N. Goldfinger, I. Barshack, Z. Yakhini, V. Rotter
A Novel Translocation Breakpoint within the BPTF Gene Is Associated with a Pre-Malignant Phenotype

Partial gain of chromosome arm 17q is an abundant aberrancy in various cancer types such as lung and prostate cancer with a prominent occurrence and prognostic significance in neuroblastoma - one of the most common embryonic tumors. The specific genetic element/s in 17q responsible for the cancer-promoting effect of these aberrancies is yet to be defined although many genes located in 17q have been proposed to play a role in malignancy. We report here the characterization of a naturally-occurring, non-reciprocal translocation der(X)t(X;17) in human lung embryonal-derived cells following continuous culturing. This aberrancy was strongly correlated with an increased proliferative capacity and with an acquired ability to form colonies in vitro. The breakpoint region was mapped by fluorescence in situ hybridization (FISH) to the 17q24.3 locus. Further characterization by a custom-made comparative genome hybridization array (CGH) localized the breakpoint within the Bromodomain PHD finger Transcription Factor gene (BPTF), a gene involved in transcriptional regulation and chromatin remodeling. Interestingly, this translocation led to elevation in the mRNA levels of the endogenous BPTF. Knock-down of BPTF restricted proliferation, suggesting a role for BPTF in promoting cellular growth. Furthermore, the BPTF chromosomal region was found to be amplified in various human tumors, especially in neuroblastomas and lung cancers in which 55% and 27% of the samples showed gain of 17q24.3, respectively. Additionally, 42% of the cancer cell lines comprising the NCI-60 had an abnormal BPTF locus copy number. We suggest that deregulation of BPTF resulting from the translocation may confer the cells with the observed cancer-promoting phenotype and that our cellular model can serve to establish causality between 17q aberrations and carcinogenesis.

PLoS ONE, 5(3): e9657, 2010   [pdf]
Doron Lipson*, Tal Raz*, Alix Kieu, Daniel R. Jones, Eldar Giladi, Edward Thayer, John F. Thompson, Stan Letovsky, Patrice Milos, Marie Causey
Quantification of the Yeast Transcriptome by Single-molecule Sequencing

We present single-molecule sequencing digital gene expression (smsDGE), a high-throughput, amplification-free method for accurate quantification of the full range of cellular polyadenylated RNA transcripts using a Helicos Genetic Analysis system. smsDGE involves a reverse-transcription and polyA-tailing sample preparation procedure followed by sequencing that generates a single read per transcript. We applied smsDGE to the transcriptome of Saccharomyces cerevisiae strain DBY746, using 6 of the available 50 channels in a single sequencing run, yielding on average 12 million aligned reads per channel. Using spiked-in RNA, accurate quantitative measurements were obtained over four orders of magnitude. High correlation was demonstrated across independent flow-cell channels, instrument runs and sample preparations. Transcript counting in smsDGE is highly efficient due to the representation of each transcript molecule by a single read. This efficiency, coupled with the high throughput enabled by the single-molecule sequencing platform, provides an alternative method for expression profiling.

Nature Biotechnology, 27(7):652-8, 2009   [pdf]
Jayson Bowers, Judith Mitchell, Eric Beer, Philip R Buzby, Marie Causey, J William Efcavitch, Mirna Jarosz, Edyta Krzymanska-Olejnik, Li Kung, Doron Lipson, Geoffrey M Lowman, Subramanian Marappan, Peter McInerney, Adam Platt, Atanu Roy, Suhaib M Siddiqi, Kathleen Steinmann and John F Thompson
Virtual terminator nucleotides for next-generation DNA sequencing

We synthesized reversible terminators with tethered inhibitors for next-generation sequencing. These were efficiently incorporated with high fidelity while preventing incorporation of additional nucleotides, and we used them to sequence canine bacterial artificial chromosomes in a single-molecule system that provided even coverage for over 99% of the region sequenced. This single-molecule approach generated high-quality sequence data without the need for target amplification and thus avoided concomitant biases.

Nature Methods, 6, 593-595, 2009  
Margaret Taub, Doron Lipson and Terence P. Speed
Methods for Allocating Ambiguous Short-reads

With the rise in prominence of biological research using new short-read DNA sequencing technologies comes the need for new techniques for aligning and assigning these reads to their genomic location of origin. Until now, methods for allocating reads which align with equal or similar fidelity to multiple genomic locations have not been model-based, and have tended to ignore potentially informative data. Here, we demonstrate that existing methods for assigning ambiguous reads can produce biased results. We then present new methods for allocating ambiguous reads to the genome, developed within a framework of statistical modeling, which show promise in alleviating these biases, both in simulated and real data.

Communications in Information and Systems, 10(2):69-82, 2010   [pdf]
Eran Eden, Roy Navon, Israel Steinfeld, Doron Lipson and Zohar Yakhini
GOrilla: a Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists

Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results.
GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms in the order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms.
GOrilla is an efficient GO analysis tool with unique features that make a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation. GOrilla is publicly available at:

BMC Bioinformatics, 10(48) 2009.  [pdf]
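
The flexible-threshold idea behind the mHG statistic mentioned in the GOrilla abstract above can be sketched as follows: for a ranked list in which each gene is labeled 1 (annotated with the GO term) or 0, compute the hypergeometric tail probability at every cutoff and keep the minimum. This is a minimal sketch under that reading; the function names are illustrative, and the raw minimum is a score rather than a p-value, because the exact correction for trying many cutoffs described in the abstract is not implemented here.

```python
from math import comb

def hg_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N genes, B annotated, n drawn)."""
    total = comb(N, n)
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return hits / total

def mhg_score(labels):
    """Minimum hypergeometric tail over all cutoffs of a ranked 0/1 list.

    Returns (score, cutoff). The score is NOT a p-value: it still needs
    the multiple-threshold correction, which this sketch omits.
    """
    N, B = len(labels), sum(labels)
    best, best_n, b = 1.0, 0, 0
    for n in range(1, N):
        b += labels[n - 1]          # annotated genes among the top n
        score = hg_tail(N, B, n, b)
        if score < best:
            best, best_n = score, n
    return best, best_n

# Ten ranked genes, four of them annotated, mostly near the top
score, cutoff = mhg_score([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(score, cutoff)
```

For this toy list the minimum is reached at a cutoff of five genes, where all four annotated genes have been seen (tail probability 6/252).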

S. Farkash-Amar, D. Lipson, A. Polten, A. Goren, C. Helmstetter, Z. Yakhini and I. Simon
Global Organization of Replication Time Zones of the Mouse Genome

The division of genomes into distinct replication time zones has long been established. However, an in-depth understanding of their organization and their relationship to transcription is incomplete. Taking advantage of a novel synchronization method ("baby machine") and of genomic DNA microarrays, we have, for the first time, mapped replication times of the entire mouse genome at a high temporal resolution. Our data revealed that although most of the genome has a distinct time of replication either early, middle, or late S phase, a significant portion of the genome is replicated asynchronously. Analysis of the replication map revealed the genomic scale organization of the replication time zones. We found that the genomic regions between early and late replication time zones often consist of extremely large replicons. Analysis of the relationship between replication and transcription revealed that early replication is frequently correlated with the transcription potential of a gene and not necessarily with its actual transcriptional activity. These findings, along with the strong conservation found between replication timing in human and mouse genomes, emphasize the importance of replication timing in transcription regulation.

Genome Research, 18(10):1562-70, 2008.  [pdf]

Amir Ben-Dor, Doron Lipson, Anya Tsalenko, Mark Reimers, Lars O. Baumbusch, Michael T. Barrett, John N. Weinstein, Anne-Lise Borresen-Dale, Zohar Yakhini
Framework for Identifying Common Aberrations in DNA Copy Number Data

Motivation. High-resolution array comparative genomic hybridization (aCGH) provides exon-level mapping of DNA aberrations in cells or tissues. Such aberrations are central to carcinogenesis and, in many cases, central to targeted therapy of the cancers. Some of the aberrations are sporadic, one-of-a-kind changes in particular tumor samples; others occur frequently and reflect common themes in cancer biology that have interpretable, causal ramifications.
Hence, the difficult task of identifying and mapping common, overlapping genomic aberrations (including amplifications and deletions) across a sample set is an important one; it can provide insight for the discovery of oncogenes, tumor suppressors, and the mechanisms by which they drive cancer development.

Methods and Results. In this paper we present an efficient computational framework for identification and statistical characterization of genomic aberrations that are common to multiple cancer samples in a CGH data set. We present and compare three different algorithmic approaches within the context of that framework. Finally, we apply our methods to two datasets – a collection of 20 breast cancer samples and a panel of 60 diverse human tumor cell lines (the NCI-60). Those analyses identified both known and novel common aberrations containing cancer related genes. The potential impact of the analytical methods is well demonstrated by new insights into the patterns of deletion of CDKN2A (p16), a tumor suppressor gene crucial for the genesis of many types of cancer.

Proceedings of RECOMB '07, LNCS 4453, 122-136, Springer, 2007.  [pdf]

Eran Eden, Doron Lipson, Sivan Yogev, Zohar Yakhini
Discovering Motifs in Ranked Lists of DNA Sequences
Computational methods for discovery of sequence elements that are enriched in a target set compared to a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (Chromatin Immuno-Precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (Discovery of Rank Imbalanced Motifs), which identifies sequence motifs in lists of ranked DNA sequences.

We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results: (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated and used in order to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked.

Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP-chip to CpG methylation data. DRIM is publicly available at:

PLoS Computational Biology 3(3):e39, 2007. [pdf] [application]

E. Dehan, A. Ben-Dor, W. Liao, D. Lipson, H. Frimer, S. Rienstein, D. Simansky, M. Krupsky, P. Yaron, E. Friedman, G. Rechavi, M. Perlman, A. Aviram-Goldring, S. Izraeli, M. Bittner, Z. Yakhini, N. Kaminski
Chromosomal aberrations and gene expression profiles in non-small cell lung cancer

Alterations in genomic content and changes in gene expression levels are central characteristics of tumors and pivotal to the tumorigenic process. We analyzed 23 non-small cell lung cancer (NSCLC) tumors by array comparative genomic hybridization (array CGH). Aberrant regions identified included well-characterized chromosomal aberrations such as amplifications of 3q and 8q and deletions of 3p21.31. Less frequently identified aberrations such as amplifications of 7q22.3-31.31 and 12p11.23-13.2, and previously unidentified aberrations such as deletion of 11q12.3-13.3 were also detected. To enhance our ability to identify key acting genes residing in these regions, we combined array CGH results with gene expression profiling performed on the same tumor samples. We identified a set of genes with concordant changes in DNA copy number and expression levels, i.e. overexpressed genes located in amplified regions and underexpressed genes located in deleted regions. This set included members of the Wnt/beta-catenin pathway, genes involved in DNA replication, and matrix metalloproteases (MMPs). Functional enrichment analysis of the genes both overexpressed and amplified revealed a significant enrichment for DNA replication and repair, and extracellular matrix component gene ontology annotations. We verified the changes in expressions of MCM2, MCM6, RUVBL1, MMP1, MMP12 by real-time quantitative PCR. Our results provide a high resolution map of copy number changes in non-small cell lung cancer. The joint analysis of array CGH and gene expression analysis highlights genes with concordant changes in expression and copy number that may be critical to lung cancer development and progression.

Lung Cancer, 56(2):175-184, 2007. [pdf]

Doron Lipson, Zohar Yakhini, Yonatan Aumann
Optimization of Probe Coverage for High-Resolution Oligonucleotide aCGH

Motivation. The resolution at which genomic alterations can be mapped by means of oligonucleotide aCGH (array-based Comparative Genomic Hybridization) is limited by two factors: the availability of high-quality probes for the target genomic sequence and the array real-estate. Optimization of the probe selection process is required for arrays that are designed to probe specific genomic regions in very high resolution without compromising probe quality constraints.

Results. In this paper we describe a well-defined optimization problem associated with the problem of probe selection for high-resolution aCGH arrays. We propose the whenever possible ε-cover as a formulation that faithfully captures the requirements of the probe selection problem, and provide a fast randomized algorithm that solves the optimization problem in O(n log n) time, as well as a deterministic algorithm with the same asymptotic performance. We apply the method in a typical high-definition array design scenario and demonstrate its superiority with respect to alternative approaches.

Availability. Address requests to the authors.

Bioinformatics 23(2):e77-e83, 2007. [pdf]

Ilya Baskin, Stav Zaitsev, Doron Lipson, Rachel Gilad, Kinneret Keren, Gidi Ben-Yoseph, Uri Sivan
A Molecular Shift Register and its Utilization as an Autonomous DNA Synthesizer

We demonstrate a novel algorithmic approach to autonomous synthesis of fairly long DNA molecules with well-defined, non-recurring sequences. The scheme exploits chemical embodiment of shift registers to execute algorithms similar to those used to generate pseudo-random numbers on a computer. A collection of single stranded “rule” DNA molecules is added to a tube together with a single stranded “seed” molecule and polymerase. The “rule” molecules guide seed elongation according to the algorithm to produce the desired DNA molecule. The synthesis effort is exponentially small compared with all present strategies. The reduced effort is facilitated by the sliding reading frame of shift registers, namely, the utilization of a previously synthesized sequence for the synthesis of the next bases. A redundancy based error reduction scheme, similar to those used in communication, is utilized to systematically suppress synthesis errors.

Physical Review Letters 97, 208103, 2006. [pdf]
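
The sliding reading frame described in the shift-register abstract above can be mimicked in software. The toy below is only an analogy, not the chemistry of the paper: it assumes a two-letter alphabet, a register of four symbols, and a rule table derived from the binary recurrence s[n+4] = s[n] XOR s[n+1] (a standard maximal-length linear feedback shift register). All names and parameters are hypothetical.

```python
from itertools import product

K = 4          # register length (hypothetical choice)
BASES = "AC"   # toy two-letter alphabet: A encodes 0, C encodes 1

def build_rules():
    """One 'rule' per K-symbol window: window -> next symbol.

    The next symbol is the XOR of the two oldest symbols in the window,
    i.e. the recurrence s[n+4] = s[n] ^ s[n+1].
    """
    rules = {}
    for bits in product((0, 1), repeat=K):
        window = "".join(BASES[b] for b in bits)
        rules[window] = BASES[bits[0] ^ bits[1]]
    return rules

def elongate(seed, rules, length):
    """Extend the seed one symbol at a time, always reading the last K symbols."""
    seq = seed
    while len(seq) < length:
        seq += rules[seq[-K:]]
    return seq

seq = elongate("AAAC", build_rules(), 18)
windows = [seq[i:i + K] for i in range(len(seq) - K + 1)]
print(seq)
print(len(set(windows)), "distinct windows out of", len(windows))
```

Each newly added symbol is decided by the symbols synthesized just before it, which is the sliding-frame property the abstract relies on. In this toy every four-symbol window within one period is distinct, so the register never faces an ambiguous state before the sequence is complete.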

Doron Lipson, Yonatan Aumann, Amir Ben-Dor, Nathan Linial, Zohar Yakhini
Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis

Background. DNA amplifications and deletions characterize the cancer genome and are often related to disease evolution. Microarray based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA or oligonucleotides) to provide signals at high resolution, in terms of genomic locations. These data are then further analyzed to map aberrations and boundaries and identify biologically significant structures.

Methods. We develop a statistical framework that enables the casting of several DNA copy number data analysis questions as optimization problems over real valued vectors of signals. The simplest form of the optimization problem seeks to maximize $\varphi(I) = \sum_{i \in I} v_i / \sqrt{|I|}$ over all subintervals $I$ in the input vector. We present and prove a linear time approximation scheme for this problem. Namely, a process with time complexity $O(n\varepsilon^{-2})$ that outputs an interval for which $\varphi(I)$ is at least $\mathrm{Opt}/\alpha(\varepsilon)$, where Opt is the actual optimum and $\alpha(\varepsilon) \to 1$ as $\varepsilon \to 0$. We further develop practical implementations that improve the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they apply to the algorithm performance.

Examples. We benchmark our algorithms on synthetic as well as publicly available DNA copy number data. We demonstrate the use of these methods for identifying aberrations in single samples as well as common alterations in fixed sets and subsets of breast cancer samples.

Proceedings of RECOMB '05, LNCS 3500, p. 83, Springer-Verlag, 2005. [pdf] [poster] [application]

Journal of Computational Biology, Vol. 13, No. 2: 215-228, 2006.
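
The naive quadratic baseline mentioned in the interval-score abstract above can be written down directly. The sketch assumes the score of an interval is the sum of its signals divided by the square root of its length; it implements only the exhaustive search, not the linear time approximation scheme or the faster practical variants the paper develops. The function name and the example values are illustrative.

```python
import math
from itertools import accumulate

def best_interval(values):
    """Exhaustively find the half-open interval [i, j) maximizing
    sum(values[i:j]) / sqrt(j - i).

    Prefix sums make each interval O(1), so the search is O(n^2) overall.
    """
    prefix = [0.0, *accumulate(values)]
    best = (float("-inf"), 0, 0)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n + 1):
            score = (prefix[j] - prefix[i]) / math.sqrt(j - i)
            if score > best[0]:
                best = (score, i, j)
    return best

# Toy log-ratio signal with an amplified stretch at positions 2..4
signal = [0.1, -0.2, 1.0, 1.2, 0.9, -0.1, 0.0]
score, start, end = best_interval(signal)
print(start, end, round(score, 3))
```

Dividing by the square root of the length keeps a long run of weak signals from outscoring a short, strongly aberrant segment; here the search recovers the three amplified positions.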

Michael T. Barrett, Alicia Scheffer, Amir Ben-Dor, Nick Sampas, Doron Lipson, Robert Kincaid, Peter Tsang, Bo Curry, Kristin Baird, Paul S. Meltzer, Zohar Yakhini, Laurakay Bruhn, Stephen Laderman
Comparative Genomic Hybridization using Oligonucleotide Microarrays and Total Genomic DNA

Array-based comparative genomic hybridization (CGH) measures copy-number variations at multiple loci simultaneously, providing an important tool for studying cancer and developmental disorders and for developing diagnostic and therapeutic targets. Arrays for CGH based on PCR products representing assemblies of BAC or cDNA clones typically require maintenance, propagation, replication, and verification of large clone sets. Furthermore, it is difficult to control the specificity of the hybridization to the complex sequences that are present in each feature of such arrays. To develop a more robust and flexible platform, we created probe-design methods and assay protocols that make oligonucleotide microarrays synthesized in situ by inkjet technology compatible with array-based comparative genomic hybridization applications employing samples of total genomic DNA. Hybridization of a series of cell lines with variable numbers of X chromosomes to arrays designed for CGH measurements gave median ratios for X-chromosome probes within 6% of the theoretical values (0.5 for XY/XX, 1.0 for XX/XX, 1.4 for XXX/XX, 2.1 for XXXX/XX, and 2.6 for XXXXX/XX). Furthermore, these arrays detected and mapped regions of single-copy losses, homozygous deletions, and amplicons of various sizes in different model systems, including diploid cells with a chromosomal breakpoint that has been mapped and sequenced to a precise nucleotide and tumor cell lines with highly variable regions of gains and losses. Our results demonstrate that oligonucleotide arrays designed for CGH provide a robust and precise platform for detecting chromosomal alterations throughout a genome with high sensitivity even when using full-complexity genomic samples.

PNAS Vol. 101, No. 51, 21 December 2004, pp. 17765-17770. [Supporting Information] [pdf]


Doron Lipson, Amir Ben-Dor, Elinor Dehan, Zohar Yakhini
Joint Analysis of DNA Copy Numbers and Gene Expression Levels

Genomic instabilities, amplifications, deletions and translocations are often observed in tumor cells. In the process of cancer pathogenesis, cells acquire multiple genomic alterations, some of which drive the process by triggering overexpression of oncogenes and by silencing tumor suppressors and DNA repair genes. We present data analysis methods designed to study the overall transcriptional effects of DNA copy number alterations. Alterations can be measured using several techniques, including microarray-based hybridization assays. The data have unique properties due to the strong dependence between measurement values in close genomic loci. To account for this dependence in studying the correlation of DNA copy number to expression levels, we develop versions of standard correlation methods that apply to genomic regions and methods for assessing the statistical significance of the observed results. In joint DNA copy number and expression data we define significantly altered submatrices as submatrices where a statistically significant correlation of DNA copy number to expression is observed. We develop heuristic approaches to identify these structures in data matrices. We apply all methods to several datasets, highlighting results that cannot be obtained by direct approaches or without using the regional view.

Proceedings of WABI 2004, LNCS 3240, p. 135, Springer-Verlag, 2004. [Web Supplement] [pdf]


Doron Lipson, Peter Webb, Zohar Yakhini
Designing Specific Oligonucleotide Probes for the Entire S. cerevisiae Transcriptome

Probe specificity plays a central role in designing accurate microarray hybridization assays. Current literature on specific probe design studies algorithmic approaches and their relationship with hybridization thermodynamics. In this work we address probe specificity properties under a stochastic model assumption and compare the results to actual behavior in genomic data. We develop efficient specificity search algorithms. Our methods incorporate existing transcript expression level data and handle a variety of cross-hybridization models. We analyze the performance of our methods. Applying our algorithm to the entire S. cerevisiae transcriptome we provide probe specificity maps for all yeast ORFs that may be used as the basis for selection of sensitive probes.

Proceedings of WABI 2002, LNCS 2452, pp. 491-505, Springer-Verlag, 2002. [pdf]


Brian F. Volkman, Doron Lipson, David E. Wemmer, Dorothee Kern
Two-State Allosteric Behavior in a Single-Domain Signaling Protein

Protein actions are usually discussed in terms of static structures, but function requires motion. We find a strong correlation between phosphorylation-driven activation of the signaling protein NtrC and microsecond time-scale backbone dynamics. Using nuclear magnetic resonance relaxation, we characterized the motions of NtrC in three functional states: unphosphorylated (inactive), phosphorylated (active), and a partially active mutant. These dynamics are indicative of exchange between inactive and active conformations. Both states are populated in unphosphorylated NtrC, and phosphorylation shifts the equilibrium toward the active species. These results support a dynamic population shift between two preexisting conformations as the underlying mechanism of activation.

Science Vol. 291, 23 March 2001, pp. 2429-2433. [pdf]


Nicola J. Turton, David J. Judah, Joan Riley, Reginald Davies, Doron Lipson, Jerry A. Styles, Andrew G. Smith, Timothy W. Gant
Gene expression and amplification in breast carcinoma cells with intrinsic and acquired doxorubicin resistance

The multidrug resistance (MDR) phenotype is a major cause of cancer treatment failure. Here the expressions of 4224 genes were analysed for association with intrinsic or acquired doxorubicin (DOX) resistance. A cluster of overexpressed genes related to DOX resistance was observed. Included in this cluster was ABCB1, the P-glycoprotein transporter protein gene, and MMP1 (Matrix Metalloproteinase 1), indicative of the invasive nature of resistant cells, and the oxytocin receptor (OXTR), a potential new therapeutic target. Overexpression of genes associated with xenobiotic transformation, cell transformation, cell signalling and lymphocyte activation was also associated with DOX resistance, as was estrogen receptor negativity. In all carcinoma cells, compared with HBL100, a putatively normal breast epithelial cell line, a cluster of overexpressed genes was identified which included several keratins, in particular keratins 8 and 18, which are regulated through the ras signalling pathway. Analysis of genomic amplifications and deletions revealed specific genetic alterations common to both intrinsic and acquired DOX resistance, including ABCB1, PGY3 (ABCB4) and BAK. The findings shown here indicate new possibilities for the diagnosis of DOX resistance using gene expression, and potential novel therapeutic targets for pharmacological intervention.

Oncogene, March 2001, Vol. 20, No. 11, pp. 1300-1306. [pdf]


Yoav Soen, Netta Cohen, Doron Lipson, and Erez Braun
Emergence of spontaneous rhythm disorders in self-assembled networks of heart cells

A non-invasive optical recording technique is introduced to study the spontaneous contractile activity in self-assembled heart cell networks. Continuous monitoring throughout the lifetime of the network reveals spontaneous appearance of various rhythm disorders. Analysis of two typical patterns, namely subharmonic structures and sudden alternations between two dominant rates, indicates the emergence of intrinsic pacemakers. A model of one or two slightly variable nonlinear oscillators acting on an excitable element is shown to reproduce the main experimental results.

Physical Review Letters, Vol. 82, Num. 17, 26 April 1999, pp. 3556-3559. [pdf]

