Technical Report PHD-2021-02

Title: Synthetic DNA libraries and their applications in data storage and biological assays
Authors: Leon Anavy
Supervisors: Zohar Yakhini
PDF: Currently accessible only within the Technion network
Abstract: Synthetic biology, which applies engineering approaches to study and manipulate biological systems, lies at the meeting point of biology, biotechnology, computer science and other fields. Interdisciplinary and collaborative work in the field has produced scientific breakthroughs, including the design and production of de-novo biological systems. Examples include systems for mass production of chemical compounds, high throughput assays, versatile sensors and biological computing devices.

Recently, two major developments have dramatically influenced the field. High throughput DNA synthesis technology enables the production of synthetic Oligo Libraries (OLs) used for various purposes. CRISPR genome editing, a Nobel winning discovery, has the potential to revolutionize genetic engineering and the treatment of genetic disorders. The work presented in this thesis is closely related to both CRISPR genome editing and synthetic OLs. I present detailed descriptions of four completed research projects for which I was either a lead author or a co-advisor.

In one project we developed a novel coding scheme that utilizes composite DNA letters to increase the logical density of DNA based storage. We analyzed the theoretical properties of the coding scheme, and performed a large scale molecular implementation to demonstrate its feasibility and explore its limitations and practical properties. In this implementation we obtained a 25% increase in logical density compared to state of the art systems. We also investigated the potential effect of composite DNA on the cost of DNA based storage systems using an analytical cost model and simulations.

In a collaboration with the Amit Lab from the Technion, we performed a systematic exploration of the gene regulatory elements of E. coli. Using OL based high throughput assays and advanced statistical models we identified sequence variants that control a novel gene silencing mechanism.

In a collaboration with Eitan Yaakobi from the Technion we developed SOLQC, a software tool for quality control of OLs. SOLQC can be integrated in analysis pipelines of OL based projects and highlights OL error patterns, thus enabling troubleshooting and optimization.

In a collaboration with the Handel Lab from BIU we developed CRISPECTOR, a tool for high sensitivity assessment of off-target CRISPR editing activity. CRISPECTOR uses machine learning to analyze NGS data and quantify editing activity. CRISPECTOR is especially useful in off-target sites with very low, but statistically significant, editing activity rates. It is also the first tool that supports the detection of translocation events from a multiplex PCR assay.

All these research directions are likely to continue and produce more findings by using a combination of synthetic biology, algorithmics and statistics. In particular, cutting edge data science and machine learning will continue to be incorporated in high throughput complex biological studies using synthetic DNA.

Copyright: The above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion