Projects on DNA Problems
DNA-based storage has attracted significant attention following recent demonstrations that information can be stored in macromolecules. Unlike classical optical and magnetic storage technologies, DNA-based storage does not require a power supply to maintain data integrity. Given the falling costs of DNA synthesis and sequencing, it is now acknowledged that within the next 10 to 15 years DNA storage may become a highly competitive archiving technology.
A DNA storage system consists of three main components. The first is a DNA synthesizer, which produces the strands that encode the data to be stored. To keep the error rate acceptable, strand length is typically limited to at most 250 nucleotides. The second is a storage container whose compartments hold the DNA strands, in no particular order. The third is a DNA sequencer, which reads the strands back and converts them into digital data. Encoding and decoding are two processes external to the storage system: they convert the user's binary data into DNA strands in such a way that, even in the presence of errors (the nucleotides in red in Fig. 1), the original binary data can be recovered.
DNA as a storage medium has several attributes that distinguish it from all other storage systems. The most prominent is that the strands are not ordered in memory, so the order in which they were stored cannot be known. This constraint is usually overcome by storing a block address, also called an index, as part of each strand.
Errors in DNA are typically substitutions, insertions, and deletions; most published studies report that either substitutions or deletions are the most prominent, depending on the specific synthesis and sequencing technologies. For example, in column-based DNA oligo synthesis the dominant errors are deletions, which result either from failure to remove the dimethoxytrityl (DMT) group or from combined inefficiencies in the coupling and capping steps. For sequencing, a recent study reports similar behavior, which becomes more severe when the GC content is highly biased.
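The index-based addressing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular project: it uses a naive mapping of two bits per nucleotide (00 as A, 01 as C, 10 as G, 11 as T), a hypothetical fixed 8-nucleotide index at the start of each strand, and error-free strands. Practical encoders add error-correcting codes and sequence constraints (such as balanced GC content) on top of this. The names `encode`, `decode`, and `INDEX_NT` are illustrative only.

```python
import random

# Illustrative 2-bit-per-nucleotide mapping (no GC-content or homopolymer constraints).
BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}
NT_TO_BITS = {nt: bits for bits, nt in BITS_TO_NT.items()}

MAX_STRAND_LEN = 250  # strand length limit mentioned in the text
INDEX_NT = 8          # hypothetical index width: 8 nt = 16 bits, up to 65536 strands
PAYLOAD_NT = MAX_STRAND_LEN - INDEX_NT


def bits_to_dna(bits: str) -> str:
    """Map an even-length bit string to nucleotides, two bits per base."""
    return "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(seq: str) -> str:
    """Inverse of bits_to_dna."""
    return "".join(NT_TO_BITS[nt] for nt in seq)


def encode(data: bytes) -> list[str]:
    """Split data into strands, each prefixed with its block index."""
    bits = "".join(f"{byte:08b}" for byte in data)
    chunk_bits = 2 * PAYLOAD_NT
    strands = []
    for idx, start in enumerate(range(0, len(bits), chunk_bits)):
        if idx >= 4 ** INDEX_NT:
            raise ValueError("too many blocks for the index width")
        index = bits_to_dna(f"{idx:0{2 * INDEX_NT}b}")
        strands.append(index + bits_to_dna(bits[start:start + chunk_bits]))
    return strands


def decode(strands: list[str]) -> bytes:
    """Restore block order from the index prefixes and rebuild the bytes."""
    blocks = sorted(
        (int(dna_to_bits(s[:INDEX_NT]), 2), dna_to_bits(s[INDEX_NT:]))
        for s in strands
    )
    bits = "".join(payload for _, payload in blocks)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Usage: the container returns the strands in arbitrary order.
message = b"DNA storage " * 50
strands = encode(message)
random.shuffle(strands)
assert all(len(s) <= MAX_STRAND_LEN for s in strands)
assert decode(strands) == message
```

Shuffling the strands before decoding mimics the unordered container; only the index prefix lets the decoder restore the original block order. In this sketch the index consumes 8 of the 250 nucleotides in every strand, which is the cost of addressing.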
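The three error types mentioned above can be explored with a toy channel model. The sketch below assumes, purely for illustration, that every base independently suffers a deletion with probability `p_del` or a substitution with probability `p_sub`, and that a random base is inserted after each surviving base with probability `p_ins`. Real synthesis and sequencing errors are position- and context-dependent (for example, on GC content), so this i.i.d. model, its parameters, and the function names `ids_channel` and `gc_content` are hypothetical.

```python
import random

BASES = "ACGT"


def ids_channel(strand: str, p_sub: float, p_ins: float, p_del: float,
                rng: random.Random) -> str:
    """Toy i.i.d. channel (an assumed model): each base is deleted with
    probability p_del or substituted with probability p_sub; each base that
    is not deleted is followed by a random inserted base with probability
    p_ins. Assumes p_sub + p_del <= 1."""
    out = []
    for nt in strand:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is lost
        if r < p_del + p_sub:
            out.append(rng.choice([b for b in BASES if b != nt]))  # substitution
        else:
            out.append(nt)  # base transmitted correctly
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion after this base
    return "".join(out)


def gc_content(strand: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty strand."""
    if not strand:
        return 0.0
    return sum(nt in "GC" for nt in strand) / len(strand)


# Usage with illustrative (not measured) error rates.
rng = random.Random(42)
original = "ACGT" * 25
noisy = ids_channel(original, p_sub=0.01, p_ins=0.005, p_del=0.02, rng=rng)
print(len(original), len(noisy), gc_content(original))
```

Because insertions and deletions shift every subsequent base, a noisy strand can no longer be compared position by position with the original, which is why such errors are harder to correct than substitutions.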
The goal of the proposed projects is to design solutions to several problems related to current and future methods for storing data in DNA strands. These methods include the synthesis and sequencing technologies used to write information to DNA and read it back.
Fifteen projects are offered. Their descriptions can be found here: