Pantheon DNA Data Storage CODEC: Experiences, Challenges, and Innovations

Mon Sep 18 | 2:30pm
Location:
Salon IV
Abstract

There are several well-known advantages of using synthetic DNA for cold-data storage, such as higher density, reduced energy consumption, and durability compared with the standard storage mediums used for the same purpose. The enablement of this technology in the market involves the development of cost-effective DNA synthesizers that can write the data at an appropriate throughput speed and a CODEC able to handle data from different synthesis and sequencing technologies. In the last two years, the Prometheus project, a partnership between Lenovo and IPT Institute in Brazil, has significantly progressed in developing DNA writing machines and a versatile CODEC. This presentation offers a comprehensive overview of the DNA data storage pipeline, providing real-world experiences from data encoding to storage and retrieval. Our primary goal is to provide the audience with valuable insights and practical knowledge regarding coding and decoding techniques, specifically emphasizing our designed error correction architecture. The CODEC developed includes not only the standard methods for storage systems, such as encoding and decoding algorithms, addressing, and error correction coding, but also comprises the application of standard techniques in the bioinformatics field known as sanitizing process, such as the removal of low-quality reads, adapter removal and filter for contaminants, followed by alignment, and clustering of sequenced reads. The last released version of the Lenovo DNA Data Storage CODEC, named Pantheon, is already applying the Sector scheme proposed by the SNIA DNA Archive Rosetta Stone (DARS) technical working group. Exciting results from experiments with this CODEC will be demonstrated and discussed. Finally, our presentation will inspire participants and provide a comprehensive overview of the complexity of implementing coding and adaptative decoding techniques for a functional DNA data storage system, including practical considerations, potential roadblocks, and viable solutions, drawing from our real-world experiences.

Learning Objectives

  • Lenovo DNA Data Storage CODEC advances
  • Tests with the TWG DNA Archive Rosetta Stone (DARS) Sector Scheme
  • Challenges and new proposals in processing DNA-sequenced reads using state-of-the-art computational methods

Abstract

There are several well-known advantages of using synthetic DNA for cold-data storage, such as higher density, reduced energy consumption, and durability compared with the standard storage mediums used for the same purpose. The enablement of this technology in the market involves the development of cost-effective DNA synthesizers that can write the data at an appropriate throughput speed and a CODEC able to handle data from different synthesis and sequencing technologies. In the last two years, the Prometheus project, a partnership between Lenovo and IPT Institute in Brazil, has significantly progressed in developing DNA writing machines and a versatile CODEC. This presentation offers a comprehensive overview of the DNA data storage pipeline, providing real-world experiences from data encoding to storage and retrieval. Our primary goal is to provide the audience with valuable insights and practical knowledge regarding coding and decoding techniques, specifically emphasizing our designed error correction architecture. The CODEC developed includes not only the standard methods for storage systems, such as encoding and decoding algorithms, addressing, and error correction coding, but also comprises the application of standard techniques in the bioinformatics field known as sanitizing process, such as the removal of low-quality reads, adapter removal and filter for contaminants, followed by alignment, and clustering of sequenced reads. The last released version of the Lenovo DNA Data Storage CODEC, named Pantheon, is already applying the Sector scheme proposed by the SNIA DNA Archive Rosetta Stone (DARS) technical working group. Exciting results from experiments with this CODEC will be demonstrated and discussed. Finally, our presentation will inspire participants and provide a comprehensive overview of the complexity of implementing coding and adaptative decoding techniques for a functional DNA data storage system, including practical considerations, potential roadblocks, and viable solutions, drawing from our real-world experiences.

Learning Objectives

  • Lenovo DNA Data Storage CODEC advances
  • Tests with the TWG DNA Archive Rosetta Stone (DARS) Sector Scheme
  • Challenges and new proposals in processing DNA-sequenced reads using state-of-the-art computational methods


---

Related Sessions