Polypeptides consisting of amino acid (AA) sequences are suitable for high-density information storage. However, the lack of suitable encoding systems that accommodate the characteristics of polypeptide synthesis, storage and sequencing impedes the application of polypeptides to large-scale digital data storage. To address this, two reliable and highly efficient encoding systems, namely the RaptorQ-Arithmetic-Base64-Shuffle-RS (RABSR) and RaptorQ-Arithmetic-Huffman-Rotary-Shuffle-RS (RAHRSR) systems, were developed for polypeptide data storage. Both systems compress data, correct errors caused by the loss of AA chains, correct errors within AA chains, eliminate homopolymers, and provide pseudo-randomized encryption. Without arithmetic compression and error correction, the coding efficiency of the RABSR system for audio, pictures and text was 3.20, 3.12 and 3.53 bits/AA, respectively, while that of the RAHRSR system reached 4.89, 4.80 and 6.84 bits/AA, respectively. With redundancy added for error correction and arithmetic compression applied to reduce redundancy, the coding efficiency of the RABSR system for audio, pictures and text was 4.43, 4.36 and 5.22 bits/AA, respectively, and that of the RAHRSR system increased further to 7.24, 7.11 and 9.82 bits/AA, respectively. Therefore, the developed hexadecimal polypeptide-based systems may provide a new route to highly reliable and highly efficient data storage.
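As an illustration of what homopolymer elimination and a bits/AA figure mean in practice, the following is a minimal sketch of a generic rotating code over a 16-residue alphabet. It is not the RABSR or RAHRSR scheme, whose details the abstract does not give. The alphabet `ALPHABET`, the function names, and the base-15 digit mapping are all assumptions made for illustration only.

```python
import math

# Hypothetical 16-residue alphabet (assumption: the abstract only says "hexadecimal").
ALPHABET = "ACDEFGHIKLMNPQRS"
K = len(ALPHABET)

# A rotating code spends one symbol choice on "must differ from the previous
# residue", so each AA carries at most log2(K - 1) bits.
MAX_BITS_PER_AA = math.log2(K - 1)


def to_digits(data: bytes, base: int) -> list[int]:
    """Convert bytes to base-`base` digits, most significant first."""
    # A leading 0x01 sentinel byte preserves leading zero bytes in `data`.
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1]


def from_digits(digits: list[int], base: int) -> bytes:
    """Inverse of to_digits: rebuild the bytes and drop the sentinel."""
    n = 0
    for d in digits:
        n = n * base + d
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]


def rotary_encode(data: bytes) -> str:
    """Map bytes to residues so that no two adjacent residues are equal."""
    prev = 0  # index of a virtual predecessor residue
    out = []
    for d in to_digits(data, K - 1):
        # Step forward by 1..K-1 positions; a step of 0 never occurs,
        # so the new residue always differs from the previous one.
        prev = (prev + d + 1) % K
        out.append(ALPHABET[prev])
    return "".join(out)


def rotary_decode(seq: str) -> bytes:
    """Recover the bytes from a sequence produced by rotary_encode."""
    prev = 0
    digits = []
    for ch in seq:
        cur = ALPHABET.index(ch)
        digits.append((cur - prev - 1) % K)
        prev = cur
    return from_digits(digits, K - 1)


def bits_per_aa(data: bytes, seq: str) -> float:
    """Coding efficiency: payload bits divided by the number of residues."""
    return 8 * len(data) / len(seq)
```

Calling `rotary_encode(b"...")` yields a residue string with no repeated neighbours, and `bits_per_aa` on the result stays below `MAX_BITS_PER_AA`. The efficiencies above that bound reported in the abstract involve compression (Huffman and arithmetic coding), which this sketch does not model.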
Funding: This work was supported by the National Key Research and Development Program of China (2018YFA0902600, 2021YFF1200300, and 2020YFA0712102), the National Natural Science Foundation of China (21877104, 21834007, 22107097, 21878258, 22020102003, and 22125701), the K.C. Wong Education Foundation (GJTD-2018-09), the Youth Innovation Promotion Association of CAS (2021226), and the Zhejiang Provincial Natural Science Foundation of China (Y20B060027).