Efficient File Random Access Method Based on Primer Index Matrix in DNA Storage Scenarios

Abstract: DNA molecules offer high storage density and long-term stability, making them a promising medium for next-generation mass data storage, and they have attracted wide attention in recent years. At present, primers serve as unique identifiers for files, and files stored in a DNA pool can be randomly accessed through polymerase chain reaction (PCR) amplification. However, the allocation and mapping between primers and files has not been studied in depth. Primers are still assigned to files at random, which lowers the efficiency of locating the target primer sequence, and keeping a primer-to-file mapping table introduces substantial data redundancy. To provide an efficient bridge between silicon-based computing devices and carbon-based storage systems, and to reduce the redundancy caused by storing the primer-file mapping, this paper proposes a random access method for DNA storage based on a primer index matrix. The method partitions the stored file set by file attributes to build the primer index matrix, converts the primers in the primer library into an ordered primer library according to conversion rules, and then optimizes the mapping between primers and files, enabling efficient, multi-dimensional file retrieval. Experimental results show that, for file sets of different sizes, building the corresponding primer index matrix with the proposed algorithm brings primer lookup to constant time complexity, and the extra storage needed for the primer-file mapping grows logarithmically rather than linearly.
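The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the conversion rule or the matrix layout, so everything below is an assumption made for illustration. In particular, the base ordering `A < C < G < T`, the base-4 conversion, the two example attributes (file type and year), and the names `index_to_primer`, `primer_to_index`, and `PrimerIndexMatrix` are all hypothetical. The sketch also ignores the biochemical constraints that real primers must satisfy (GC content, homopolymer runs, melting temperature).

```python
BASES = "ACGT"  # assumed ordering; the paper's actual conversion rule is not given in the abstract


def index_to_primer(index: int, length: int) -> str:
    """Map an integer to a fixed-length base-4 string over A/C/G/T (hypothetical rule)."""
    if not 0 <= index < 4 ** length:
        raise ValueError("index out of range for this primer length")
    digits = []
    for _ in range(length):
        index, rem = divmod(index, 4)
        digits.append(BASES[rem])
    return "".join(reversed(digits))


def primer_to_index(primer: str) -> int:
    """Inverse of index_to_primer: read the primer as a base-4 number."""
    index = 0
    for base in primer:
        index = index * 4 + BASES.index(base)
    return index


class PrimerIndexMatrix:
    """Two-attribute index: cell (i, j) is assigned the primer with ordinal i * n_cols + j.

    Only the attribute value lists are stored; no per-file primer table is kept.
    """

    def __init__(self, row_values, col_values, primer_length):
        self.rows = {value: i for i, value in enumerate(row_values)}
        self.cols = {value: j for j, value in enumerate(col_values)}
        self.primer_length = primer_length
        if len(self.rows) * len(self.cols) > 4 ** primer_length:
            raise ValueError("primer length too short for this matrix")

    def primer_for(self, row_value, col_value) -> str:
        """Compute the primer for one cell with two dict lookups and simple arithmetic."""
        i = self.rows[row_value]
        j = self.cols[col_value]
        return index_to_primer(i * len(self.cols) + j, self.primer_length)

    def primers_for_row(self, row_value):
        """Retrieve along one dimension: all primers sharing the same row attribute."""
        return [self.primer_for(row_value, col) for col in self.cols]


# Example with made-up attributes: file type (rows) and year (columns).
matrix = PrimerIndexMatrix(["image", "text", "video"], [2022, 2023, 2024], primer_length=4)
print(matrix.primer_for("text", 2024))
print(matrix.primers_for_row("image"))
```

Because the primer is computed from the cell position rather than looked up in a stored table, the lookup cost in this sketch does not depend on the number of files, which is the kind of behavior the abstract attributes to the ordered primer library.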

Authors: ZHANG Shufang; LI Yuhui; LI Bingzhi (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China)

Source: Journal of Electronics & Information Technology, 2024, No. 6, pp. 2568-2577 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: Tianjin Science and Technology Program Project (22JCYBJC01390).

Keywords: DNA storage; Random search; Primer index matrix

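Classification: TN911.7 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Automation and Computer Technology: Computer Application Technology]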

