一种面向密文基因数据的子序列外包查询方法 (cited by: 1)

Subsequence Outsourcing Query Method over Encrypted Genomic Data

Abstract: Precision medicine is a medical model that relies heavily on the analysis of patients' genomes, and subsequence (substring) search is an important method for performing genome analysis. In recent years the volume of genomic data has grown dramatically, and its storage cost and processing complexity now far exceed what medical institutions can bear. Hosting genomic data with a cloud service provider, which offers cheap storage and powerful computing capacity, has therefore become a practical solution. Because the cloud service provider is not fully trusted, encrypting the data before uploading it is an effective way to protect the security and privacy of DNA sequence data. However, how to perform sequence queries over the encrypted data then becomes an urgent problem. To address this problem, this paper surveys the fields of genomic data processing and ciphertext (full-text) retrieval, and proposes a scheme that uses q-gram mapping to build prefix signatures (indexes) on fixed-length windows of the sequence and, at query time, performs a prefix query in every window: if the query sequence is a prefix of any window of the genomic sequence, the query hits. Throughout the process, the cloud service provider stores the indexes and performs the subsequence query without obtaining the plaintext of the user's data. The paper also sets up the system model and several security assumptions, and proves the security of the scheme. Experiments on a public dataset show that the proposed scheme has good time cost and storage overhead; for example, when the window size w is 100 and q is 6, building the index for a sequence of length 100,000 takes 15.60 s (the Chinese abstract gives 15.06 s). Compared with GPSE, the proposed method executes queries more efficiently.
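The abstract only outlines the indexing idea, so the following is a minimal Python sketch of one way a window-based q-gram prefix index could work, not the paper's actual construction. Everything in it is an assumption made for illustration: the function names (`qgram_tokens`, `build_index`, `make_trapdoor`, `server_search`), the use of HMAC-SHA256 as the keyed q-gram mapping, the choice of one sliding window per start position, and the restriction that the query length is a multiple of q.

```python
from __future__ import annotations

import hashlib
import hmac


def qgram_tokens(text: str, q: int, key: bytes) -> list[str]:
    """Keyed tokens for the consecutive, non-overlapping q-grams of text.

    Each token binds the q-gram to its block position j, so equal q-grams at
    different positions in a window produce different tokens.
    """
    tokens = []
    for j in range(len(text) // q):
        gram = text[j * q:(j + 1) * q]
        msg = f"{j}:{gram}".encode()
        tokens.append(hmac.new(key, msg, hashlib.sha256).hexdigest())
    return tokens


def build_index(seq: str, w: int, q: int, key: bytes) -> list[list[str]]:
    """Client side (assumed): one prefix signature per window of length <= w."""
    return [qgram_tokens(seq[i:i + w], q, key) for i in range(len(seq))]


def make_trapdoor(query: str, w: int, q: int, key: bytes) -> list[str]:
    """Client side (assumed): tokens for the query, limited to the window size."""
    if not query or len(query) > w or len(query) % q != 0:
        raise ValueError("query length must be a non-zero multiple of q, at most w")
    return qgram_tokens(query, q, key)


def server_search(index: list[list[str]], trapdoor: list[str]) -> list[int]:
    """Server side (assumed): a window hits if its signature starts with the trapdoor."""
    k = len(trapdoor)
    return [i for i, sig in enumerate(index) if sig[:k] == trapdoor]


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret held only by the data owner
    index = build_index("ACGTACGTTTGACA", w=8, q=2, key=key)
    print(server_search(index, make_trapdoor("GTAC", w=8, q=2, key=key)))  # [2]
```

In this sketch the server compares only opaque tokens and never sees the bases themselves. Deterministic tokens still reveal which windows share a q-gram at the same position, so this illustrates the query flow rather than a scheme with the security properties the paper claims to prove.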

Authors: WANG Zhan-bing (王占兵); SONG Wei (宋伟); PENG Zhi-yong (彭智勇); YANG Xian-di (杨先娣); CUI Yi-hui (崔一辉); SHEN Yuan (申远) (School of Computer, Wuhan University)

Affiliation: School of Computer, Wuhan University (武汉大学计算机学院)

Source: Computer Science (《计算机科学》), indexed in CSCD and the PKU Core list (北大核心), 2018, No. 6, pp. 51-56 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (61232002, 61572378)

Keywords: precision medicine; subsequence query; ciphertext query; full-text query

Classification code: TP309.2 [Automation and Computer Technology: Computer System Architecture]
