
Subsequence Outsourcing Query Method over Encrypted Genomic Data

Cited by: 1
Abstract: Precision medicine is a medical model that relies heavily on the analysis of patients' genomes, and substring (subsequence) search is an important operation in genome analysis. In recent years the volume of genomic data has grown dramatically, and its storage cost and processing complexity now far exceed what hospitals can handle. Outsourcing genomic data to a cloud service provider, which offers cheap storage and powerful computing, has therefore become a practical solution. Because the cloud service provider is not fully trusted, encrypting the data before uploading it is a straightforward and effective way to guarantee its security and privacy. This, however, raises the problem of how to search sequences over encrypted data. To address it, this paper surveys genomic data processing and encrypted (full-text) retrieval, and proposes a scheme that uses q-gram mapping to build prefix signatures (indexes) over fixed-length windows of the genomic sequence. A query is evaluated as a prefix query in every window, and it hits if the query sequence is a prefix of any window. Throughout indexing and querying, the cloud service provider stores the indexes and executes subsequence queries without learning the plaintext of the user's data. The paper also sets up a system model and several security assumptions and proves the scheme secure under them. Experiments on a public dataset show that the scheme achieves good time and storage costs: with window size w = 100 and q = 6, building the index for a sequence of length 100,000 takes 15.06 s (15.60 s in the English-language abstract). Compared with GPSE, the proposed method executes queries more efficiently.
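The abstract describes the scheme only at a high level. The Python sketch below illustrates the general idea under stated assumptions. A window of length w starts at every position of the sequence. Each window's signature is the set of its positional q-grams, and each q-gram is turned into an opaque token with HMAC-SHA256 under a client-held key. The server answers a query by returning the windows whose signature contains every token of the query, which are exactly the windows of which the query is a prefix. The HMAC token function, the `offset:gram` encoding, and the names `token`, `build_index`, `query_tokens` and `search` are hypothetical choices made for this sketch and are not the paper's construction. Deterministic tokens also leak repetition patterns, which the paper's security model may handle differently.

```python
import hashlib
import hmac


def token(key: bytes, offset: int, gram: str) -> bytes:
    """Opaque token for a positional q-gram.

    Assumption of this sketch: HMAC-SHA256 over "offset:gram" stands in for
    whatever keyed mapping the paper actually uses.
    """
    return hmac.new(key, f"{offset}:{gram}".encode(), hashlib.sha256).digest()


def build_index(key: bytes, seq: str, w: int, q: int) -> list[frozenset[bytes]]:
    """Client side: one signature (set of tokens) per window.

    A window of length w (shorter near the end) starts at every position,
    so any substring of length <= w is the prefix of some window.
    """
    index = []
    for start in range(len(seq)):
        window = seq[start:start + w]
        index.append(frozenset(
            token(key, off, window[off:off + q])
            for off in range(len(window) - q + 1)
        ))
    return index


def query_tokens(key: bytes, pattern: str, w: int, q: int) -> list[bytes]:
    """Client side: turn a plaintext query into positional q-gram tokens."""
    if not q <= len(pattern) <= w:
        raise ValueError("this sketch requires q <= len(pattern) <= w")
    return [token(key, off, pattern[off:off + q])
            for off in range(len(pattern) - q + 1)]


def search(index: list[frozenset[bytes]], tokens: list[bytes]) -> list[int]:
    """Server side: a prefix query in every window, using only opaque tokens.

    The positional q-grams cover every character of the query, so a window
    holds all query tokens exactly when the query is a prefix of that window
    (up to negligible HMAC collisions).
    """
    needed = set(tokens)
    return [start for start, sig in enumerate(index) if needed <= sig]
```

For example, with `seq = "ACGTACGGTAC"`, `w = 5` and `q = 2`, `search(build_index(key, seq, 5, 2), query_tokens(key, "ACG", 5, 2))` returns `[0, 4]`, the two positions where "ACG" occurs. Under this layout each window stores up to w - q + 1 tokens, which is 95 tokens per window for the abstract's setting of w = 100 and q = 6.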
Authors: WANG Zhan-bing, SONG Wei, PENG Zhi-yong, YANG Xian-di, CUI Yi-hui, SHEN Yuan (School of Computer, Wuhan University)
Source: Computer Science (CSCD; Peking University Core Journal), 2018, Issue 6, pp. 51-56 (6 pages)
Funding: National Natural Science Foundation of China (61232002, 61572378)
Keywords: precision medicine; subsequence query; ciphertext query; full-text query
