Design of a de Bruijn graph-based genome indexing structure
Abstract: With the rapid development of high-throughput sequencing technology and the steady decline in sequencing costs, individual genome sequencing has become an important means of studying the genotypes, variations, and related diseases of different species. However, because of the large number of repetitive sequences and highly variable regions in genomes, the ever-growing volume of sequencing data, and the limitations of sequencing technology, mapping large numbers of reads to a reference genome accurately and quickly remains a major challenge. This paper describes hash-based methods for storing and indexing genomic data and explains the basic alignment approach based on the seed-and-extension idea. It then proposes DBG-index, an index structure based on the de Bruijn graph model, together with a three-level data storage scheme for the index. The properties of the index structure are analyzed, and basic seed operations are presented. By exploiting the properties of the graph model, the index organizes the repetitive sequences of the genome effectively, which reduces the overall number of candidate seeds and greatly improves mapping speed.
Authors: GUO Hongzhe; WANG Yadong (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source: Intelligent Computer and Applications, 2019, No. 1, pp. 1-5, 13 (6 pages)
Funding: National Key Research and Development Program of China (2017YFC0907503).
Keywords: genome; index; reads mapping; de Bruijn graph
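
The abstract names three ideas: a hash-based k-mer index, seed lookup for seed-and-extension mapping, and a de Bruijn graph in which a repeated sequence becomes a single node. The Python sketch below is a generic illustration of those ideas on a toy string. It is not the DBG-index structure or its three-level storage layout, which this record does not describe. All function names and the choice of k are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Hash-table index: map each k-mer to every reference position where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def build_de_bruijn_graph(reference: str, k: int) -> Dict[str, List[str]]:
    """Toy de Bruijn graph: nodes are k-mers, and an edge joins two k-mers
    that are adjacent in the reference (they overlap by k-1 bases).
    A k-mer that repeats in the reference still yields exactly one node."""
    graph = defaultdict(set)
    for pos in range(len(reference) - k):
        src = reference[pos:pos + k]
        dst = reference[pos + 1:pos + k + 1]
        graph[src].add(dst)
    return {node: sorted(successors) for node, successors in graph.items()}


def seed_positions(read: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    """Seeding step of seed-and-extension: return (read_offset, reference_position)
    pairs for every k-mer of the read that is present in the index.
    The extension step (aligning around each hit) is omitted here."""
    hits = []
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            hits.append((offset, pos))
    return hits
```

For the toy reference `ACGTACGTTT` with `k = 3`, the k-mer `ACG` occurs at positions 0 and 4 but appears as one graph node. This is the sense in which a graph model groups repeated sequence, so that a seed landing in a repeat can be handled once per node instead of once per occurrence.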