期刊文献+

一种针对大规模Read Mapping的高效DBG索引方法

An Efficient DBG Indexing Method for Large-Scale Read Mapping
下载PDF
导出
摘要 为了能够回答生物信息学中关于de Bruijn graph(DBG)的两个问题——①对于任意的k-mer,回答其是否为DBG的顶点,②对于DBG的任意顶点,回答其邻接信息(入边和出边),提出了一种针对大规模read mapping的高效DBG索引方法.本文将以上两个问题转化为非重复多路径上的k-mer和(k+1)-mer的确切查找问题,并利用FM-index进行解决.首先,对给定的参考序列进行压缩,即非重复多路径的发现,从而压缩了序列中大量存在的重复(k+1)-mer.其次,基于非重复多路径FM-index对DBG进行索引.查找k-mer是否出现在DBG上,若找到,给出该k-mer的直接前驱和直接后继结点,从而提高时空效率.最后,在62种大肠杆菌菌株的基因组上进行实验.实验结果表明,所提出的方法可以高效地对多参考序列的DBG进行索引. In order to answer two questions about de Bruijn graph(DBG)in bioinformatics:①For any k-mer,answer whether it is the vertex of DBG;②For any vertex of DBG,answer its adjacency information(incoming edge and outgoing edge),an efficient DBG indexing method for large-scale read mapping is proposed.In this paper,the above two questions are transformed into the exact finding problem of k-mer and(k+1)-mer on non-repetitive multipath,and are solved by using FM-index.Firstly,a given reference sequence is compressed,which is the discovery of non-repetitive multipaths,thereby compressing the presence of large-scale repetitive(k+1)-mers in the sequence.Secondly,the DBG is indexed based on the non-repetitive multipath FM-index.The k-mer is searched whether it appears on the DBG,and if so,the direct predecessor and direct successor nodes of the k-mer are provided to improve the spatial-temporal efficiency.Finally,experiments are performed on the genomes of 62 E.coli strains.The experimental results show that the method proposed can efficiently index the DBG of multi-reference sequences.
作者 于长永 李俊杰 马海涛 赵宇海 YU Chang-yong;LI Jun-jie;MA Hai-tao;ZHAO Yu-hai(School of Computer Science&Engineering,Northeastern University,Shenyang 110169,China)
出处 《东北大学学报(自然科学版)》 EI CAS CSCD 北大核心 2023年第6期770-776,共7页 Journal of Northeastern University(Natural Science)
基金 国家自然科学基金资助项目(61772124).
关键词 de Bruijn graph 索引 read mapping(序列映射) FM-index 参考序列 de Bruijn graph index read mapping FM-index reference sequence
  • 相关文献

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部