Sensitive Alignment for Long Read with High Error Rate (cited by: 1)
Abstract Mapping the long, high-error-rate reads produced by third-generation sequencing platforms to a reference genome requires a high edit-distance threshold. To solve this long-read alignment problem, each high-error-rate read is split into shorter segments and, borrowing the idea of all-mapping alignment, every candidate position of a segment that satisfies the edit-distance threshold is sought. A hash-index-based variable-length seeding algorithm, which is more sensitive at high edit distances, locates the candidate positions of the segments on the reference genome. Consecutive insertions or deletions of the same base are assigned an edit distance of 1, so that the algorithm can handle the homopolymer errors newly seen in third-generation sequencing data and thereby improve alignment sensitivity. The number of candidate positions per segment is analyzed statistically to derive a quality score for each candidate position, and low-quality candidates are filtered out. The remaining candidate positions are then chained dynamically according to the positional relationships between segments, with different penalties for different error types, to remove false-positive candidates and preserve alignment accuracy. Experiments on simulated and real datasets show that, compared with similar methods, the proposed method achieves equally high accuracy while improving recall and sensitivity.
Authors 罗贤橦 (LUO Xian-tong); 钟诚 (ZHONG Cheng); 黎瑶 (LI Yao) (School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China)
Source 《小型微型计算机系统》 (Journal of Chinese Computer Systems), CSCD, PKU Core, 2020, No. 11, pp. 2442-2448 (7 pages)
Funding Supported by the National Natural Science Foundation of China (61962004).
Keywords long-read alignment; high error rate; split-read mapping; edit distance; sensitivity
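
The abstract states that consecutive insertions or deletions of the same base are given an edit distance of 1 so that homopolymer errors are not over-penalized. The paper's actual recurrence is not given on this page, so the following is only a minimal sketch of one way such a rule could be expressed as a dynamic program. The function name `homopolymer_edit_distance`, the helper `run_starts`, and the choice to let any run of identical bases be inserted or deleted as a single operation are assumptions made for illustration.

```python
from math import inf


def run_starts(s: str) -> list[int]:
    """starts[i] is the smallest k such that s[k:i] is one repeated base (i >= 1)."""
    starts = [0] * (len(s) + 1)
    for i in range(2, len(s) + 1):
        # Extend the current run if the base repeats, otherwise start a new run.
        starts[i] = starts[i - 1] if s[i - 1] == s[i - 2] else i - 1
    return starts


def homopolymer_edit_distance(read: str, ref: str) -> int:
    """Edit distance where a run of identical inserted/deleted bases costs 1.

    Assumption (not taken from the paper): substitutions cost 1 each, and any
    maximal or partial run of the same base may be removed from `read` or
    `ref` in a single operation.
    """
    n, m = len(read), len(ref)
    read_runs = run_starts(read)
    ref_runs = run_starts(ref)

    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:
                # Match (cost 0) or substitution (cost 1).
                best = dist[i - 1][j - 1] + (read[i - 1] != ref[j - 1])
            # Delete the same-base run read[k:i] in one operation.
            for k in range(read_runs[i], i):
                best = min(best, dist[k][j] + 1)
            # Insert the same-base run ref[k:j] in one operation.
            for k in range(ref_runs[j], j):
                best = min(best, dist[i][k] + 1)
            dist[i][j] = best
    return dist[n][m]
```

Under this sketch, a read carrying three extra copies of `A` (for example `AAAAC` against `AC`) is one edit away from the reference rather than three, whereas removing two different adjacent bases still costs two. The inner loops make the cost grow with homopolymer length, which is acceptable for short segments but would need a constant-time recurrence for long ones.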