Abstract
Sequence alignment and variant detection are the basic steps of genomic data analysis. They are prerequisites for subsequent functional analyses and are also the most time-consuming stages of the analysis. To process the massive genomic data produced by high-throughput sequencing effectively, OpenMP, MPI, and related technologies were used to apply multi-level parallel optimization to the sequence alignment and SNP detection algorithms, and the algorithms themselves were improved. In tests on different data sets and at different parallel scales, the core algorithms achieved speedups of more than 9x, and parallel efficiency remained above 60% in large-scale tests. The optimized algorithms deliver good parallel performance and scalability while preserving accuracy, effectively improving the capability for variant detection on large-scale genomic data.
Authors
崔英博
黄春
唐滔
杨灿群
廖湘科
彭绍亮
CUI Yingbo; HUANG Chun; TANG Tao; YANG Canqun; LIAO Xiangke; PENG Shaoliang (College of Computer, National University of Defense Technology, Changsha 410073, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; National Supercomputer Center in Changsha, Changsha 410082, China)
Source
《大数据》
2020, No. 5, pp. 16-28 (13 pages)
Big Data Research
Funding
National Key Research and Development Program of China (No. 2018YFB0204301, No. 2017YFB0202602)
National Natural Science Foundation of China (No. 61772543, No. 61972408).