Abstract
Gene variant identification involves large data volumes, high complexity, and long computation times. To address these problems, this paper proposes a MapReduce-based parallel optimization model for multi-sample genetic variant identification. The model divides the gene data into small-scale groups and distributes them across multiple nodes for parallel processing, and it dynamically determines the load-allocation priority of the nodes on the high-performance computing platform. Experimental results show that each doubling of the degree of parallelism reduces computation time by 32% on average, and computational efficiency also improves considerably.
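The partition-and-distribute idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the priority rule, so a greedy least-loaded-first heuristic is swapped in, with load approximated by the number of records already assigned. The record layout `(position, ref, alt)`, the node names, and all function names are hypothetical, and the map step runs serially here rather than on real cluster nodes.

```python
import heapq
from collections import Counter


def split_into_chunks(records, chunk_size):
    """Divide the input records into small groups of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def assign_chunks(chunks, node_names):
    """Greedy assignment: each chunk goes to the currently least-loaded node.

    Assumption: load is approximated by the number of records assigned so far.
    This stands in for the dynamic priority rule, which the abstract does not specify.
    """
    heap = [(0, name) for name in node_names]  # (current load, node name)
    heapq.heapify(heap)
    plan = {name: [] for name in node_names}
    # Placing the largest chunks first keeps the final loads closer together.
    for chunk in sorted(chunks, key=len, reverse=True):
        load, name = heapq.heappop(heap)
        plan[name].append(chunk)
        heapq.heappush(heap, (load + len(chunk), name))
    return plan


def map_chunk(chunk):
    """Map step (toy): count non-reference observations per position."""
    counts = Counter()
    for position, ref, alt in chunk:
        if alt != ref:
            counts[position] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical toy input: (position, reference base, observed base).
records = [
    (100, "A", "G"), (100, "A", "A"), (250, "C", "T"), (100, "A", "G"),
    (250, "C", "C"), (300, "G", "G"), (300, "G", "A"),
]
chunks = split_into_chunks(records, chunk_size=2)
plan = assign_chunks(chunks, ["node-0", "node-1"])
# In a real deployment each node would run map_chunk on its own chunks.
partials = [map_chunk(c) for node_chunks in plan.values() for c in node_chunks]
variant_counts = reduce_counts(partials)
```

In this sketch the reduce step only sums counts, so the result does not depend on which node processed which chunk; the assignment affects load balance only.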
Authors
刘佳俊
胡大裟
蒋玉明
LIU Jia-jun; HU Da-sha; JIANG Yu-ming (College of Computer Science, Sichuan University, Chengdu 610065)
Source
Modern Computer (《现代计算机》)
2019, No. 3, pp. 11-15 (5 pages)
Funding
President's Science and Technology Development Fund of Zhengzhou Tobacco Research Institute (No. DZ2016003)
Keywords
Multiple Samples
Gene Identification
Parallel Computation
Single Nucleotide Polymorphism