摘要
求解三倍体个体单体型对于探索三倍体物种的遗传特性和表型差异等方面的研究具有重要的推动作用。针对带基因型信息的最少错误更正(MEC/GI)模型,提出了一种基于枚举策略的三倍体个体单体型重建算法EHTR。该算法依次重建3条单体型上的每一个单核苷酸多态性位点取值,对于给定位点,首先根据其基因型取值枚举该位点的3种单体型取值情况,然后选择片段支持度最高的取值作为该位点的重建值,算法的总时间复杂度为O(mn+mlogm+cnl)。采用CELSIM和MetaSim两种测序片段模拟生成器生成实验测试数据,在片段覆盖率、错误率、单片段长度、单体型长度和单体型海明距离等参数的不同设置下,对算法EHTR,GTIHR,W-GA和Q-PSO的重建率和运行时间进行对比分析。实验结果显示,算法EHTR在不同的参数设置下均能以更短的运行时间获得更高的重建率。
Determining triploid individual haplotype plays an important role in promoting the study of exploring genetic traits and phenotypic differences of triploid species. In this paper,based on the minimum error correction with genotype information (MEC/GI) model, an enumeration strategy based enumeration haplotyping triploid (EHTR) algorithm was proposed for solving triploid individual haplotype reconstruction problem. The EHTR algorithm reconstructs the SNP sites of the three haplotypes one after another. When reconstructing a given SNP site,it enumerates three kinds of SNP values in terms of the genotype of the site, and chooses one with the most high support degree coming from the SNP fragments that are covering the corresponding SNP site The total time complexity is O(rnn+rnlogrn-bcnl). In the ex- periments, two kinds of simulators CELSIM and MetaSim were invoked to generate SNP fragments. The reconstruction rate and running time were compared and analyzed among algorithms EHTR, GTIHR, W-GA and Q-PSO with different parameters setting, such as fragment coverage, error rate, single fragment length, haplotype length and haplotype ham- ruing distance. Under different parameter setting, the EHTR algorithm can obtain higher reconstruction rate in shorter running time, which is proved by a number of experiments.
作者
张倩
吴璟莉
ZHANG Qian WU Jing-li(College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China Guangxi Key Lab of Multi-source Information Mining ~ Security, Guangxi Normal University, Guilin 541004, China Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing, Guilin 541004, China)
出处
《计算机科学》
CSCD
北大核心
2017年第1期75-79,112,共6页
Computer Science
基金
国家自然科学基金项目(61363035
61502111)
广西自然科学基金项目(2015GXNSFAA139288
2013GXNSFBA019263
2012GXNSFAA053219)
"八桂学者"工程专项
广西多源信息挖掘与安全重点实验室系统性研究基金项目(14-A-03-02
15-A-03-02)资助
关键词
序列分析
三倍体
单体型
基因型
最少错误更正
枚举
Sequence analysis, Triploid, Haplotype, Genotype, Minimum error correction, Enumeration