摘要
个体单体型问题指如何利用个体DNA测序片断数据,根据不同的优化准则确定该个体单体型的计算问题.因为技术上的限制,DNA测序实验中能直接测定的片断长度是有限的,一个片断所覆盖的最大SNP位点数k1通常小于10;出于时间和金钱的考虑,覆盖一个SNP位点的最大片断数k2也不是很大,通常约为10左右;与要测定的单体型SNP位点总数n及所测序的DNA片断总数m相比,k1和k2均很小.在此基础上,文中对个体单体型问题最少SNP位点删除MSR和最少片段删除MFR模型进行了参数化,提出了时间复杂度分别为O(nk1k2+mlogm+mk1)和O(mk22+mk1k2+mlogm+nk2)求解无空隙MSR和MFR的精确算法.和Bafna等提出的时间复杂度为O(mn2)和O(m2n+m3)的精确算法相比,文中的算法效率提高了很多,具有较高的实用价值.
The Individual Haplotyping Problem is the computing problem of inducing an individual's haplotypes based on several optimal criteria from one's DNA sequence fragment data. Limited by current sequencing techniques, the maximum length of a fragment sequenced directly by DNA sequencers is not large, and the maximum number k1 of SNP sites covered by a fragment is usually smaller than 10. In order to save money and time, the maximum number k2 of fragments covering a SNP site is also small and about 10. Compared with the number n of SNPs of interest and the number m of fragments, k1 and k2 are both very small. Based on the above fact, this paper parameterizes MSR(Minimum SNP Removal) and MFR(Minimum Fragment Removal) models of the individual haplotyping problem, and introduces the corresponding exact algorithms to solve the gapless MSR and MFR problems in the time complexity O(nk1 k2 +m log m+mk1) and O(mk2^2+mk1k2+mlogm+nk2) respectively. Compared with Bafna et al. 's algorithms of time complexity O(mn^2) and O(m^2n+m^3) respectively, the algorithms are more efficient and applicable.
出处
《计算机学报》
EI
CSCD
北大核心
2009年第8期1637-1650,共14页
Chinese Journal of Computers
基金
国家自然科学基金(60773111)
国家"九七三"重点基础研究发展规划前期研究专项基金(2008CB317107)
长江学者和创新团队发展计划(IRT0661)
湖南省自然科学基金(09JJ3116)
中国博士后科学基金
中南大学博士后科学基金资助~~
关键词
单核苷酸多态性
单体型
参数化算法
最少SNP位点删除
最少片断删除
SNPs (Single-Nucleotide Polymorphisms)
haplotype
parameterized algorithm
Minimum SNP Removal (MSR)
Minimum Fragment Removal (MFR)