Study on method of missing data imputation for SNPs test

Cited by: 3
Abstract
Objective To study the factors that influence missing data imputation for single nucleotide polymorphism (SNP) data and the resulting imputation accuracy, and to provide a scientific basis for using SNP data in gene-disease association studies.
Methods Data from the International HapMap Project served as raw data, and Haploview software was used for tag SNP screening. HAPGEN2 software was used to simulate SNP genotype data from the raw data, yielding reference data and research data in which missing values were artificially introduced. The research data were then imputed with IMPUTE2 software against the reference data, and the imputation error rates were compared across conditions (four levels of missing proportion and four levels of reference data sample size).
Results The imputation error rate rose with the proportion of missing data and fell as the reference data sample size increased; the mean error rates were 7.01%, 5.92%, 5.67%, and 5.26% for reference sample sizes of 50, 100, 150, and 200, respectively. Of the two missing patterns, when the missing proportion was large (r^2 = 0.825), imputation under random missingness (mean 5.64%) had a lower error rate than imputation under fixed-site (tag SNP) missingness (mean 9.10%), whereas when the missing proportion was small (r^2 = 0.9), the fixed-site pattern had a lower error rate (mean 4.96%). Across all conditions, the IMPUTE2 error rate ranged from 3% to 13%.
Conclusion The proportion of missing data, the reference data sample size, and the missing pattern all influence imputation accuracy. Imputing missing values from tag SNP data and then carrying out further analysis is an effective strategy.
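The error-rate comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: HAPGEN2 and IMPUTE2 are not invoked, the genotype matrix is toy data coded as 0/1/2, and the function names (`mask_random`, `mask_fixed_sites`, `impute_mode`, `error_rate`) are hypothetical. The imputer simply fills in the most common observed genotype at each site, a stand-in chosen only so the example runs on its own. The error rate is assumed to be the fraction of hidden genotypes that are filled in incorrectly.

```python
import random
from collections import Counter


def mask_random(n_samples, n_sites, proportion, rng):
    """Random-missing pattern: hide a random subset of (sample, site) cells."""
    cells = [(i, j) for i in range(n_samples) for j in range(n_sites)]
    k = round(proportion * len(cells))
    return set(rng.sample(cells, k))


def mask_fixed_sites(n_samples, sites):
    """Fixed-site pattern: hide every sample's genotype at the given sites."""
    return {(i, j) for i in range(n_samples) for j in sites}


def impute_mode(genotypes, masked):
    """Stand-in imputer (NOT IMPUTE2): fill each hidden cell with the most
    common genotype observed at that site, or 0 if nothing is observed."""
    n_samples, n_sites = len(genotypes), len(genotypes[0])
    filled = [row[:] for row in genotypes]
    for j in range(n_sites):
        observed = [genotypes[i][j] for i in range(n_samples)
                    if (i, j) not in masked]
        fill = Counter(observed).most_common(1)[0][0] if observed else 0
        for i in range(n_samples):
            if (i, j) in masked:
                filled[i][j] = fill
    return filled


def error_rate(truth, imputed, masked):
    """Fraction of hidden genotypes that were imputed incorrectly."""
    if not masked:
        return 0.0
    wrong = sum(truth[i][j] != imputed[i][j] for i, j in masked)
    return wrong / len(masked)


# Toy genotype matrix: rows are samples, columns are SNP sites (assumed data).
truth = [[0, 1, 2], [0, 1, 1], [0, 2, 2], [1, 1, 2]]
rng = random.Random(0)
patterns = {
    "random": mask_random(4, 3, 0.25, rng),
    "fixed-site": mask_fixed_sites(4, [1]),
}
for name, masked in patterns.items():
    imputed = impute_mode(truth, masked)
    print(name, error_rate(truth, imputed, masked))
```

With a reference-panel imputer in place of `impute_mode`, the same masking and scoring functions could be looped over several missing proportions and reference sample sizes to reproduce the kind of comparison the abstract reports.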
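Source Chinese Journal of Public Health (《中国公共卫生》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2014, Issue 12, pp. 1576-1582 (7 pages)
Funding National Natural Science Foundation of China (81172741, 30972537)
Keywords single nucleotide polymorphisms (SNPs); data simulation; missing data imputation; tag SNP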