Establishment of model of adaboost classifier and evaluation of harmful mutations in non-coding regions of liver cancer cells

Times cited: 1

Abstract

Objective: To establish an AdaBoost classifier model that estimates how likely a mutation in a non-coding region of liver cancer cells is to be disease-related, and so identify harmful non-coding mutations.

Methods: The experimental group was 13 108 disease-related non-coding mutations from the Human Gene Mutation Database (HGMD). Neutral single-nucleotide polymorphisms (SNPs) served as controls. The AdaBoost classifier was built on regulatory features of non-coding regions: conserved regions, evolutionarily conserved RNA structures, highly expressed genes, DNase I hypersensitive sites, transcription factor binding sites (TFBSs), histone modifications, and early-replicating genes. The value of each feature for predicting harmful non-coding mutations was then analyzed. A receiver operating characteristic (ROC) curve was constructed from the predicted probabilities, and the area under the ROC curve (AUC-ROC) was calculated. The model was validated with disease-associated variants from genome-wide association studies (GWAS) and from the ClinVar database.

Results: In descending order of importance for identifying disease-related mutations, the main features were conserved regions, early-replicating genes, untranslated regions (UTRs), promoters, highly expressed regions, H3K36me3, and conserved TFBSs. The ROC curve built from the classifier's predicted probabilities gave an AUC-ROC of 0.90. The mean scores of GWAS and ClinVar disease-associated variants were significantly higher than those of neutral SNPs (P < 0.05).

Conclusion: The AdaBoost classifier helps evaluate the likelihood of harmful mutations in non-coding regions of liver cancer cells and is a highly accurate prediction tool.
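The abstract does not describe the implementation. The following is a minimal Python sketch of the general technique it names: discrete AdaBoost over binary regulatory annotations, with the predicted probabilities scored by ROC AUC. Several choices here are illustrative assumptions and are not taken from the paper:

- the feature names in `FEATURES`;
- the decision-stump base learner;
- the logistic link 1 / (1 + exp(-2F)) that turns the boosted margin F into a probability;
- the "share of total alpha" importance measure;
- the toy data.

All function names are hypothetical, and none of the printed numbers reflect the study's results.

```python
import math

# Hypothetical binary annotations (1 = the variant overlaps the annotation).
FEATURES = ["conserved", "early_replication", "promoter"]

def stump(x, j, s):
    """Decision stump on binary feature j: predicts s if x[j] == 1, else -s."""
    return s if x[j] == 1 else -s

def train_adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost with stumps; y holds +1 (disease-related) / -1 (neutral)."""
    n, m = len(X), len(X[0])
    w = [1.0 / n] * n  # start from uniform sample weights
    model = []  # list of (alpha, feature index, polarity)
    for _ in range(n_rounds):
        # Choose the stump with the lowest weighted training error.
        best = None
        for j in range(m):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump(xi, j, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, s)
        err, j, s = best
        if err >= 0.5:  # no stump beats chance on the reweighted data
            break
        err = max(err, 1e-10)  # avoid log(0) when a stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, s))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, s)) for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def predict_proba(model, x):
    """Assumed link: map the margin F(x) = sum(alpha * h(x)) to 1 / (1 + exp(-2F))."""
    f = sum(alpha * stump(x, j, s) for alpha, j, s in model)
    return 1.0 / (1.0 + math.exp(-2.0 * f))

def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [sc for sc, lab in zip(scores, labels) if lab == 1]
    neg = [sc for sc, lab in zip(scores, labels) if lab != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def feature_importance(model, n_features):
    """Simple proxy: each feature's share of the total alpha of stumps that use it."""
    imp = [0.0] * n_features
    for alpha, j, _ in model:
        imp[j] += alpha
    total = sum(imp)
    return [v / total for v in imp] if total > 0 else imp

# Toy data invented for illustration; rows follow the order of FEATURES.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0], [1, 0, 0],   # "disease-related"
     [0, 0, 0], [0, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 0]]   # "neutral SNPs"
y = [1] * 5 + [-1] * 5

model = train_adaboost(X, y, n_rounds=2)
probs = [predict_proba(model, x) for x in X]
print(round(roc_auc(probs, y), 2))  # 0.88
ranking = sorted(zip(FEATURES, feature_importance(model, len(FEATURES))), key=lambda t: -t[1])
print([name for name, _ in ranking])  # ['conserved', 'early_replication', 'promoter']
```

Because the logistic link is monotone in F, ranking variants by probability or by raw margin gives the same AUC. The probability form only matters when a calibrated score is needed, as with the mean-score comparison against neutral SNPs.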
Source: Journal of Shanghai Jiao Tong University (Medical Science), 2015, no. 6, pp. 819-823 (5 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
Keywords: liver cancer; non-coding variant; AdaBoost classifier