
Analysis of pathogenic genetic loci based on several machine learning algorithms (cited by: 1)
Abstract: The identification and screening of SNP loci in genes has become an increasingly important topic in the study of associations between complex diseases and genes. First, for the gene bank of a certain disease, this paper applies the locus classification scheme commonly used in medicine and computes the allele frequency at each locus over the whole sample, which determines the major and minor alleles; the base-pair information (A, T, C, G) at each locus is then converted into a numerical code. Second, a chi-square test is used to roughly screen out candidate SNP loci. Finally, machine learning algorithms, including random forest, Bagging, AdaBoost and Lasso logistic regression, are applied to select the loci on which their results agree, and cross-validation is used to verify the validity of the screening results.
Authors: FANG Ya-lan (方雅兰) and KU Zai-qiang (库在强), College of Mathematics and Statistics, Huanggang Normal University, Huanggang 438000, Hubei, China
Source: Journal of Huanggang Normal University (《黄冈师范学院学报》), 2019, No. 3, pp. 1-5 (5 pages)
Funding: 2018 Huanggang Normal University Master of Education Teaching Case Project (JYJXAL2018001)
Keywords: SNP locus; Random Forest algorithm; Bagging algorithm; AdaBoost algorithm
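The record gives no code or exact formulas, so what follows is only a minimal Python sketch of the first two stages described in the abstract (numeric encoding via major/minor alleles, then chi-square pre-screening), plus one simple way to express "loci on which the methods agree". Several assumptions are not from the paper: genotypes are two-letter strings such as "AG"; each genotype is encoded as its count of minor alleles (0, 1 or 2); the test is a Pearson chi-square on the 2 x 3 case/control-by-genotype table with alpha = 0.05; and "consistent" loci are taken to be those in the top k of every method's importance ranking. All function names are hypothetical. The random forest, Bagging, AdaBoost and Lasso logistic models themselves are not reproduced; their rankings would have to come from a library such as scikit-learn (for example a random forest's `feature_importances_` or the nonzero coefficients of an L1-penalized logistic regression).

```python
import math
from collections import Counter


def encode_locus(genotypes):
    """Encode one locus as minor-allele counts (0, 1 or 2), one per sample.

    `genotypes` holds two-letter strings such as "AG" or "TT" (assumption).
    """
    # Allele frequencies over the whole sample: each genotype contributes two bases.
    counts = Counter(base for g in genotypes for base in g)
    # Major allele = most frequent base; ties are broken alphabetically.
    major = min(counts, key=lambda b: (-counts[b], b))
    # Numeric code = how many non-major (minor) bases the genotype carries.
    return [sum(base != major for base in g) for g in genotypes]


def chi_square_p(codes, labels):
    """Pearson chi-square test of genotype code against case/control label.

    Returns (statistic, p_value) for the 2 x k table of observed codes.
    """
    table = Counter(zip(labels, codes))
    rows, cols = sorted(set(labels)), sorted(set(codes))
    n = len(codes)
    stat = 0.0
    for r in rows:
        row_total = sum(table[(r, c)] for c in cols)
        for c in cols:
            col_total = sum(table[(q, c)] for q in rows)
            expected = row_total * col_total / n
            stat += (table[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    # With two label classes and codes in {0, 1, 2}, df is at most 2,
    # so closed-form chi-square survival functions suffice.
    if df == 0:
        return 0.0, 1.0
    if df == 1:
        return stat, math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return stat, math.exp(-stat / 2)
    raise ValueError("expected two label classes and at most three codes")


def screen_loci(genotype_matrix, labels, alpha=0.05):
    """Indices of loci (columns) whose chi-square p-value is below alpha."""
    kept = []
    for j in range(len(genotype_matrix[0])):
        codes = encode_locus([row[j] for row in genotype_matrix])
        _, p = chi_square_p(codes, labels)
        if p < alpha:
            kept.append(j)
    return kept


def consensus(rankings, k):
    """Loci in the top k of every method's ranking (most important first)."""
    tops = [set(r[:k]) for r in rankings.values()]
    first = next(iter(rankings.values()))
    return [locus for locus in first[:k] if all(locus in t for t in tops)]


if __name__ == "__main__":
    # Toy data: rows are samples, columns are loci; label 1 = case, 0 = control.
    samples = [
        ["AA", "CT"], ["AA", "CC"], ["AA", "TT"], ["AG", "CT"],
        ["GG", "CT"], ["GG", "CC"], ["GG", "TT"], ["AG", "CT"],
    ]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(screen_loci(samples, labels))  # → [0]
    # Hypothetical rankings standing in for two of the fitted models.
    print(consensus({"rf": [3, 1, 7, 2], "lasso": [1, 3, 5, 7]}, 3))  # → [3, 1]
```

Only the standard library is used: for one and two degrees of freedom the chi-square survival function has the closed forms erfc(sqrt(x/2)) and exp(-x/2), which covers every table this encoding can produce. In the toy data, locus 0 separates cases from controls (statistic 6.0, p = exp(-3) ≈ 0.0498) while locus 1 has identical genotype distributions in both groups and is dropped.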