Analysis of pathogenic genetic loci based on several machine learning algorithms

Cited by: 1

Abstract: Identifying and screening SNP loci in genes has become an increasingly important topic in association studies of complex diseases and genes. This paper first applies a locus classification scheme commonly used in medicine to a gene database for a certain class of disease: the allele frequencies at each locus are computed over the whole sample to determine the major and minor alleles, and the base-pair information (A, T, C, G) at each locus is converted into a numerical code. Next, a chi-square test is used to coarsely screen candidate SNP loci. Finally, several machine learning algorithms (random forest, Bagging, AdaBoost, and Lasso logistic regression) are applied to select the loci on which the methods give consistent results, and cross-validation is used to verify the validity of the screening results.
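The first two steps of the abstract (numerical encoding by minor allele, then a coarse chi-square screen) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the exact coding, so the 0/1/2 minor-allele count, the genotype strings such as "AG", the case/control labels, the function names, the toy data, and the threshold `alpha=0.05` are all assumptions made here. The ensemble-learning and cross-validation steps are not shown.

```python
from collections import Counter
from math import erfc, exp, sqrt


def encode_locus(genotypes):
    """Encode genotype strings such as "AG" as the number of minor alleles (0, 1 or 2).

    Assumption: the allele that is rarer over the whole sample is the minor allele.
    """
    allele_counts = Counter(base for g in genotypes for base in g)
    if len(allele_counts) < 2:
        # Monomorphic locus: no minor allele is present.
        return [0 for _ in genotypes]
    minor = min(allele_counts, key=allele_counts.get)
    return [g.count(minor) for g in genotypes]


def chi_square_test(codes, labels):
    """Pearson chi-square test on the (control/case) x (0/1/2) contingency table.

    Assumes labels are 0 (control) or 1 (case) and that both groups are present.
    Returns (statistic, p_value).
    """
    observed = [[0, 0, 0], [0, 0, 0]]
    for code, label in zip(codes, labels):
        observed[label][code] += 1
    n = len(codes)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    # Degrees of freedom: (rows - 1) * (non-empty genotype columns - 1).
    df = sum(1 for total in col_totals if total > 0) - 1
    if df == 2:
        p_value = exp(-stat / 2)        # exact chi-square tail for df = 2
    elif df == 1:
        p_value = erfc(sqrt(stat / 2))  # exact chi-square tail for df = 1
    else:
        p_value = 1.0                   # a single genotype class carries no signal
    return stat, p_value


def screen_loci(genotype_matrix, labels, alpha=0.05):
    """Return the column indices of loci whose p-value is below alpha."""
    kept = []
    for idx, column in enumerate(zip(*genotype_matrix)):
        _, p_value = chi_square_test(encode_locus(column), labels)
        if p_value < alpha:
            kept.append(idx)
    return kept


# Hypothetical toy data: 6 cases followed by 6 controls, 2 loci per sample.
case_rows = [("TT", "AA"), ("TT", "AA"), ("TT", "AG"),
             ("TT", "AG"), ("TT", "GG"), ("CT", "AA")]
control_rows = [("CC", "AA"), ("CC", "AA"), ("CC", "AG"),
                ("CC", "AG"), ("CC", "GG"), ("CC", "AA")]
genotype_matrix = case_rows + control_rows
labels = [1] * 6 + [0] * 6

print(screen_loci(genotype_matrix, labels))  # prints [0]
```

In the toy data only the first locus differs between cases and controls, so it is the only one that survives the screen. In a pipeline like the one the abstract describes, the surviving loci would then be passed to the ensemble methods, and the loci ranked highly by all of them would be retained.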

Authors: FANG Ya-lan; KU Zai-qiang (College of Mathematics and Statistics, Huanggang Normal University, Huanggang 438000, Hubei, China)

Affiliation: College of Mathematics and Statistics, Huanggang Normal University

Source: Journal of Huanggang Normal University, 2019, No. 3, pp. 1-5 (5 pages)

Funding: 2018 Huanggang Normal University Master of Education Teaching Case Project (JYJXAL2018001)

Keywords: SNP locus; Random Forest algorithm; Bagging algorithm; AdaBoost algorithm

Classification code: O29 [Science: Applied Mathematics]
