Comparative Study of Four Machine Learning Algorithms for Soybean Protein Localization Predicting

Cited by: 1

Abstract: This study set out to find effective methods for predicting the subcellular localization of soybean proteins under different degrees of missing data, and so to improve soybean protein subcellular localization prediction. It used 10 000 soybean protein sequences whose subcellular locations were known. Data were deleted completely at random (MCAR) at missing rates of 5%, 10%, 15%, 20% and 30%. Four machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes, Random Forest and Decision Tree, were then used to predict the subcellular locations of the deleted sequences. A correlation analysis was performed between the original and the predicted locations, and the algorithms were compared in terms of accuracy and computational performance. Random Forest achieved the highest prediction accuracy, while Naive Bayes had both the shortest running time and the smallest memory footprint. When running time and memory are not a concern and high prediction accuracy is required, Random Forest outperforms the other three algorithms. If, in the same scenario, memory usage is also constrained, Naive Bayes may be preferred. These results show how well each machine learning method suits different degrees of missingness and different prediction requirements, and the methods can be applied to localization prediction for soybean protein data.
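The abstract outlines an evaluation protocol but not the feature encoding or software used. The following is a minimal Python sketch of that protocol under stated assumptions. It hides a fixed fraction of labelled sequences completely at random, trains on the rest, predicts the hidden subcellular locations, and records accuracy, running time and peak memory. The following are all hypothetical choices for illustration, not the authors' method: encoding each sequence by its amino-acid counts, using the multinomial form of naive Bayes, and the names `mcar_split`, `CompositionNaiveBayes` and `evaluate`. Only naive Bayes is implemented, using the standard library alone.

```python
import math
import random
import time
import tracemalloc
from collections import Counter, defaultdict

# Hypothetical feature set: counts of the 20 standard amino acids
# (the abstract does not say how sequences were encoded).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mcar_split(records, ratio, seed=0):
    """Hide round(ratio * n) records chosen uniformly at random (MCAR).

    Whether a record is hidden does not depend on its sequence or location.
    Returns (observed, hidden) lists of (sequence, location) pairs.
    """
    rng = random.Random(seed)
    n_hidden = round(len(records) * ratio)
    hidden_idx = set(rng.sample(range(len(records)), n_hidden))
    observed = [r for i, r in enumerate(records) if i not in hidden_idx]
    hidden = [r for i, r in enumerate(records) if i in hidden_idx]
    return observed, hidden


class CompositionNaiveBayes:
    """Multinomial naive Bayes over amino-acid counts, with Laplace smoothing."""

    def fit(self, records):
        self.class_counts = Counter(loc for _, loc in records)
        self.aa_counts = defaultdict(Counter)
        for seq, loc in records:
            self.aa_counts[loc].update(c for c in seq if c in AMINO_ACIDS)
        self.n_train = len(records)
        return self

    def predict(self, seq):
        best_loc, best_score = None, -math.inf
        for loc, n in self.class_counts.items():
            counts = self.aa_counts[loc]
            denom = sum(counts.values()) + len(AMINO_ACIDS)  # add-one smoothing
            # log prior plus the log likelihood of every residue in the sequence
            score = math.log(n / self.n_train)
            score += sum(math.log((counts[c] + 1) / denom)
                         for c in seq if c in AMINO_ACIDS)
            if score > best_score:
                best_loc, best_score = loc, score
        return best_loc


def evaluate(model, records, ratio, seed=0):
    """Train on observed records, predict hidden ones, report cost and accuracy."""
    observed, hidden = mcar_split(records, ratio, seed)
    tracemalloc.start()
    start = time.perf_counter()
    model.fit(observed)
    predicted = [model.predict(seq) for seq, _ in hidden]
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    correct = sum(p == loc for p, (_, loc) in zip(predicted, hidden))
    accuracy = correct / len(hidden) if hidden else float("nan")
    return {"ratio": ratio, "accuracy": accuracy,
            "seconds": seconds, "peak_bytes": peak_bytes}


if __name__ == "__main__":
    # Synthetic toy data, NOT the study's 10 000 soybean sequences.
    toy = [("MKKLLAAG" * 5, "chloroplast"), ("DEEDKRRE" * 5, "nucleus")] * 50
    for ratio in (0.05, 0.10, 0.15, 0.20, 0.30):
        print(evaluate(CompositionNaiveBayes(), toy, ratio))
```

The toy data under `__main__` is synthetic and says nothing about the study's results. The other three algorithms could be compared in the same way. For example, scikit-learn's `SVC`, `RandomForestClassifier` and `DecisionTreeClassifier` could be wrapped behind the same `fit`/`predict` interface once sequences are converted to numeric feature vectors. `tracemalloc` only counts memory allocated through Python's allocator. It is therefore a rough proxy for running memory, especially for libraries that allocate in native code.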

Authors: LI Jia-nan; GAO Xing-quan; LI Zhuo; TENG Xiao-hua; HUANG Bin; ZHANG Ji-cheng; TANG You (Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin 132101, China; School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132000, China; College of Electronic and Information, Northeast Agricultural University, Harbin 150030, China)

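Institutions: Electrical and Information Engineering College, Jilin Agricultural Science and Technology University; School of Information and Control Engineering, Jilin Institute of Chemical Technology; College of Electrical and Information Engineering, Northeast Agricultural University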

Source: Soybean Science, 2022, No. 3, pp. 337-344 (8 pages); indexed in CAS, CSCD and the Peking University Core Journals list

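Funding: Jilin Province Characteristic High-Level Discipline, Emerging Interdisciplinary Field "Digital Agriculture" (2018); Jilin Province Smart Agriculture Engineering Research Center Project (2016); National Natural Science Foundation of China (31801441).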

Keywords: Support Vector Machines algorithm; Naive Bayesian algorithm; Decision Tree algorithm; Random Forest algorithm; soybean protein; completely random missing; sequence position prediction

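Chinese Library Classification: S565.1 [Agricultural Science: Crop Science]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]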
