Comparative Study of Four Machine Learning Algorithms for Soybean Protein Localization Predicting
Abstract To identify effective methods for predicting the subcellular localization of soybean proteins under different degrees of missingness, and to improve the prediction of soybean protein subcellular localization, this study took 10 000 soybean protein sequences with known subcellular locations as the research object. Missingness was introduced completely at random at rates of 5%, 10%, 15%, 20% and 30%. Four machine learning algorithms (SVM, Naive Bayes, Random Forest and Decision Tree) were then used to predict the subcellular locations of the missing sequences. Correlation analysis was performed between the original and the predicted locations, and the accuracy and performance of the algorithms were compared. The results showed that the Random Forest algorithm had the highest prediction accuracy, while the Naive Bayes algorithm ran fastest and used the least memory. When running time and memory are not a concern and high prediction accuracy is required, the Random Forest algorithm outperforms the other three algorithms. Under otherwise similar conditions, when memory is tightly constrained, the Naive Bayes algorithm may be preferred. The results indicate the suitability of different machine learning methods for prediction under different degrees of missingness, and they can be applied to localization prediction for soybean protein data.
Authors LI Jia-nan; GAO Xing-quan; LI Zhuo; TENG Xiao-hua; HUANG Bin; ZHANG Ji-cheng; TANG You (Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin 132101, China; School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132000, China; College of Electronic and Information, Northeast Agricultural University, Harbin 150030, China)
Source Soybean Science (《大豆科学》), 2022, No. 3, pp. 337-344 (8 pages). Indexed in CAS, CSCD and the Peking University Core Journals list.
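Funding Jilin Province Distinctive High-Level Discipline, Emerging Interdisciplinary Field "Digital Agriculture" (2018); Jilin Province Smart Agriculture Engineering Research Center Project (2016); National Natural Science Foundation of China (31801441).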
Keywords Support Vector Machine algorithm; Naive Bayes algorithm; Decision Tree algorithm; Random Forest algorithm; soybean protein; missing completely at random; sequence position prediction
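
The evaluation protocol summarized in the abstract (hide part of the data completely at random, predict the hidden locations, then compare accuracy, running time and memory) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. It assumes that the missing values are the location labels, and the function names (`mcar_split`, `majority_baseline`, `evaluate`), the toy location labels and the placeholder predictor are all hypothetical. In practice one of the four classifiers named in the abstract would be passed in place of the baseline.

```python
import random
import time
import tracemalloc
from collections import Counter

# Missing rates named in the abstract.
MISSING_RATES = (0.05, 0.10, 0.15, 0.20, 0.30)


def mcar_split(n_items, rate, seed=0):
    """Choose round(n_items * rate) indices uniformly at random.

    Every item has the same chance of being hidden, independent of its
    content, which is what "missing completely at random" means.
    """
    rng = random.Random(seed)
    hidden = set(rng.sample(range(n_items), round(n_items * rate)))
    known = [i for i in range(n_items) if i not in hidden]
    return known, sorted(hidden)


def majority_baseline(train_seqs, train_labels, test_seqs):
    """Hypothetical placeholder predictor: always returns the most common
    known label. A real classifier would use the sequences as features."""
    top_label = Counter(train_labels).most_common(1)[0][0]
    return [top_label] * len(test_seqs)


def evaluate(seqs, labels, rate, predictor=majority_baseline, seed=0):
    """Hide labels at the given rate, predict them, and report accuracy,
    wall-clock time and peak traced memory of the predictor call."""
    known, hidden = mcar_split(len(labels), rate, seed)
    if not hidden:
        raise ValueError("rate too small: no items were hidden")
    train_seqs = [seqs[i] for i in known]
    train_labels = [labels[i] for i in known]
    test_seqs = [seqs[i] for i in hidden]

    tracemalloc.start()
    start = time.perf_counter()
    predicted = predictor(train_seqs, train_labels, test_seqs)
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    correct = sum(p == labels[i] for p, i in zip(predicted, hidden))
    return {
        "rate": rate,
        "hidden": len(hidden),
        "accuracy": correct / len(hidden),
        "seconds": seconds,
        "peak_bytes": peak_bytes,
    }


if __name__ == "__main__":
    # Toy data only: made-up sequences and labels, not the study's dataset.
    toy_seqs = ["MKV", "MAA", "MLS", "MGT", "MPR"] * 200
    toy_labels = (["nucleus"] * 3 + ["cytoplasm"] * 2) * 200
    for rate in MISSING_RATES:
        print(evaluate(toy_seqs, toy_labels, rate))
```

Passing the predictor as a function keeps the masking and the measurements identical for every algorithm, so that differences in accuracy, time and memory come from the classifier alone.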