Abstract
Random forest (RF) has become a classic model in ensemble learning thanks to its strong resistance to noise, high prediction accuracy, and ability to handle high-dimensional data without dimensionality reduction. This paper applies the random forest algorithm to plant leaf identification, using the Leaf data set from the UCI Machine Learning Repository. First, the theory behind decision trees and the random forest algorithm is briefly reviewed. Second, the Leaf data are preprocessed and a random forest is built to predict and analyze them. Finally, precision, recall, and F1 score are used to verify the accuracy of the model on its own (the "longitudinal" evaluation), the random forest is compared with three other machine learning algorithms on root mean square error (RMSE) (the "horizontal" comparison), and the Gini index is used to rank feature importance. The experimental results show that random forest achieves relatively high prediction accuracy in plant leaf identification and that eccentricity is the most important feature in the ranking.
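The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names are hypothetical, and the labels are toy values invented for illustration rather than the Leaf data. It builds a confusion matrix, derives per-class precision, recall, and F1 from it, and computes the Gini impurity of a label set, the quantity whose decrease across splits underlies Gini-based feature importance.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_scores(matrix):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


# Toy labels, invented for illustration only (not the Leaf data).
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
classes = ["a", "b", "c"]

cm = confusion_matrix(y_true, y_pred, classes)
scores = per_class_scores(cm)
for name, (p, r, f) in zip(classes, scores):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"Gini impurity of y_true: {gini_impurity(y_true):.3f}")
```

In practice a library such as scikit-learn would likely be used for both the model and the metrics; the hand-written version only makes the definitions explicit.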
Author
钱亮亮
QIAN Liang-liang (College of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu 233030)
Source
《现代计算机》
2019, No. 29, pp. 44-47, 51 (5 pages)
Modern Computer
Keywords
Random Forest
Machine Learning
Leaf Recognition
Decision Tree
Confusion Matrix