Abstract
To better characterize county-level cultivated land resources, improve the efficiency and accuracy of cultivated land quality evaluation, and reduce the influence of subjective factors in the evaluation process, this study graded the cultivated land quality of Shanzhou District, Sanmenxia City, Henan Province. The feature variables were divided into categorical and numerical variables according to the characteristics of the data. Numerical variables were discretized by chi-square binning, and the binned results and the categorical variables were then one-hot encoded. The synthetic minority over-sampling technique (SMOTE) was applied to balance the classes with few samples. Cultivated land quality was then graded with XGBoost, LightGBM, an artificial neural network (ANN), and a combined XGBoost-LightGBM-ANN model. When the processed, unsampled data were used as the training set, accuracy, precision, recall, and F1 score all exceeded 0.97 for both the single models and the combined model, with the metric values increasing in the order ANN, LightGBM, XGBoost, indicating that machine learning models perform well in cultivated land quality evaluation. Compared with the unsampled data, training on the SMOTE-sampled data clearly improved model performance: all metrics exceeded 0.99 for the single and combined models, and the XGBoost-LightGBM-ANN combined model, based on ensemble-learning voting, showed a clear advantage, with an accuracy of 0.9983.
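The abstract names SMOTE and a voting-based combination of three classifiers without giving implementation details. The following is a minimal, standard-library-only sketch of those two ideas, not the authors' implementation. The function names (`smote`, `hard_vote`, `accuracy`), the neighbour count `k`, the choice of hard (majority) voting, and the tie-breaking rule are all assumptions made for illustration. The toy label lists stand in for predictions from XGBoost, LightGBM, and an ANN.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def hard_vote(*model_predictions):
    """Majority vote over per-model label lists. Assumed tie rule: the
    label from the earliest-listed model among the tied labels wins."""
    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: four samples of a rare grade, described by two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)

# Hypothetical grade predictions from three already-trained models.
preds_a = [1, 2, 3, 1]
preds_b = [1, 2, 2, 2]
preds_c = [2, 2, 3, 3]
y_true = [1, 2, 3, 3]

ensemble = hard_vote(preds_a, preds_b, preds_c)  # [1, 2, 3, 1]
score = accuracy(y_true, ensemble)               # 0.75
```

Because each synthetic point lies on a segment between two existing minority samples, oversampling adds no points outside the region the rare class already occupies. In practice a library implementation would normally be used, and oversampling would be applied to the training split only, so that the test metrics are computed on real samples.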
Authors
CHENG Guifang (程桂芳); WANG Yuxin (王钰鑫); SHEN Huishi (申会诗)
School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China
Source
Journal of Henan Agricultural Sciences (《河南农业科学》)
Indexed in: Peking University Core Journals (北大核心)
2023, No. 8, pp. 155-162 (8 pages)
Funding
Henan Province Higher Education Teaching Reform Research and Practice Project (2021SJGLX060).