Funding: Supported by the National Natural Science Foundation of China (Nos. 41501229, 41371224, 41130530, and 91325301) and the China Postdoctoral Science Foundation (No. 2015M581876).
Abstract: Soil diagnostic horizons, each defined by a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence. We evaluated and compared four machine learning algorithms, namely classification and regression tree (CART), random forest (RF), boosted regression trees (BRT), and support vector machine (SVM), for mapping the occurrence of the soil mattic horizon in the northeastern Qinghai-Tibetan Plateau using readily available ancillary data. Resampling and ensemble techniques significantly improved prediction accuracy, measured as the area under the receiver operating characteristic curve (AUC), and produced more stable results for the BRT (AUC of 0.921 ± 0.012, mean ± standard deviation) and RF (0.908 ± 0.013) algorithms than for the CART algorithm (0.784 ± 0.012), which is the most commonly used machine learning method. Although the SVM algorithm yielded an AUC (0.906 ± 0.006) comparable to those of the RF and BRT algorithms, it is sensitive to parameter settings, and tuning them is extremely time-consuming. Therefore, we consider it inadequate for occurrence-distribution modeling. Considering its high prediction accuracy, robustness to parameter settings, ability to estimate prediction uncertainty, and easy interpretation of predictor variables, BRT seems to be the most desirable method. These results provide insight into the use of machine learning algorithms to map the mattic horizon and potentially other soil diagnostic horizons.
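The accuracy figures above are AUC values reported as mean ± standard deviation over repeated runs. As a rough illustration of how such a summary can be computed, the following is a minimal sketch and not the authors' implementation. The function names `auc_score` and `summarize` and the toy labels and scores are hypothetical, and the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def auc_score(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 = horizon present, 0 = horizon absent (hypothetical coding).
    scores: predicted occurrence probabilities from any classifier.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # presence ranked above absence
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample standard deviation of AUCs from repeated runs."""
    return mean(aucs), stdev(aucs)


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc_score(labels, scores))
```

In practice one would call `auc_score` once per resampled train/test split for each algorithm and pass the resulting list to `summarize` to obtain a figure in the same form as those reported above.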