Abstract
[Objective] When the random forest (RF) model is used for debris flow susceptibility assessment, continuous factors are typically graded subjectively and randomly selected non-debris-flow samples have low accuracy. Taking Liangshan Yi Autonomous Prefecture in southwestern Sichuan Province as the study area, this study proposes a random forest based on sampling from a statistical prior model to assess and zone debris flow susceptibility. [Methods] Continuous factors were graded according to the relative changes in curves such as the cumulative disaster frequency curve. Rough set theory (RS) and the information value method (IV) were used to calculate weighted information values, delimit the extremely low and low susceptibility zones, and select negative samples from those zones. The optimal number of trees (n_estimators) and number of split features (max_features) of the RF model were determined from out-of-bag (OOB) error curves. A weighted information value-random forest (RSIV-RF) model was then constructed to predict debris flow susceptibility in Liangshan Prefecture and was compared with an RF model whose non-debris-flow samples were selected at random from the whole study area. [Results] The accuracy of the RSIV-RF model on the training and test sets was 0.89 and 0.83, respectively, and the corresponding AUC values of the ROC curves were 0.920 and 0.895, all higher than those of the standalone RF model. The susceptibility map produced by RSIV-RF is fairly consistent with the distribution of historical disasters: the relatively high and high susceptibility zones account for 18.625% of the study area and contain 78.57% of the debris flow points. [Conclusion] Both the performance evaluation and the susceptibility statistics indicate that RSIV-RF resolves the inaccurate sampling of non-debris-flow samples in the standalone model, achieves higher prediction accuracy, and is well suited to debris flow susceptibility assessment in Liangshan Prefecture.
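The negative-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common information value form I_i = ln((N_i/N)/(S_i/S)), takes the factor weights as given inputs instead of deriving them with rough set theory, and replaces the paper's unspecified zoning rule with a simple quantile cutoff. All function names, the toy grid, the weights, and the `low_fraction` parameter are hypothetical.

```python
import math
import random
from collections import Counter


def information_values(classes, labels):
    """Information value ln((N_i/N) / (S_i/S)) for each class of one graded factor.

    classes: class id of this factor for every grid cell
    labels:  1 if the cell holds a debris-flow point, else 0
    """
    cell_counts = Counter(classes)                                   # S_i
    event_counts = Counter(c for c, y in zip(classes, labels) if y)  # N_i
    total_cells, total_events = len(classes), sum(labels)
    values = {}
    for cls, s_i in cell_counts.items():
        # Assumption: a class with no events is counted as 0.5 events to avoid log(0).
        n_i = event_counts.get(cls, 0) or 0.5
        values[cls] = math.log((n_i / total_events) / (s_i / total_cells))
    return values


def weighted_scores(factors, labels, weights):
    """Weighted information value of every cell, summed over all factors."""
    tables = {name: information_values(cls, labels) for name, cls in factors.items()}
    n_cells = len(labels)
    return [
        sum(weights[name] * tables[name][factors[name][i]] for name in factors)
        for i in range(n_cells)
    ]


def sample_negatives(scores, labels, n, low_fraction=0.5, seed=0):
    """Draw up to n non-event cells from the lowest-scoring fraction of the grid."""
    ranked = sorted(scores)
    cutoff = ranked[max(0, int(low_fraction * len(ranked)) - 1)]
    candidates = [i for i, (s, y) in enumerate(zip(scores, labels))
                  if y == 0 and s <= cutoff]
    return random.Random(seed).sample(candidates, min(n, len(candidates)))


# Toy grid of 8 cells with two graded factors (hypothetical data).
factors = {"slope": [0, 0, 0, 0, 1, 1, 1, 1],
           "rain":  [0, 0, 1, 1, 0, 0, 1, 1]}
labels = [0, 0, 0, 0, 0, 1, 1, 1]
weights = {"slope": 0.6, "rain": 0.4}  # stand-ins for rough-set weights

scores = weighted_scores(factors, labels, weights)
negatives = sample_negatives(scores, labels, n=3)
```

In this toy grid the negatives are drawn only from cells whose weighted information value falls in the lower half, so cell 4, which has no recorded event but sits in the event-prone slope class, is never picked as a negative sample.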
Authors
RAO Shanshan
LENG Xiaopeng
School of Computer and Network Security (Brooks College, Oxford), Chengdu University of Technology, Chengdu 610059, China
Source
Bulletin of Geological Science and Technology
CAS
CSCD
PKU Core Journal
2024, Issue 1, pp. 275-287 (13 pages)
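Funding
Applied Basic Research Project of the Science and Technology Department of Sichuan Province (2021YJ0335)
Sichuan Provincial Universities Research Project on Meteorological Disaster Prediction and Early Warning (ZHYJ21-ZC01)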