Abstract
How non-landslide samples are selected is an important source of uncertainty in landslide susceptibility prediction (LSP) modeling. To study how different selection methods affect LSP modeling, five methods were used to select as many non-landslide samples as there are landslide samples: random selection from the whole area; selection from areas with slopes below 5°; selection from areas outside a 300 m buffer around each landslide; selection by the information value (IV) method; and selection by semi-supervised machine learning. Each method was then coupled with random forest (RF) to build the random-RF, low-slope RF, buffer-based RF, IV-RF, and semi-supervised RF models. Taking Nankang District, Jiangxi Province, China, as the study area, 19 environmental factors (e.g., elevation, slope, lithology, population density, and road density) and an inventory of 233 landslides were acquired. The landslide inventory was divided into 2598 landslide grid cells, which were used to build the input-output datasets of the coupled models. The prediction accuracy and the distribution of the predicted landslide susceptibility indexes were then used to analyze the uncertainty of LSP modeling. Because the coupled models predicted unreasonable distributions of susceptibility indexes, the semi-supervised RF model was additionally trained on a sample set with a landslide-to-non-landslide ratio of 1∶2 and compared with the equal-ratio (1∶1) sample set. The results show that: 1) the low-slope RF, buffer-based RF, IV-RF, and semi-supervised RF models all achieved substantially higher prediction accuracy than the random-RF model, indicating that accurate selection of non-landslide samples is critical for LSP; 2) semi-supervised selection of non-landslide samples gave the best modeling performance, and semi-supervised RF predicted a more accurate and credible distribution of susceptibility indexes at landslide∶non-landslide = 1∶2 than at 1∶1. Future studies should examine the ratio of landslide to non-landslide samples in more depth.
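The four rule-based strategies in the abstract (whole-area random, slope below 5°, outside a 300 m buffer, and IV) differ only in which grid cells form the candidate pool from which non-landslides are drawn. The Python sketch below illustrates them on a toy 10-cell strip. It is not the authors' code. The per-class IV formula ln((N_i/N)/(S_i/S)) is the commonly used definition. The `EPS` pseudo-count, the choice of the lowest-IV half of non-landslide cells as the IV pool, the toy data, and every function name are assumptions made for illustration.

```python
import math
import random

# Assumption: pseudo-count used when a factor class contains no landslide cells,
# so its IV is finite (and very low) instead of ln(0).
EPS = 1e-6


def class_information_value(classes, is_landslide):
    """IV_i = ln((N_i / N) / (S_i / S)) for each class i of one categorical factor.

    N_i: landslide cells in class i, N: all landslide cells,
    S_i: cells in class i, S: all cells.
    """
    n_total = sum(is_landslide)
    s_total = len(classes)
    s_count, n_count = {}, {}
    for c, slide in zip(classes, is_landslide):
        s_count[c] = s_count.get(c, 0) + 1
        if slide:
            n_count[c] = n_count.get(c, 0) + 1
    return {
        c: math.log((max(n_count.get(c, 0), EPS) / n_total) / (s_i / s_total))
        for c, s_i in s_count.items()
    }


def total_information_value(factors, is_landslide):
    """Sum the class IVs of all factors for every cell (one class list per factor)."""
    totals = [0.0] * len(is_landslide)
    for classes in factors:
        iv = class_information_value(classes, is_landslide)
        for idx, c in enumerate(classes):
            totals[idx] += iv[c]
    return totals


def whole_area_pool(is_landslide):
    """Random strategy: every non-landslide cell is a candidate."""
    return [i for i, slide in enumerate(is_landslide) if not slide]


def low_slope_pool(slopes, is_landslide, max_slope=5.0):
    """Low-slope strategy: non-landslide cells with slope below max_slope degrees."""
    return [i for i in whole_area_pool(is_landslide) if slopes[i] < max_slope]


def buffer_pool(coords, is_landslide, min_dist=300.0):
    """Buffer strategy: non-landslide cells farther than min_dist metres from every landslide cell."""
    slide_xy = [xy for xy, slide in zip(coords, is_landslide) if slide]
    return [
        i for i in whole_area_pool(is_landslide)
        if all(math.dist(coords[i], s) > min_dist for s in slide_xy)
    ]


def iv_pool(total_iv, is_landslide, fraction=0.5):
    """IV strategy (assumed variant): the lowest-IV fraction of non-landslide cells."""
    ranked = sorted(whole_area_pool(is_landslide), key=lambda i: total_iv[i])
    return ranked[: max(1, int(len(ranked) * fraction))]


def draw_non_landslides(pool, n_landslides, ratio=1, seed=0):
    """Randomly draw ratio * n_landslides cells from a pool (ratio=1 -> 1:1, ratio=2 -> 1:2)."""
    k = min(len(pool), ratio * n_landslides)
    return random.Random(seed).sample(pool, k)


# Hypothetical toy data: 10 cells at 100 m spacing; cells 0 and 1 are landslide cells.
coords = [(100.0 * i, 0.0) for i in range(10)]
is_landslide = [True, True] + [False] * 8
slopes = [25, 30, 3, 20, 4, 2, 15, 1, 35, 4.5]
lithology = ["A", "A", "B", "A", "B", "B", "C", "B", "A", "C"]
elevation_band = ["low", "low", "high", "low", "high", "high", "high", "low", "low", "high"]

total_iv = total_information_value([lithology, elevation_band], is_landslide)
print("low-slope pool:", low_slope_pool(slopes, is_landslide))  # [2, 4, 5, 7, 9]
print("buffer pool:", buffer_pool(coords, is_landslide))        # [5, 6, 7, 8, 9]
print("IV pool:", iv_pool(total_iv, is_landslide))              # [2, 4, 5, 6]
print("1:2 draw:", draw_non_landslides(buffer_pool(coords, is_landslide), 2, ratio=2))
```

Keeping pool construction separate from drawing means that changing `ratio=1` to `ratio=2` reproduces the 1∶1 versus 1∶2 sample-set contrast without touching the selection rules. The semi-supervised strategy and the RF model itself both require a trained classifier and are not sketched here.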
Authors
HUANG Faming
ZENG Shiyi
YAO Chi
XIONG Haowen
FAN Xuanmei
HUANG Jinsong
(School of Infrastructure Eng., Nanchang Univ., Nanchang 330031, China; State Key Lab. of Geohazard Prevention and Geoenvironment Protection, Chengdu Univ. of Technol., Chengdu 610059, China; ARC Centre of Excellence for Geotechnical Sci. and Eng., Univ. of Newcastle, Newcastle 2287, Australia)
Source
Advanced Engineering Sciences
EI
CAS
CSCD
Peking University Core Journal
2024, No. 1, pp. 169-182 (14 pages)
Funding
National Natural Science Foundation of China (41807285, 42377164, 42272326).
Keywords
landslide susceptibility prediction
non-landslide sample selection
semi-supervised machine learning
information value
random forest