Abstract
Using random forest models of population density to explore the nonlinear relationship between population density and its influencing factors is at the frontier of population distribution research. However, the optimal transport problem under informal constraints that arises when demographic data are spatially disaggregated has not been properly resolved. Based on the areal weighting method, this study designed a spatial disaggregation algorithm for demographic data that accounts for settlement distribution. The algorithm starts from a village population dataset in vector format and uses settlement and hectare grid datasets, also in vector format, as constraints. By disaggregating village resident population data first to settlements and then to hectare grid cells, a raster population density dataset (SJZ_RK) was obtained. The total population of SJZ_RK is 10.396 million, an error of only 0.04%, which indicates that the proposed algorithm has high accuracy. The Gini coefficient of population distribution ranks SJZ_RK (0.8909) > GHS_POP (0.8548) > SJZ_CUN_RK (0.5898) > GPWv4 (0.5897). This indicates that SJZ_RK, which accounts for settlement distribution, effectively captures the spatial agglomeration and heterogeneity of population distribution, and provides high-quality population density labels for building training samples for supervised machine learning models such as population density random forest models. In depicting non-settlement areas, urban settlement areas, and value ranges, SJZ_RK is closer to reality: it outperforms GHS_POP in the first two aspects and significantly outperforms SJZ_CUN_RK and GPWv4 in all three. The algorithm resolves two problems: 1) it optimizes the procedure for producing a high-precision raster population density dataset, yielding a relatively accurate discretized representation of population distribution; 2) it unifies the granularity of the population density label data and the influencing factor data, which creates the necessary conditions for freeing the training samples of population density random forest models from the modifiable areal unit problem (MAUP) and for overcoming the ecological fallacy in such models.
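The two-stage idea summarized in the abstract (village to settlements, then settlements to hectare grid cells) and the Gini comparison can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the function names `areal_weighting`, `disaggregate_village`, and `gini`, the input layout, and the unweighted Gini formula are all assumptions made for illustration only.

```python
def areal_weighting(total, areas):
    """Split `total` across target zones in proportion to their areas."""
    area_sum = sum(areas)
    if area_sum <= 0:
        raise ValueError("target zones must have positive total area")
    return [total * a / area_sum for a in areas]


def disaggregate_village(population, settlement_cells):
    """Hypothetical two-stage split: village -> settlements -> grid cells.

    settlement_cells holds one list per settlement; each entry is the area
    of that settlement falling inside one grid cell it overlaps. Cells with
    no settlement area are simply absent, so they receive no population.
    """
    settlement_areas = [sum(cells) for cells in settlement_cells]
    settlement_pops = areal_weighting(population, settlement_areas)
    return [
        areal_weighting(pop, cells)
        for pop, cells in zip(settlement_pops, settlement_cells)
    ]


def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    0 means a perfectly even distribution; values near 1 mean strong
    concentration. Assumed formula, not necessarily the paper's.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up example: a village of 1000 residents with two settlements.
# Settlement A overlaps two cells (6000 and 4000 square meters);
# settlement B fills one whole hectare cell (10000 square meters).
cells = disaggregate_village(1000, [[6000, 4000], [10000]])
allocated = sum(sum(c) for c in cells)

# Relative error of the allocated total against the village total, the kind
# of check behind a figure such as the 0.04% reported in the abstract.
relative_error = abs(allocated - 1000) / 1000
print(cells, relative_error)

# A grid with empty non-settlement cells is more concentrated than an
# even spread, so its Gini coefficient is higher.
print(gini([0, 0, 0, 300, 200, 500]), gini([1000 / 6] * 6))
```

Because each stage only redistributes a fixed total in proportion to area, the village population is preserved up to floating-point rounding, and cells outside settlements stay at zero, which is what raises the Gini coefficient relative to a village-level uniform spread.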
Authors
李艳成
温佩璋
刘劲松
Li Yancheng; Wen Peizhang; Liu Jinsong (School of Geographical Sciences, Hebei Normal University, Shijiazhuang 050024, Hebei, China; Hebei Technology Innovation Center for Remote Sensing Identification of Environmental Change, Shijiazhuang 050024, Hebei, China; Geographic Experiment Teaching Demonstration Center of Hebei Province, Shijiazhuang 050024, Hebei, China; Hebei Key Laboratory of Environmental Change and Ecological Construction, Shijiazhuang 050024, Hebei, China)
Source
《地理科学》
CSCD
Peking University Core Journals
2024, No. 7, pp. 1196-1205 (10 pages)
Scientia Geographica Sinica
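Funding
National Natural Science Foundation of China (42071167, 40871073)
Second Tibetan Plateau Scientific Expedition and Research Program (2019QZKK0406)
Natural Science Foundation of Hebei Province (D2007000272)
Key Development Fund of Hebei Normal University (L2024ZD07)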
Keywords
population density
areal weighting
disaggregation algorithm
settlement