Abstract
[Objective] The UN Sustainable Development Goals (SDGs) have become the most important sustainable development agenda worldwide. However, the high rate of missing data for SDG indicators seriously hampers the UN's ability to monitor how effectively individual countries are implementing the SDGs. Completing the missing SDG data is technically challenging, and it is also of great significance for urging countries to achieve the SDGs. [Methods] This paper proposes TLM, a transfer learning method that incorporates the maximal information coefficient (MIC) for feature selection. TLM constructs features for the target variable from other public data and combines them with regression techniques to build a prediction model that predicts the missing values of the target variable. [Results] Taking the data set for SDG indicator 3.2.1 (under-five mortality rate) in a specific country as an example, TLM was used to predict and fill in the missing values of the target variable, which verified its effectiveness. [Limitations] Because SDG indicators are affected by many fluctuating factors, future research will focus on exploring more correlation analysis methods and combining them with TLM to predict missing values more accurately. [Conclusions] By combining MIC with transfer learning, TLM can improve the accuracy of data prediction and offers a useful reference for SDG researchers and practitioners dealing with missing data.
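The abstract only outlines TLM's pipeline: select features by MIC, then regress the target on the selected features to fill in its missing values. As a rough illustration of that idea, and not the authors' implementation, the Python sketch below screens candidate features with a simplified MIC estimate and fills the gaps with ordinary least squares. All names (`approx_mic`, `impute`, etc.), the 0.3 selection threshold, the equal-frequency grids, and the toy data are hypothetical assumptions. The transfer-learning step (drawing features from other public data sources) is not modeled, and missing target values are marked with `None`.

```python
# Hypothetical sketch; not the TLM implementation described in the paper.
import math
from collections import Counter


def equal_freq_bins(values, k):
    """Assign each value to one of k equal-frequency bins by rank (ties may split)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def approx_mic(x, y, alpha=0.6):
    """Simplified MIC: max normalized MI over equal-frequency grids with kx*ky <= n**alpha.

    The original MIC optimizes partition boundaries; this sketch only tries
    equal-frequency grids, so it can only underestimate the exact score for
    the same grid budget. The budget is floored at 4 so a 2x2 grid always fits.
    """
    budget = max(4, int(len(x) ** alpha))
    best = 0.0
    for kx in range(2, budget // 2 + 1):
        for ky in range(2, budget // kx + 1):
            mi = mutual_information(equal_freq_bins(x, kx), equal_freq_bins(y, ky))
            best = max(best, mi / math.log(min(kx, ky)))
    return best


def fit_ols(rows, y):
    """Ordinary least squares with intercept via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


def impute(features, target, threshold=0.3):
    """Screen features by approx_mic on observed rows, fit OLS, predict the None gaps."""
    obs = [i for i, v in enumerate(target) if v is not None]
    y_obs = [target[i] for i in obs]
    selected = [name for name, col in features.items()
                if approx_mic([col[i] for i in obs], y_obs) >= threshold]
    # With no selected features, the fit reduces to the intercept (the observed mean).
    beta = fit_ols([[features[n][i] for n in selected] for i in obs], y_obs)
    filled = list(target)
    for i, v in enumerate(target):
        if v is None:
            filled[i] = beta[0] + sum(w * features[n][i] for w, n in zip(beta[1:], selected))
    return selected, filled


# Hypothetical toy data: f1 drives the target linearly, f2 is a scrambled sequence.
f1 = [float(i) for i in range(20)]
f2 = [float((7 * i) % 20) for i in range(20)]
target = [2 * v + 1 for v in f1]
for i in (3, 10, 17):
    target[i] = None  # simulate missing indicator values
selected, filled = impute({"f1": f1, "f2": f2}, target)
print(selected, [round(filled[i], 6) for i in (3, 10, 17)])
```

In this toy setup the target is an exact linear function of `f1`, so the imputed values should recover `2 * f1 + 1` at the missing positions whether or not `f2` also passes the threshold. Real SDG series are noisier, and the threshold would need tuning.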
Authors
Chen Tongbao
Wen Liangming
Li Jianhui
(Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
Frontiers of Data & Computing (数据与计算发展前沿)
2020, No. 2, pp. 145-154 (10 pages)
Funding
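Sub-project of the Strategic Priority Research Program (Category A) of the Chinese Academy of Sciences: "Big Data Resource Repository and Portal System" (XDA19020104).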
Keywords
Sustainable Development Goals
transfer learning
regression
missing data
data completion methods