Abstract
Missing data is the most common of the many factors that affect data quality. If missing data are not handled properly, the reliability of the analysis results suffers directly, and the analysis may fail to achieve its purpose. For skew-normal data that are missing at random (MAR), this paper studies parameter estimation for the skew-normal mode mixture of experts model. By combining mode regression imputation with clustering, we propose a hierarchical mode regression imputation method. Using both machine learning and statistical imputation approaches, we then compare how well three machine learning imputation methods (support vector machine imputation, random forest imputation, and neural network imputation) and three statistical imputation methods (hierarchical mean imputation, mode regression imputation, and hierarchical mode regression imputation) handle the missing data. Monte Carlo simulations and a real-data example demonstrate the good performance of hierarchical mode regression imputation.
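The abstract describes hierarchical mode regression imputation only as mode regression imputation combined with clustering. The following is a minimal sketch of that idea under stated assumptions, not the authors' procedure: it assumes k-means for the clustering step, a linear modal (mode) regression fitted by a modal EM iteration with a Gaussian kernel and a rule-of-thumb bandwidth, fully observed covariates, and missing values only in the response (marked with `NaN`). All function names (`kmeans`, `modal_linear_regression`, `hierarchical_mode_impute`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def assign(X, centers):
    """Index of the nearest center for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def modal_linear_regression(X, y, h=None, n_iter=200, tol=1e-8):
    """Linear mode regression via a modal EM loop with a Gaussian kernel (assumed estimator)."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]          # OLS starting value
    if h is None:                                        # assumed rule-of-thumb bandwidth
        r = y - Z @ beta
        h = max(1.06 * r.std() * len(y) ** (-1 / 5), 1e-8)
    for _ in range(n_iter):
        u = 0.5 * ((y - Z @ beta) / h) ** 2
        w = np.exp(-(u - u.min()))                       # E-step: kernel weights (stabilized)
        w /= w.sum()
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)[0]  # M-step: weighted LS
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

def hierarchical_mode_impute(X, y, k=2, seed=0):
    """Cluster on complete cases, fit a mode regression per cluster, impute missing y."""
    miss = np.isnan(y)
    centers = kmeans(X[~miss], k, seed=seed)
    labels = assign(X, centers)
    global_beta = modal_linear_regression(X[~miss], y[~miss])  # fallback for tiny clusters
    y_imp = y.copy()
    for j in range(k):
        obs = (labels == j) & ~miss
        mis = (labels == j) & miss
        if not mis.any():
            continue
        beta = (modal_linear_regression(X[obs], y[obs])
                if obs.sum() > X.shape[1] + 1 else global_beta)
        y_imp[mis] = beta[0] + X[mis] @ beta[1:]
    return y_imp, labels
```

Fitting a separate mode regression inside each cluster lets each stratum have its own regression line, which is the intuition behind stratifying before imputing; the choice of `k`, the kernel, and the bandwidth are all assumptions here.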
Authors
鲁钰
吴刘仓
王格格
LU Yu; WU Liucang; WANG Gege (Faculty of Science, Kunming University of Science and Technology, Kunming 650504, China)
Source
Mathematica Applicata (《应用数学》)
Peking University Core Journal
2023, No. 2, pp. 474-486 (13 pages)
Funding
National Natural Science Foundation of China (11861041, 12261051).
Keywords
Missing skew-normal data
Mode mixture of experts model
Support vector machine imputation
Random forest imputation
BP neural network imputation
Hierarchical mode regression imputation