Abstract
Mean imputation is a commonly used way to fill in missing data, but it often ignores the relationships between adjacent variables and is highly sensitive to noisy data. This paper applies principal component analysis (PCA) to the mean imputation algorithm. The feature importance of each adjacent attribute is extracted and used as a weight, and the weighted mean of the matrix of attributes adjacent to the missing value is computed in the same way as in mean imputation. This weighted mean is treated as an offset that captures the influence of the adjacent attributes on mean imputation, and it is added into the mean calculation. Simulation experiments on UCI datasets show that the PCA-based improved algorithm fills missing values markedly more accurately than the plain mean imputation algorithm.
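The abstract does not give the formulas, so the following is only a minimal Python sketch of one possible reading of the procedure, not the authors' implementation. The function names `pca_weights` and `pca_mean_impute` are hypothetical. Several details are assumptions: PCA is fitted on the complete rows only, attribute importance is taken as the squared loading on the first principal component, the offset is the importance-weighted mean of the standardized deviations of the observed attributes in the same row, and loading signs are used to align the direction of each deviation with the target attribute.

```python
import numpy as np

def pca_weights(complete_rows):
    """Assumed importance measure: squared loadings on the first principal component."""
    z = (complete_rows - complete_rows.mean(axis=0)) / complete_rows.std(axis=0)
    # eigh returns eigenvalues in ascending order, so the last column is PC1.
    _, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    first_pc = eigvecs[:, -1]
    # Squared loadings of a unit vector sum to 1, so they can serve as weights.
    return first_pc ** 2, np.sign(first_pc)

def pca_mean_impute(X):
    """Fill NaNs with the column mean plus a PCA-weighted offset (a sketch).

    Assumes every column is non-constant and that some rows are complete.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    mu = np.nanmean(X, axis=0)       # plain mean-imputation value per column
    sigma = np.nanstd(X, axis=0)
    weights, signs = pca_weights(X[~missing.any(axis=1)])
    Z = (X - mu) / sigma             # standardized deviations (NaN where missing)

    filled = X.copy()
    for i, j in zip(*np.where(missing)):
        observed = ~missing[i]
        total = weights[observed].sum()
        if total == 0:
            filled[i, j] = mu[j]     # no usable neighbours: fall back to the mean
            continue
        # Align each neighbour's deviation with the target attribute's direction.
        aligned = Z[i, observed] * signs[observed] * signs[j]
        offset = np.sum(weights[observed] * aligned) / total
        filled[i, j] = mu[j] + sigma[j] * offset
    return filled

if __name__ == "__main__":
    demo = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, np.nan]])
    print(pca_mean_impute(demo))
```

In the small demo the second column is twice the first, so the filled value moves from the column mean toward the value implied by the observed neighbour. How far it moves depends on the assumed offset formula, which is why the choices above should be checked against the full paper before use.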
Authors
谢霖铨
毕永朋
廖龙龙
XIE Lin-quan; BI Yong-peng; LIAO Long-long (Faculty of Science, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Source
《软件导刊》
2018, No. 6, pp. 67-69, 76 (4 pages)
Software Guide
Funding
National Natural Science Foundation of China project (61762047)
Youth Science Fund Project of the Jiangxi Provincial Department of Science and Technology (20161BAB211015)