Abstract
Traditional data imputation methods make poor use of label information and of the randomness inherent in missing data. To address this, a particle swarm optimization based imputation algorithm for mixed-type features is proposed. The values of each continuous feature are modeled as a Gaussian distribution whose mean and standard deviation serve as optimization parameters, and the value probabilities of each categorical (discrete) feature are likewise optimized as parameters. Classification accuracy is used as the optimization objective, so that both the label information and the randomness of the missing data are exploited. Four statistics-based methods and two evolutionary-algorithm-based imputation methods serve as baselines in experiments on six typical classification datasets. The results show that the proposed method significantly outperforms the compared algorithms in classification accuracy while incurring a comparatively low time overhead, and that it effectively handles missing data with mixed features.
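The encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record gives neither the update rules nor the classifier, so the sketch assumes a textbook particle swarm update with conventional constants (inertia 0.7, acceleration 1.5), a leave-one-out 1-nearest-neighbour classifier as a stand-in for the accuracy objective, and a toy dataset with one continuous feature (scaled to roughly [0, 1]) and one categorical feature. All function names (`decode`, `impute`, `loo_accuracy`, `pso_impute`) are hypothetical.

```python
import random

def decode(position, n_cat):
    """Split a particle into Gaussian parameters and category probabilities."""
    mu, sigma = position[0], abs(position[1]) + 1e-6
    weights = [abs(w) + 1e-6 for w in position[2:2 + n_cat]]
    total = sum(weights)
    return mu, sigma, [w / total for w in weights]

def impute(rows, position, n_cat, seed=0):
    """Fill None entries by sampling from the distributions a particle encodes."""
    rng = random.Random(seed)  # fixed seed keeps the fitness deterministic
    mu, sigma, probs = decode(position, n_cat)
    filled = []
    for x, c in rows:
        if x is None:  # continuous value missing: draw from N(mu, sigma)
            x = rng.gauss(mu, sigma)
        if c is None:  # categorical value missing: draw by probability
            c = rng.choices(range(n_cat), weights=probs)[0]
        filled.append((x, c))
    return filled

def loo_accuracy(filled, labels):
    """Leave-one-out 1-NN accuracy (assumed stand-in for the paper's classifier)."""
    def dist(a, b):
        return abs(a[0] - b[0]) + (0.0 if a[1] == b[1] else 1.0)
    hits = 0
    for i, row in enumerate(filled):
        j = min((k for k in range(len(filled)) if k != i),
                key=lambda k: dist(row, filled[k]))
        hits += int(labels[j] == labels[i])
    return hits / len(filled)

def pso_impute(rows, labels, n_cat, n_particles=10, n_iter=20, seed=0):
    """Search for distribution parameters that maximize classification accuracy."""
    rng = random.Random(seed)
    dim = 2 + n_cat  # mean, standard deviation, one weight per category
    def fitness(p):
        return loo_accuracy(impute(rows, p, n_cat), labels)
    pos = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # conventional constants, not taken from the paper
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return impute(rows, gbest, n_cat), gbest_fit

# Toy data: (continuous, categorical) pairs; None marks a missing value.
rows = [(0.1, 0), (0.2, 0), (None, 0), (0.9, 1), (1.0, None), (0.8, 1)]
labels = [0, 0, 0, 1, 1, 1]
filled, acc = pso_impute(rows, labels, n_cat=2)
print(filled, acc)
```

Because the fitness is computed on the labelled data itself, the label information steers which Gaussian and categorical parameters are kept, which is the idea the abstract emphasizes. A practical version would hold one parameter group per feature and evaluate accuracy with a held-out split.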
Authors
刘艺
秦伟
李庚松
刘坤
王强
郑奇斌
任小广
LIU Yi; QIN Wei; LI Gengsong; LIU Kun; WANG Qiang; ZHENG Qibin; REN Xiaoguang (Academy of Military Sciences, Beijing 100091, China)
Source
《国防科技大学学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 6, pp. 107-112 (6 pages)
Journal of National University of Defense Technology
Funding
National Natural Science Foundation of China (91948303)
National Natural Science Foundation of China, Young Scientists Fund (61802426)
Keywords
missing data
data imputation
particle swarm optimization
mixed features
classification