Abstract
Data integrity is of great significance to research in artificial intelligence and data mining. However, data are often missing, for various reasons, somewhere between acquisition and application. To reduce the impact of missing data on downstream use, a missing data imputation model based on the Variational Autoencoder Generative Adversarial Network (VAEGAN) is proposed. The model constructs a missing mask from the missing-value information in the incomplete dataset and uses this mask to design the reconstruction loss function and the discrimination loss function without requiring any complete data. Estimates of the missing values are generated on the incomplete dataset by variational inference, and the generator network is trained adversarially against a discriminator. Finally, the model is compared experimentally with commonly used imputation algorithms on different datasets and at different missing rates.
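The mask-based idea in the abstract can be illustrated with a minimal sketch. It assumes missing entries are encoded as NaN and uses a hard-coded reconstruction as a stand-in for the generator output; the function names are hypothetical, and the paper's actual loss functions and network are not given in this abstract.

```python
import math

def build_mask(rows):
    """Return a 0/1 mask: 1 where a value is observed, 0 where it is missing (NaN)."""
    return [[0 if math.isnan(v) else 1 for v in row] for row in rows]

def masked_mse(rows, recon, mask):
    """Mean squared error over observed entries only (assumed form of the loss)."""
    total, count = 0.0, 0
    for x_row, r_row, m_row in zip(rows, recon, mask):
        for x, r, m in zip(x_row, r_row, m_row):
            if m:  # skip missing entries: no ground truth exists for them
                total += (x - r) ** 2
                count += 1
    return total / count if count else 0.0

def impute(rows, recon, mask):
    """Keep observed values; fill missing positions with the model's estimates."""
    return [[x if m else r for x, r, m in zip(x_row, r_row, m_row)]
            for x_row, r_row, m_row in zip(rows, recon, mask)]

nan = float("nan")
data = [[1.0, nan, 3.0], [nan, 5.0, 6.0]]
recon = [[1.5, 2.0, 3.0], [4.0, 5.0, 5.0]]  # placeholder for generator output
mask = build_mask(data)
loss = masked_mse(data, recon, mask)
filled = impute(data, recon, mask)
```

Because the loss is evaluated only where the mask is 1, no complete copy of the data is needed during training; the same mask then selects which positions receive generated estimates.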
Authors
徐晔波
倪颖杰
XU Yebo; NI Yingjie (Information Engineering University, Zhengzhou 450001, China; Jiangnan Institute of Computing Technology, Wuxi 214083, China)
Source
《信息工程大学学报》
2022, No. 2, pp. 224-229 (6 pages)
Journal of Information Engineering University