Abstract
Privacy protection policies often restrict researchers' access to training data, leaving training data sets scarce. To address this problem, we propose mixGAN, a generation model for mixed data (numerical values and labels) based on generative adversarial networks (GANs), which generates synthetic data that conforms to the real data distribution. The synthetic data supplements the real data and increases the number of available samples. The model uses a pre-trained autoencoder to map the given data set into a low-dimensional continuous space. A generator operating in the low-dimensional space and a discriminator operating in the original data space are then trained adversarially, yielding a generative model that simulates the real data. We evaluate the generation algorithm in two respects: the independent distribution of each attribute and the correlations among multiple attributes. The experimental results show that the proposed method preserves the distribution structure of the original data better than other deep-learning-based generation methods.
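The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative forward pass, not the authors' implementation: the layer sizes, the single linear layer per component, the tanh and softmax activations, the random stand-in weights, and every name (`generator`, `decoder`, `discriminator`, `synthesize`) are assumptions introduced here. The abstract states only that the generator works in the low-dimensional space, the autoencoder is pre-trained, and the discriminator works in the original data space.

```python
import math
import random

random.seed(0)

def linear(n_in, n_out):
    """Random weight matrix (n_out rows of n_in weights); a stand-in for trained weights."""
    return [[random.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def matvec(w, x):
    """Multiply weight matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

# Hypothetical sizes: one record = 2 numeric values + one 3-way label.
NOISE_DIM, LATENT_DIM = 4, 3
N_NUMERIC, N_CLASSES = 2, 3

W_gen = linear(NOISE_DIM, LATENT_DIM)
W_dec = linear(LATENT_DIM, N_NUMERIC + N_CLASSES)  # assumed pre-trained and frozen
W_disc = linear(N_NUMERIC + N_CLASSES, 1)

def generator(noise):
    """Map noise to a code in the low-dimensional continuous space."""
    return [math.tanh(a) for a in matvec(W_gen, noise)]

def decoder(code):
    """Map a latent code back to the data space: numeric part + label probabilities."""
    out = matvec(W_dec, code)
    return out[:N_NUMERIC] + softmax(out[N_NUMERIC:])

def discriminator(record):
    """Score a record in the original data space (probability it is real)."""
    return 1 / (1 + math.exp(-matvec(W_disc, record)[0]))

def synthesize():
    noise = [random.gauss(0, 1) for _ in range(NOISE_DIM)]
    return decoder(generator(noise))

record = synthesize()
score = discriminator(record)
```

In an actual training loop, the discriminator's score on decoded records would drive the updates of the generator, while real records would be scored directly. That loop is omitted here because the abstract does not specify the losses or optimizers.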
Authors
Wei Ning (魏宁)
Wang Longzhi (汪龙志)
Dong Fangmin (董方敏)
School of Computer and Information, China Three Gorges University, Yichang 443002, Hubei, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2022, Issue 6, pp. 29-34 (6 pages)
Funding
National Natural Science Foundation of China (61871258).
Keywords
Generative adversarial network
Autoencoder
Mixed-type data