Abstract
To improve the quality of synthetic tabular data, this paper proposes a simple method, named SCGAN, that generates the data of each class separately and uses a metric loss to control the generation of structured data for each class. The method is evaluated on binary classification problems. Generative adversarial networks are trained with three different metric losses on three real-world datasets: the data of each class are synthesized one class at a time, a classifier is trained on the synthesized data, and the G-mean (gmean) is used to evaluate the model's performance. The results show that generating each class of data separately can improve the classification performance of the model.
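The abstract names the G-mean as its evaluation metric but does not define it. The sketch below assumes the usual definition for binary classification, the geometric mean of sensitivity and specificity. The function name g_mean and its positive parameter are illustrative and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumption: this is the standard G-mean definition; the abstract
    itself does not spell out the formula.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sample classified correctly
            else:
                fn += 1  # positive sample missed
        else:
            if pred == positive:
                fp += 1  # negative sample flagged as positive
            else:
                tn += 1  # negative sample classified correctly
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return math.sqrt(sensitivity * specificity)
```

Because the two per-class recalls are multiplied, a classifier that predicts only one class scores 0 under this definition, however high its plain accuracy is. That property makes the metric a reasonable fit for comparing classifiers trained on data generated class by class.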
Author
Cao Shuang (曹爽), College of Computer & Information Engineering, Henan University, Kaifeng, Henan 475000, China
Source
Computer Era (《计算机时代》)
2021, Issue 4, pp. 25-27 (3 pages)
Keywords
synthetic data
metric loss
generative adversarial networks
classifier