Abstract
Fairness learning is an active research topic in machine learning. Discrimination prevention aims to remove the influence of an unfair training set on a classifier before any prediction task is performed. To ensure both the fairness and the accuracy of classification, this paper constructs a fair data set by discovering and eliminating discriminatory samples in the original data set. Specifically, it proposes a margin-based weighting method for handling discrimination in binary classification tasks, which achieves fair classification under the demographic parity and equalized odds fairness criteria. So as not to compromise classification accuracy, the samples are projected according to the maximum-margin principle, and a target set is then selected. For each sample in the target set, a weighted distance metric is used to decide whether the sample is discriminatory, and discriminatory samples are corrected. In experiments on three real-world data sets, compared against existing methods, the proposed method achieves better classification fairness and accuracy, and it is not restricted to a particular fairness criterion or classifier.
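The abstract does not specify the projection, the distance weights, or the correction rule, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes a linear max-margin classifier (for example a linear SVM) has already produced weights `w` and bias `b`; the target set is taken to be the samples lying within a hypothetical distance `width` of the hyperplane. Discrimination is then judged with a k-nearest-neighbour comparison in the spirit of situation testing: a protected (`s = 1`) negative sample is relabelled positive when its nearest unprotected neighbours are positive far more often than its nearest protected neighbours. The feature weights `v`, the parameters `k` and `t`, the one-directional relabelling, and every function name are illustrative assumptions. The `dp_gap` and `eo_gap` helpers compute the standard demographic parity and equalized odds gaps.

```python
import math
from typing import List, Sequence


def dp_gap(y_pred: Sequence[int], s: Sequence[int]) -> float:
    """Demographic parity gap |P(Yhat=1 | S=0) - P(Yhat=1 | S=1)|.
    Assumes both groups are non-empty."""
    rates = []
    for g in (0, 1):
        grp = [p for p, a in zip(y_pred, s) if a == g]
        rates.append(sum(grp) / len(grp))
    return abs(rates[0] - rates[1])


def eo_gap(y_pred: Sequence[int], y_true: Sequence[int], s: Sequence[int]) -> float:
    """Equalized odds gap: the larger between-group gap in P(Yhat=1 | S, Y=y)
    over y=0 (false positive rate) and y=1 (true positive rate).
    Assumes every (S, Y) cell is non-empty."""
    gaps = []
    for y in (0, 1):
        rates = []
        for g in (0, 1):
            grp = [p for p, a, t in zip(y_pred, s, y_true) if a == g and t == y]
            rates.append(sum(grp) / len(grp))
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)


def target_set(X, w, b, width) -> List[int]:
    """Hypothetical margin rule: indices whose distance to the hyperplane
    w.x + b = 0 is at most `width`, i.e. samples near the decision boundary."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [i for i, x in enumerate(X)
            if abs(sum(wj * xj for wj, xj in zip(w, x)) + b) / norm <= width]


def weighted_dist(x, z, v) -> float:
    """Weighted Euclidean distance with assumed per-feature weights v."""
    return math.sqrt(sum(vj * (xj - zj) ** 2 for vj, xj, zj in zip(v, x, z)))


def knn_positive_rate(X, y, s, i, group, v, k) -> float:
    """Share of positive labels among the k nearest members of `group`,
    excluding sample i itself; 0.0 if the group has no other members."""
    cand = sorted((weighted_dist(X[i], X[j], v), j)
                  for j in range(len(X)) if j != i and s[j] == group)
    top = cand[:k]
    return sum(y[j] for _, j in top) / len(top) if top else 0.0


def repair_labels(X, y, s, w, b, v, width=1.0, k=2, t=0.5) -> List[int]:
    """Relabel a protected (s=1) negative sample in the target set as positive
    when the positive rate of its k nearest unprotected neighbours exceeds that
    of its k nearest protected neighbours by more than t. Returns new labels;
    neighbour labels are always read from the original y."""
    y_new = list(y)
    for i in target_set(X, w, b, width):
        if s[i] == 1 and y[i] == 0:
            unprot = knn_positive_rate(X, y, s, i, 0, v, k)
            prot = knn_positive_rate(X, y, s, i, 1, v, k)
            if unprot - prot > t:
                y_new[i] = 1
    return y_new


# Toy data: the second feature gets weight 0 in both w and v, so it is ignored.
X = [[0.1, 5.0], [0.2, 0.0], [0.0, 0.0], [0.1, 0.0],
     [0.3, 0.0], [-0.2, 0.0], [5.0, 0.0], [-5.0, 0.0]]
s = [1, 1, 1, 0, 0, 0, 0, 1]
y = [0, 0, 0, 1, 1, 0, 1, 0]
y_fair = repair_labels(X, y, s, w=[1.0, 0.0], b=0.0, v=[1.0, 0.0])
print(y_fair)                           # [1, 1, 0, 1, 1, 0, 1, 0]
print(dp_gap(y, s), dp_gap(y_fair, s))  # 0.75 0.25
```

In this toy run, samples 6 and 7 fall outside the assumed margin and are never touched. Sample 2 is not flipped because its gap of exactly 0.5 does not exceed `t`. Restricting repairs to samples near the boundary reflects the abstract's stated goal of limiting the impact on accuracy, although the paper's actual selection and correction rules may differ.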
Authors
Xinsheng Shi (石鑫盛); Yun Li (李云)
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source
Scientia Sinica Informationis (《中国科学:信息科学》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 8, pp. 1255-1266 (12 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 61603197, 61772284, 61876091, 61802205).
Keywords
fairness learning
classification margin
target set
weighted distance metric
discriminatory