Abstract
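Imbalanced data refers to two-class problems in which the numbers of positive and negative samples are unequal, sometimes drastically so. Imbalanced data sets degrade the performance of many classifiers, an effect rooted in how those classifiers are constructed. This paper first describes the classification mechanism of the Fisher linear discriminant and shows that, when the covariance matrices of the two classes differ, sample imbalance degrades its performance. On this basis, a weighted Fisher linear discriminant (WFLD) is proposed to reduce the effect of sample imbalance. Eight imbalanced data sets are then selected from UCI and compared using the area under the ROC curve as the evaluation metric; the experimental results demonstrate the effectiveness of the WFLD model.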
Most existing classification methods assume that their training sets are well balanced, so their performance suffers under class imbalance, in which the examples of one class heavily outnumber those of the other. This paper demonstrates that, when the covariance matrices of the two classes are not identical, class imbalance degrades the performance of the Fisher linear discriminant (FLD). A weighted FLD (WFLD) is proposed to reduce the negative effects of class imbalance. Using the area under the ROC curve as the performance measure, experiments on eight imbalanced UCI data sets show WFLD's effectiveness.
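The mechanism described in the abstract can be illustrated with a minimal sketch. In the classic FLD the within-class scatter is the sum of the per-class scatter matrices, so each class covariance is implicitly weighted by its sample count and the majority class dominates the projection direction whenever the two covariances differ. The abstract does not state the weighting scheme WFLD actually uses; the function name `fisher_direction`, its `weights` parameter, and the equal-weight choice shown in the usage note are assumptions made for illustration only, not the authors' method.

```python
import numpy as np


def fisher_direction(X_pos, X_neg, weights=None):
    """Return a unit-length Fisher discriminant direction for two classes.

    X_pos, X_neg: arrays of shape (n_samples, n_features).
    weights: optional (w_pos, w_neg) applied to the class covariance matrices.
        None reproduces the classic FLD, where each covariance is weighted
        by its class sample count (i.e. the summed scatter matrices).
        Any other value is a hypothetical re-weighting, e.g. (0.5, 0.5).
    """
    X_pos = np.asarray(X_pos, dtype=float)
    X_neg = np.asarray(X_neg, dtype=float)

    mu_pos = X_pos.mean(axis=0)
    mu_neg = X_neg.mean(axis=0)

    # Maximum-likelihood covariance of each class (divides by n, not n - 1),
    # so that n * covariance equals the class scatter matrix.
    cov_pos = np.atleast_2d(np.cov(X_pos, rowvar=False, bias=True))
    cov_neg = np.atleast_2d(np.cov(X_neg, rowvar=False, bias=True))

    if weights is None:
        # Classic FLD: weights equal to the class sizes.
        w_pos, w_neg = len(X_pos), len(X_neg)
    else:
        w_pos, w_neg = weights

    # Weighted within-class scatter.
    S_w = w_pos * cov_pos + w_neg * cov_neg

    # The direction solves S_w w = (mu_pos - mu_neg); normalise for comparison.
    w = np.linalg.solve(S_w, mu_pos - mu_neg)
    return w / np.linalg.norm(w)
```

Calling `fisher_direction(X_pos, X_neg)` gives the classic direction, while `fisher_direction(X_pos, X_neg, weights=(0.5, 0.5))` treats both class covariances equally regardless of class size. When the two covariances are identical the two calls return the same direction, which is consistent with the abstract's claim that imbalance only matters when the covariances differ.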
Source
《北京交通大学学报》
EI
CAS
CSCD
PKU Core (北大核心)
2006, Issue 5, pp. 15-18 (4 pages)
JOURNAL OF BEIJING JIAOTONG UNIVERSITY
Funding
Natural Science Foundation of Zhejiang Province (Y104540)
Beijing Key Laboratory Fund (TDXX0509)