
Fisher Linear Discriminant Model with Class Imbalance (cited by 15)

Abstract: Class-imbalanced data are two-class data in which the positive and negative classes contain unequal, sometimes vastly unequal, numbers of examples. Because most classification methods implicitly assume a well-balanced training set, class imbalance degrades the performance of many classifiers, and the reason lies in how each classifier is constructed. This paper first explains the classification mechanism of the Fisher linear discriminant (FLD) and shows that, when the covariance matrices of the two classes are not identical, class imbalance degrades the performance of FLD. On this basis, a weighted FLD (WFLD) is proposed to reduce the negative effect of the imbalance. Eight imbalanced data sets from the UCI repository are then used for comparison, with the area under the ROC curve (AUC) as the performance measure; the experimental results demonstrate the effectiveness of WFLD.
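The abstract does not spell out the FLD computation or the WFLD weighting, so the Python sketch below is only an illustration under stated assumptions; all helper names (`fld_direction`, `auc`, etc.) are hypothetical. Standard FLD projects a sample x onto w = S_w^{-1}(m_+ - m_-), where m_+ and m_- are the class means and the within-class scatter S_w = n_+ C_+ + n_- C_- sums the per-sample scatter of both classes (C_+ and C_- are maximum-likelihood class covariances, n_+ and n_- the class sizes). When C_+ and C_- differ, the larger class therefore dominates S_w and hence the direction w. The `balanced=True` option replaces S_w with C_+ + C_-, one plausible class-balanced weighting assumed here for illustration; it is not necessarily the weighting used by the paper's WFLD. AUC is computed as the fraction of (positive, negative) pairs that the projection w·x ranks correctly, with ties counting one half.

```python
"""Sketch (hypothetical helper names): Fisher linear discriminant with an
optional class-balanced within-class scatter, plus AUC by pair counting.
The balanced weighting is an illustrative assumption, not the paper's WFLD."""

from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean(xs: Sequence[Vector]) -> Vector:
    return [sum(col) / len(xs) for col in zip(*xs)]


def covariance(xs: Sequence[Vector]) -> Matrix:
    """Maximum-likelihood covariance matrix (divides by n, not n - 1)."""
    m = mean(xs)
    d, n = len(m), len(xs)
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) / n
             for j in range(d)] for i in range(d)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - rest) / aug[r][r]
    return x


def fld_direction(pos: Sequence[Vector], neg: Sequence[Vector],
                  balanced: bool = False, ridge: float = 1e-9) -> Vector:
    """Return w = S_w^{-1} (m_pos - m_neg).

    balanced=False: S_w = n_pos*C_pos + n_neg*C_neg (standard FLD scatter),
    so the larger class dominates when C_pos != C_neg.
    balanced=True:  S_w = C_pos + C_neg (hypothetical equal class weights).
    A tiny ridge on the diagonal keeps S_w invertible.
    """
    a, b = (1.0, 1.0) if balanced else (float(len(pos)), float(len(neg)))
    c_pos, c_neg = covariance(pos), covariance(neg)
    d = len(c_pos)
    s_w = [[a * c_pos[i][j] + b * c_neg[i][j] + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    diff = [p - q for p, q in zip(mean(pos), mean(neg))]
    return solve(s_w, diff)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy data: 4 positives, 8 negatives, different class covariances.
    pos = [[1.0, 1.0], [3.0, 1.0], [2.0, 2.0], [2.0, 0.0]]
    neg = [[-2.0, 0.0], [2.0, 0.0]] * 2 + [[0.0, 1.0], [0.0, -1.0]] * 2
    for balanced in (False, True):
        w = fld_direction(pos, neg, balanced=balanced)
        score = auc([dot(w, x) for x in pos], [dot(w, x) for x in neg])
        print(f"balanced={balanced}: w={w}, AUC={score:.4f}")
```

Because AUC depends only on how the scores w·x rank the samples, it needs no decision threshold, which makes it a convenient way to compare discriminant directions. If C_+ = C_-, both options give an S_w proportional to that common covariance, so the two directions coincide; this is consistent with the abstract's condition that the class covariance matrices must differ for imbalance to affect FLD.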
Source: Journal of Beijing Jiaotong University (indexed in EI, CAS and CSCD; Peking University core journal), 2006, No. 5, pp. 15-18 (4 pages).
Funding: Natural Science Foundation of Zhejiang Province (Y104540); Beijing Municipal Key Laboratory Fund (TDXX0509).
Keywords: class imbalance; Fisher linear discriminant (FLD); area under the ROC curve (AUC)
