Linear Classification Methods for Imbalanced Datasets (面向不平衡数据集的线性分类方法研究)

Abstract: In recent years, the learning and generalization of classifiers on imbalanced datasets has attracted growing attention. Taking internationally typical classification problems as its application background (machine learning repository datasets, US Postal Service zip codes, and 2-dimensional vowels), this paper studies how linear classifiers can cope with imbalanced sample sizes. It examines three typical linear classifiers in depth, namely the Fisher classifier, the pseudo-inverse method, and the single-layer perceptron, and applies all three to imbalanced datasets. Experiments and analysis indicate that these new methods achieve good results in the linear classification of imbalanced datasets.
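The abstract names the Fisher classifier but does not give its formulation, so the following is only a minimal sketch of a textbook two-class Fisher linear discriminant on a synthetic imbalanced 2-D dataset, not the paper's method. The data, the helper names (`mean2`, `scatter2`, `fit_fisher`, `predict`), the per-class normalization of the scatter, and the midpoint threshold are all assumptions made for illustration.

```python
import random

def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, m):
    """Per-sample scatter entries (sxx, sxy, syy) of points around mean m."""
    n = len(points)
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - m[0], y - m[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return (sxx / n, sxy / n, syy / n)

def fit_fisher(neg, pos):
    """Fisher direction w = S^-1 (m1 - m0) with a midpoint threshold.

    Assumption: each class scatter is divided by its own sample count,
    so the majority class does not dominate the within-class scatter S.
    """
    m0, m1 = mean2(neg), mean2(pos)
    c0, c1 = scatter2(neg, m0), scatter2(pos, m1)
    sxx, sxy, syy = (c0[i] + c1[i] for i in range(3))
    det = sxx * syy - sxy * sxy          # determinant of the 2x2 matrix S
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form 2x2 inverse applied to the mean difference.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    # Threshold halfway between the projected class means; this choice
    # ignores class priors (an assumption, not taken from the paper).
    p0 = w[0] * m0[0] + w[1] * m0[1]
    p1 = w[0] * m1[0] + w[1] * m1[1]
    b = -(p0 + p1) / 2.0
    return w, b

def predict(w, b, p):
    """Return 1 (minority/positive class) if p lies on the positive side."""
    return 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else 0

# Hypothetical imbalanced data: 200 majority samples vs. 20 minority samples.
random.seed(0)
neg = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
pos = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(20)]

w, b = fit_fisher(neg, pos)
recall_pos = sum(predict(w, b, p) for p in pos) / len(pos)
recall_neg = sum(1 - predict(w, b, p) for p in neg) / len(neg)
print("minority recall:", recall_pos, "majority recall:", recall_neg)
```

Reporting recall separately for each class, rather than overall accuracy, is one common way to evaluate a classifier when one class is much smaller than the other; whether the paper uses this measure is not stated in the abstract.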

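Author: Yin Shiyong (殷士勇)

Institution: Yancheng Textile Vocational and Technical College (盐城纺织职业技术学院)

Source: Journal of Chongqing Technology and Business University: Natural Science Edition (《重庆工商大学学报（自然科学版）》), 2010, No. 5, pp. 467-475 (9 pages)

Keywords: imbalanced datasets; Fisher classifier; pseudo-inverse algorithm; single-layer perceptron; linear classification methods

Classification number: TP231 [Automation and Computer Technology: Detection Technology and Automation Devices]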
