Abstract
This paper investigates linear feature transformation as a data preprocessing step in data mining for improving the classification accuracy of support vector machines (SVM), and presents a hybrid method that combines a modified particle swarm optimization (PSO) with SVM. Features are first ranked by a linear weighted combination of an extended t-statistic, Fisher's discriminant ratio, and random forest feature importance scores, and the top-ranked features form a preselected subset. A modified PSO, accelerated by heuristic information, then searches for the linear transformation factors of these features. Binary PSO performs feature selection on the transformed feature subset, and a grid search in post-processing yields a high-accuracy SVM classifier. Experiments on the madelon data set of Neural Information Processing Systems (NIPS) 2003 and ten data sets from the University of California, Irvine (UCI) show that the new method achieves higher accuracy than the original C-SVM on 4 data sets.
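The preselection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the extended t-statistic, the weights, or the normalization, so the code assumes a two-class Welch t-statistic, min-max scaling of each score, and equal weights. The function names (`t_score`, `fisher_score`, `min_max`, `preselect`), the toy data, and the `rf_importance` values (which would normally come from a trained random forest) are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def _split(column, labels):
    """Split one feature column into the values of class 0 and class 1."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    return a, b


def t_score(column, labels):
    """Absolute two-class Welch t-statistic (assumed form, not from the paper)."""
    a, b = _split(column, labels)
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0


def fisher_score(column, labels):
    """Fisher's discriminant ratio: (m0 - m1)^2 / (s0^2 + s1^2)."""
    a, b = _split(column, labels)
    denom = variance(a) + variance(b)
    return (mean(a) - mean(b)) ** 2 / denom if denom > 0 else 0.0


def min_max(scores):
    """Rescale scores to [0, 1] so the three measures are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def preselect(X, y, rf_importance, weights=(1 / 3, 1 / 3, 1 / 3), top_k=2):
    """Rank features by a linear weighted sum of three scores; keep the top_k."""
    columns = list(zip(*X))  # one tuple per feature
    t = min_max([t_score(c, y) for c in columns])
    f = min_max([fisher_score(c, y) for c in columns])
    r = min_max(list(rf_importance))
    w_t, w_f, w_r = weights
    combined = [w_t * a + w_f * b + w_r * c for a, b, c in zip(t, f, r)]
    order = sorted(range(len(combined)), key=lambda j: combined[j], reverse=True)
    return order[:top_k], combined


# Toy data: feature 0 separates the classes, feature 1 does not, feature 2 barely.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.1, 4.5, 0.2],
    [2.9, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]
rf_importance = [0.7, 0.1, 0.2]  # hypothetical random forest importances

selected, combined = preselect(X, y, rf_importance, top_k=2)
print(selected)  # [0, 2]
```

In the method described above, the indices returned by such a ranking would define the preselected subset that the modified PSO then transforms; the weights and `top_k` are tuning choices that the abstract does not specify.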
Source
《北京邮电大学学报》
EI
CAS
CSCD
PKU Core Journal
2009, No. 6, pp. 24-27, 52 (5 pages)
Journal of Beijing University of Posts and Telecommunications
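Funding
Specialized Research Fund for the Doctoral Program of Higher Education (20060013007)
National Key Technology R&D Program (2007BAH05B02-04)
Beijing Natural Science Foundation (4092029)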
Keywords
particle swarm optimization
feature transformation
support vector machine
feature selection
classification