Abstract
A Robust Feature Selection and Classification algorithm based on Partial Least Squares Regression (RFSC-PLSR) was proposed to solve the problems of redundancy and multi-collinearity between features in feature selection. First, a sample class consistency coefficient based on neighborhood estimation was defined. Then, k-Nearest Neighbor (kNN) operations with different values of k were used to screen out conservative samples whose local class distribution structure is stable, and a partial least squares regression model built on these samples was used for robust feature selection. Finally, from a global structure perspective, a partial least squares classification model was built using the class consistency coefficient and the preferred feature subset of all samples. Five data sets of different dimensions were selected from the UCI database for numerical experiments. The experimental results show that, compared with four typical classifiers, namely Support Vector Machine (SVM), Naive Bayes (NB), Back-Propagation Neural Network (BPNN) and Logistic Regression (LR), RFSC-PLSR is strongly competitive in classification accuracy, robustness and computational efficiency in low-, medium- and high-dimensional cases.
Source
《计算机应用》
CSCD
PKU Core (Peking University core journal list)
2017, No. 3, pp. 871-875 (5 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (U1304602, 61473266, 61305080)
Key Scientific Research Project of Higher Education Institutions of Henan Province (15A120016)
Keywords
Partial Least Squares Regression (PLSR)
k-Nearest Neighbor (kNN)
noise sample
feature selection
robustness