Abstract
This study proposes a feature extraction algorithm based on principal component analysis (PCA) with an anisotropic Gaussian kernel penalty, which differs from traditional kernel PCA algorithms. In nonlinear dimensionality reduction, traditional kernel PCA algorithms ignore the nondimensionalization of the raw data. Moreover, the traditional kernel function is governed by a single, identical kernel width parameter across all dimensions, which cannot accurately reflect the importance of the different features in each dimension and therefore lowers the accuracy of the dimensionality reduction process. To address these issues, a mean-normalization algorithm is first proposed for the nondimensionalization of the raw data, which noticeably improves the total variance contribution rate of the original data. Second, an anisotropic Gaussian kernel function is introduced, in which each dimension has its own kernel width parameter; these parameters accurately reflect the importance of the data features in their respective dimensions. Third, a feature penalty objective function for kernel PCA is formulated on the basis of the anisotropic Gaussian kernel function, so that the raw data can be represented with fewer features while the importance of the information in each principal component is reflected. Finally, to search for the optimal features, a gradient descent method is introduced to update the kernel widths in the feature penalty objective function and to control the iterative process of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, the algorithms are compared on public UCI data sets and the KDDCUP99 data set. The experimental results show that the proposed feature extraction algorithm based on the anisotropic Gaussian kernel penalty improves accuracy by an average of 4.49% over traditional PCA algorithms on nine public UCI data sets, and by 8% on the KDDCUP99 data set.
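Below is a minimal, hypothetical sketch (Python with NumPy) of the two standard building blocks the abstract relies on: an anisotropic Gaussian kernel with a separate width per dimension, and a kernel PCA projection computed from the centered kernel matrix. It does not reproduce the paper's feature penalty objective, mean-normalization step, or gradient-descent width updates; all function names, the toy data, and the width values are illustrative assumptions.

import numpy as np


def anisotropic_gaussian_kernel(X, Y, widths):
    """Anisotropic Gaussian kernel matrix.

    k(x, y) = exp(-sum_d (x_d - y_d)^2 / (2 * sigma_d^2)),
    where sigma_d is the kernel width of dimension d.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    widths = np.asarray(widths, dtype=float)
    # Rescale every dimension by its own width, then use the usual
    # squared-distance expansion ||a||^2 + ||b||^2 - 2 a.b.
    Xs, Ys = X / widths, Y / widths
    sq_dists = (
        np.sum(Xs ** 2, axis=1)[:, None]
        + np.sum(Ys ** 2, axis=1)[None, :]
        - 2.0 * Xs @ Ys.T
    )
    return np.exp(-0.5 * np.clip(sq_dists, 0.0, None))


def kernel_pca_scores(K, n_components):
    """Project training samples onto the leading kernel principal components."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n  # double centering
    eigvals, eigvecs = np.linalg.eigh(Kc)               # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Kc @ eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))


# Toy usage: 100 samples, 5 features, one hypothetical width per dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
widths = np.array([1.0, 0.5, 2.0, 1.0, 0.8])
K = anisotropic_gaussian_kernel(X, X, widths)
Z = kernel_pca_scores(K, n_components=2)
print(Z.shape)  # (100, 2)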
Authors
LIU Jun, LI Wei, CHEN Shu-Yu, XU Guang-Xia (School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China)
Source
Journal of Software (《软件学报》)
EI
CSCD
Peking University Core Journals
2022, No. 12, pp. 4574-4589 (16 pages)
Funding
National Natural Science Foundation of China (61772099, 61772098)
Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0530)
Chongqing "Three Hundred" Science and Technology Innovation Leading Talents Support Program (CSTCCXLJRC201917)
Chongqing Innovation and Entrepreneurship Demonstration Team Cultivation Program (CSTC2017kjrc-cxcytd0063)
Keywords
anisotropic Gaussian kernel
feature penalty function
principal component analysis (PCA)
gradient descent algorithm