Abstract
Canonical correlation analysis (CCA) is a widely used multivariate statistical method for finding linear correlations between two sets of variables that describe the same objects. Because CCA relies on the Euclidean distance measure, it is not robust. Kernel-induced distance measures have been proved robust in theory and have been validated in clustering applications. This paper applies such a measure to CCA and develops a robust CCA based on a kernel-induced measure (KI-CCA). KI-CCA overcomes the lack of robustness of CCA and of some related algorithms, includes the existing robust principal component analysis based on maximum correntropy (half-quadratic principal component analysis, HQ-PCA) as a special case, and is capable of nonlinear correlation analysis. On the one hand, the diversity of kernel functions yields a corresponding diversity of KI-CCA variants, which makes it a general analysis algorithm. On the other hand, because its formulation resembles that of CCA, its solution reduces to a generalized eigenvalue problem. Experiments on synthetic data, the handwriting multiple feature database (MFD), and face datasets (Yale, AR, ORL) demonstrate the effectiveness of the algorithm.
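The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the paper's KI-CCA algorithm. It only shows (a) classical linear CCA posed as a generalized eigenvalue problem and (b) the squared distance induced by a Gaussian kernel, which stays bounded for distant points and is therefore less sensitive to outliers than the Euclidean distance. The function names, the Gaussian kernel choice, the `sigma` parameter, and the small ridge term `reg` are all assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh


def gaussian_kernel_distance_sq(x, y, sigma=1.0):
    """Squared kernel-induced distance for a Gaussian kernel (assumed kernel).

    d^2(x, y) = k(x, x) - 2 k(x, y) + k(y, y) = 2 * (1 - k(x, y)),
    so the value never exceeds 2, however far apart x and y are.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    k_xy = np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
    return 2.0 * (1.0 - k_xy)


def linear_cca(X, Y, reg=1e-8):
    """Classical CCA (rows are samples) via a generalized eigenproblem.

    Returns (correlations, Wx, Wy), sorted by decreasing correlation.
    `reg` is a small ridge term added for numerical stability (assumption).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n, p = X.shape
    q = Y.shape[1]
    Cxx = X.T @ X / n + reg * np.eye(p)
    Cyy = Y.T @ Y / n + reg * np.eye(q)
    Cxy = X.T @ Y / n
    # Solve A w = rho * B w with w = [wx; wy].
    A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
    B = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])
    vals, vecs = eigh(A, B)  # eigenvalues come in +/- rho pairs, ascending
    order = np.argsort(vals)[::-1][: min(p, q)]
    return vals[order], vecs[:p, order], vecs[p:, order]
```

In this sketch the leading eigenvalue is the first canonical correlation, and the matching eigenvector stacks the two projection directions. A robust variant in the spirit of the abstract would replace the Euclidean terms of the objective with a kernel-induced measure; the details of that construction are in the paper and are not reproduced here.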
Source
《计算机科学与探索》
CSCD
2012, Issue 8, pp. 708-716 (9 pages)
Journal of Frontiers of Computer Science and Technology
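Funding
National Natural Science Foundation of China, No. 61170151
Research Fund of Nanjing University of Aeronautics and Astronautics, No. NP2011030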
Keywords
canonical correlation analysis (CCA)
kernel-induced
robustness
generalized eigenvalue problem