Abstract
Incremental learning is an effective and efficient technique for processing large-scale dynamic data streams and is widely used in machine learning. Many researchers have combined it with dimensionality reduction methods to obtain incremental dimensionality reduction algorithms. Among them, incremental canonical correlation analysis (ICCA), an incremental improvement of canonical correlation analysis (CCA), can effectively reduce the dimensionality of high-dimensional multi-view data streams. However, ICCA updates the projection vectors with only a single pair of samples at a time, so the projection vectors must be updated once for every new sample pair, which makes the algorithm time-consuming. To improve efficiency, this paper proposes chunk incremental canonical correlation analysis (CICCA). The algorithm does not need to compute sample covariance matrices; instead, it processes the data stream directly in batches, using the information in each newly arrived batch to revise and update the projection vector from the previous step and thereby obtain the principal projection vector. The other projection vectors are then computed in the orthogonal complement space of the projection vector, so that the original high-dimensional multi-view data are projected into a low-dimensional space. Experimental results on synthetic and real datasets show that the low-dimensional features extracted by CICCA achieve classification performance comparable to that of CCA and ICCA, while the training time is greatly reduced.
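As a rough illustration of the batch-wise update and the orthogonal-complement step described above, the following Python/NumPy sketch processes a two-view stream chunk by chunk. It is not the authors' CICCA update rule, which the abstract does not give. The damped alternating-least-squares step, the `step` damping factor, the hypothetical function names (`chunk_update`, `fit_chunk_stream`, `make_views`), zero-mean views, and Euclidean (Gram-Schmidt) orthogonalization of later directions are all assumptions made for illustration. Each chunk is used once, and no d×d covariance matrix is formed explicitly, because each per-chunk least-squares problem is solved with `np.linalg.lstsq`.

```python
import numpy as np


def _unit_variance(w, Z):
    # Rescale w so the projection Z @ w has unit mean square on this chunk,
    # a chunk-level stand-in for the CCA constraint w^T C w = 1.
    return w / np.sqrt(np.mean((Z @ w) ** 2))


def _orthogonalize(w, W, j):
    # Gram-Schmidt: remove the components of w along the first j columns of W,
    # keeping w in their Euclidean orthogonal complement (an assumption).
    for i in range(j):
        w = w - (w @ W[:, i]) / (W[:, i] @ W[:, i]) * W[:, i]
    return w


def chunk_update(Wx, Wy, Xb, Yb, step=0.5):
    """Hypothetical damped alternating-least-squares step for one chunk.

    Assumes zero-mean views. Column pair j is moved toward the least-squares
    fits Xb w ~ Yb wy and Yb w ~ Xb wx computed on this chunk only.
    """
    Wx_new, Wy_new = Wx.copy(), Wy.copy()
    for j in range(Wx.shape[1]):
        wx, wy = Wx[:, j], Wy[:, j]
        # Least-squares targets use the previous pair (alternating LS for CCA).
        fx = np.linalg.lstsq(Xb, Yb @ wy, rcond=None)[0]
        fy = np.linalg.lstsq(Yb, Xb @ wx, rcond=None)[0]
        # Damping: keep part of the previous vector, revise it with the new chunk.
        wx = (1 - step) * wx + step * fx
        wy = (1 - step) * wy + step * fy
        # Later directions live in the complement of the earlier ones.
        wx = _orthogonalize(wx, Wx_new, j)
        wy = _orthogonalize(wy, Wy_new, j)
        Wx_new[:, j] = _unit_variance(wx, Xb)
        Wy_new[:, j] = _unit_variance(wy, Yb)
    return Wx_new, Wy_new


def fit_chunk_stream(chunks, dx, dy, k, step=0.5):
    # Hypothetical start: distinct, non-parallel directions for the k pairs.
    Wx = np.eye(dx, k) + 0.1
    Wy = np.eye(dy, k) + 0.1
    for Xb, Yb in chunks:
        Wx, Wy = chunk_update(Wx, Wy, Xb, Yb, step)
    return Wx, Wy


def make_views(rng, n, dx=4, dy=3, noise=0.3):
    # Toy zero-mean two-view data: one shared latent signal z in the last
    # coordinate of each view, independent noise everywhere else.
    z = rng.standard_normal(n)
    X = rng.standard_normal((n, dx))
    Y = rng.standard_normal((n, dy))
    X[:, -1] = z + noise * rng.standard_normal(n)
    Y[:, -1] = z + noise * rng.standard_normal(n)
    return X, Y


rng = np.random.default_rng(0)
stream = (make_views(rng, 200) for _ in range(100))  # 100 chunks of 200 pairs
Wx, Wy = fit_chunk_stream(stream, dx=4, dy=3, k=2)
X_test, Y_test = make_views(rng, 5000)
rho = np.corrcoef(X_test @ Wx[:, 0], Y_test @ Wy[:, 0])[0, 1]
print(f"held-out correlation of the first pair: {rho:.3f}")
```

With a constant damping factor, a fixed point of the first pair satisfies C_xx w_x ∝ C_xy w_y and C_yy w_y ∝ C_yx w_x up to chunk noise, which are the leading CCA equations; the damping lets earlier chunks keep influencing the estimate. Later pairs in this sketch are restricted to the Euclidean orthogonal complement, following the abstract's wording; this generally differs from classical CCA, whose later directions are orthogonal in the covariance-weighted inner product.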
Authors
PAN Yu, CHEN Xiaohong, LI Shunming, LI Jiyong
College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; Sichuan Aerospace Zhongtian Power Equipment Co., Ltd., Chengdu 610100, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journal
2022, No. 8, pp. 1809-1818 (10 pages)
Funding
National Natural Science Foundation of China (11971231, 12111530001)
National Key Research and Development Program of China (2018YFB2003300)
Keywords
canonical correlation analysis (CCA)
dimensionality reduction
incremental learning
multi-view classification