Abstract
The least squares (LS) classifier is a basic but effective classifier, especially for solving large-scale data classification problems. The LS method needs to invert a matrix whose size is determined by the data dimensionality, which makes it inherently inefficient for high-dimensional data. In this paper, we propose a parallel nonlinear version of LS (PNLS). Based on random partitioning of the data dimensions, PNLS computes local model parameters in parallel and, through iterative optimization, forms the final global solution. PNLS enjoys three properties: 1) it is a locally linear but globally nonlinear method; 2) it avoids inverting large matrices, which makes it suitable for high-dimensional data; and 3) it computes model parameters in parallel, which improves learning efficiency. In addition, a theoretical analysis proves that the iterative PNLS method converges. We further propose a random PNLS that randomly repartitions the data dimensions in each iteration to optimize the performance of PNLS. Experimental results on text and image data demonstrate that the proposed methods achieve better prediction accuracy and runtime efficiency than LS.
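The abstract describes PNLS only at a high level: randomly partition the feature dimensions, solve small local least-squares problems in parallel, and iterate toward a global solution. The sketch below illustrates one such block-wise scheme in Python; the function names, the joblib-based parallelism, and the damped Jacobi-style update are assumptions made for illustration, not the paper's actual algorithm.

# Illustrative sketch only: the exact PNLS update rule is not given in the abstract.
import numpy as np
from joblib import Parallel, delayed

def solve_block(X_block, residual, reg=1e-6):
    # Normal equations for one block: only a small (d_b x d_b) matrix is solved,
    # never the full (d x d) system.
    d_b = X_block.shape[1]
    A = X_block.T @ X_block + reg * np.eye(d_b)
    return np.linalg.solve(A, X_block.T @ residual)

def parallel_block_ls(X, y, n_blocks=4, n_iter=10, seed=0):
    n_samples, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(n_iter):
        # Randomly partition the feature dimensions into blocks for this pass.
        blocks = np.array_split(rng.permutation(d), n_blocks)
        residual = y - X @ w
        # Local least-squares parameters for each block computed in parallel.
        deltas = Parallel(n_jobs=-1)(
            delayed(solve_block)(X[:, b], residual) for b in blocks
        )
        for b, delta in zip(blocks, deltas):
            # Damped (averaged) update keeps simultaneous block updates stable.
            w[b] += delta / n_blocks
    return w

# Hypothetical usage on synthetic data:
# X = np.random.randn(200, 1000); y = X @ np.random.randn(1000)
# w = parallel_block_ls(X, y, n_blocks=10, n_iter=20)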
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2014, No. 3, pp. 579-583 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (61250007, U1204610)
Supported by the National High Technology Research and Development Program of China (863 Program) key project (2009AA012201)
Supported by the China Postdoctoral Science Foundation (2011M501189)
Keywords
least squares
parallel
high dimensionality
classification