Abstract
Principal curves are an effective method for data compression and feature extraction. They are a nonlinear generalization of principal component analysis, in which the first linear principal component can be thought of as the "optimal" linear one-dimensional summary of the data. Several algorithms for constructing principal curves have been proposed, and because of the close relation between principal curves and principal components, they usually take the first principal component line as the initial value. Experiments show, however, that the first principal component is not always the best choice for initialization. Taking the HS algorithm and the polygonal line algorithm as examples, this paper analyzes how the choice of initial value affects the generated principal curve. The experiments indicate that the HS algorithm gives good results when it starts from the origin, and that the polygonal line algorithm should choose its initial value according to the global structure of the data set. This shows that local optimization does not always lead to global optimization.
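As an illustration of the conventional initialization discussed in the abstract, the following is a minimal sketch, not the paper's code and not an implementation of the HS or polygonal line algorithm. It computes the first principal component line of a point set and the projection index of each point on that line. The function names `first_principal_line` and `projection_indices`, the synthetic data, and the use of NumPy are assumptions made for this example.

```python
import numpy as np


def first_principal_line(X):
    """Return (mean, direction) describing the first principal component line of X.

    X is an (n, d) array of data points. The line is mean + t * direction.
    """
    mean = X.mean(axis=0)
    # SVD of the centered data: the first right singular vector is the
    # unit direction of largest variance, i.e. the first principal component.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[0]


def projection_indices(X, mean, direction):
    """Signed position along the line of each point's orthogonal projection."""
    return (X - mean) @ direction


# Hypothetical test data: noisy points scattered around the line y = 0.5 * x.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, 200)
X = np.column_stack([t, 0.5 * t]) + rng.normal(scale=0.01, size=(200, 2))

mean, direction = first_principal_line(X)
lam = projection_indices(X, mean, direction)
```

A principal curve algorithm that starts from this line would then iteratively refine the curve. The abstract's point is that this starting line is a common choice but not necessarily the best one.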
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2007, Issue 2, pp. 227-229 (3 pages)
Computer Science
Keywords
Principal curves, Principal component, Initial value, Projection indices