Abstract
Manifold learning has become an important research topic in machine learning and data mining. Existing manifold learning algorithms assume that the high-dimensional data lie on a single manifold, so they cannot be applied to the many data sets that are sampled from multiple manifolds. This paper proposes the DC-ISOMAP algorithm for nonlinear dimensionality reduction of data lying on well-separated multi-manifolds of equal intrinsic dimension. The algorithm first decomposes the data set into individual sub-manifolds by propagating the tangent space outward from the point of maximum sampling density, and computes the low-dimensional embedding of each sub-manifold independently. It then composes these embeddings into their proper positions and orientations, based on the positional relationships among the sub-manifolds, to obtain the final embedding. Experimental results on synthetic data and on real face image data show that the algorithm computes low-dimensional embeddings of high-dimensional data effectively.
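The first two stages described in the abstract (decomposition by tangent-space propagation, then a per-piece embedding) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the neighbourhood size `k`, the angle threshold `max_angle_deg`, and the rule of comparing each point's local PCA tangent with that of the neighbour it is reached from are all assumptions made for illustration. The per-piece embedding is ordinary Isomap, and the final composition step is omitted because the abstract does not specify it.

```python
import numpy as np

def knn(X, k):
    """Return (indices of the k nearest neighbours, full distance matrix)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    k = min(k, len(X) - 1)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    return idx, D

def tangent_basis(X, nbrs, d):
    """Local PCA: top-d right singular vectors of the centred neighbourhood."""
    P = X[nbrs] - X[nbrs].mean(axis=0)
    _, _, Vt = np.linalg.svd(P, full_matrices=False)
    return Vt[:d].T  # orthonormal columns, shape (ambient_dim, d)

def decompose(X, k=8, d=1, max_angle_deg=30.0):
    """Grow sub-manifolds from the densest unlabeled point (hypothetical rule)."""
    n = len(X)
    idx, D = knn(X, k)
    # Smaller mean k-NN distance is taken as higher sampling density.
    order = np.argsort(D[np.arange(n)[:, None], idx].mean(axis=1))
    bases = [tangent_basis(X, np.append(idx[i], i), d) for i in range(n)]
    cos_min = np.cos(np.radians(max_angle_deg))
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in order:
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in idx[i]:
                if labels[j] != -1:
                    continue
                # Singular values of B_i^T B_j are cosines of principal angles;
                # the smallest one corresponds to the largest angle.
                s = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
                if s.min() >= cos_min:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def isomap_embed(X, k=8, d=1):
    """Plain Isomap on one piece; assumes its k-NN graph is connected."""
    n = len(X)
    idx, D = knn(X, k)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    rows = np.repeat(np.arange(n), idx.shape[1])
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)  # symmetrise the neighbourhood graph
    for m in range(n):  # Floyd-Warshall shortest paths (geodesic estimates)
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    J = np.eye(n) - 1.0 / n  # centring matrix for classical MDS
    w, V = np.linalg.eigh(-0.5 * J @ (G ** 2) @ J)
    top = np.argsort(w)[::-1][:d]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```

For data `X` of shape `(n_points, ambient_dim)`, `labels = decompose(X, k, d)` assigns a piece index to every point, and `isomap_embed(X[labels == c], k, d)` embeds piece `c` on its own. Both functions build a full pairwise distance matrix, so the sketch suits only small data sets.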
Source
《新型工业化》
2013, Issue 4, 12 pages
The Journal of New Industrialization
Funding
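Specialized Research Fund for the Doctoral Program of Higher Education (20101401110002)
National Basic Research Program of China (973 Program), preliminary research project (2011CB311805)
National Natural Science Foundation of China (71031006)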
Keywords
Machine learning
Nonlinear dimensionality reduction
Manifold learning
Well-separated multi-manifold
Tangent space
DC-ISOMAP