Abstract
Large amounts of unlabeled data carry information that is useful for classification, and exploiting this information effectively to improve classification accuracy is the central concern of semi-supervised classification research. This paper proposes a semi-supervised discriminant analysis algorithm based on manifold distance (SSDA). Using the manifold distance defined in the paper, the algorithm selects, for each data point lying on the manifold, its intra-class neighbors, inter-class neighbors, and global neighbors. The similarity between a point and each of its neighbors is then defined in terms of the manifold distance, and this similarity measure is used to construct the objective function of the algorithm. Experiments on the ORL and YALE face databases show that, compared with existing algorithms, reducing the dimensionality of a data set with the proposed algorithm lets distance-based recognition algorithms achieve higher classification accuracy. To handle nonlinear dimensionality reduction, a kernel version, Kernel SSDA, is also proposed, and experiments likewise confirm its effectiveness.
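The abstract does not state how the manifold distance or the similarity is defined, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes that the manifold distance can be approximated by shortest-path length over a k-nearest-neighbor graph (an Isomap-style stand-in) and that similarity is a Gaussian function of that distance. The function names (manifold_distances, neighbors, similarity) and the parameters k_graph, k, and sigma are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manifold_distances(points, k_graph):
    """Approximate manifold distances (assumption: shortest paths on a
    symmetric k-nearest-neighbor graph, computed with Floyd-Warshall)."""
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: euclidean(points[i], points[j]))
        for j in order[:k_graph]:
            d = euclidean(points[i], points[j])
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist


def neighbors(dist, labels, i, k):
    """Return (intra-class, inter-class, global) neighbors of point i.
    labels[j] is None for an unlabeled point."""
    others = sorted((j for j in range(len(dist))
                     if j != i and dist[i][j] < float("inf")),
                    key=lambda j: dist[i][j])
    total = others[:k]
    if labels[i] is None:
        return [], [], total
    intra = [j for j in others if labels[j] == labels[i]][:k]
    inter = [j for j in others
             if labels[j] is not None and labels[j] != labels[i]][:k]
    return intra, inter, total


def similarity(dist, i, j, sigma=1.0):
    """Gaussian similarity of the manifold distance (assumed form)."""
    return math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2))


# Seven points along a C-shaped curve; the two ends are close in
# Euclidean terms (distance 2) but far apart along the curve.
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
labels = [0, None, 0, None, None, 1, 1]
dist = manifold_distances(points, k_graph=1)
intra, inter, total = neighbors(dist, labels, 0, k=1)
```

In this toy example the distance between the two ends of the curve is 6.0 along the graph, compared with a Euclidean distance of 2.0, so the inter-class neighbor chosen for the first point is the one nearest along the curve rather than the one nearest in space. With a larger k_graph the graph gains a shortcut edge between the two ends, which shows how sensitive this stand-in distance is to the graph construction.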
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2010, Issue 10, pp. 2445-2453 (9 pages)
Journal of Software
Keywords
principal component analysis
linear discriminant analysis
manifold distance
semi-supervised discriminant analysis
semi-supervised classification