Abstract
The classical manifold-learning algorithm locally linear embedding (LLE) selects each point's neighborhood by Euclidean distance. When the data set is drawn from several classes, this distance measure cannot recover the correct neighborhood relationships. This study proposes a modified LLE (MLLE) algorithm that alters the distance matrix so that between-class distances become larger and within-class distances smaller, which keeps each neighborhood inside a single class as far as possible. In experiments on Chinese text classification, MLLE markedly improved both the visualization of the clustering results and the recognition rate compared with the original LLE.
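The abstract states the idea behind MLLE but not its actual distance formula. The following is a minimal sketch, assuming one simple rule in the spirit of supervised LLE: every between-class Euclidean distance is increased by `alpha * max(D)`, while within-class distances are left as they are, so they become relatively small. Ordinary LLE then runs on the modified matrix. The function names `modified_distances` and `lle`, and the parameters `alpha`, `k`, `d` and `reg`, are hypothetical illustrations, not the authors' code, and the sketch assumes class labels are known for the points being embedded.

```python
import numpy as np


def modified_distances(X, labels, alpha=1.0):
    """Euclidean distance matrix with between-class entries inflated.

    Hypothetical rule (not the paper's formula): D'_ij = D_ij + alpha * max(D)
    whenever x_i and x_j carry different labels; same-class entries are unchanged.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    different = labels[:, None] != labels[None, :]
    return D + alpha * D.max() * different


def lle(X, D, k=5, d=2, reg=1e-3):
    """Standard LLE, except that neighbours are chosen from the supplied matrix D."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        nbrs = order[order != i][:k]          # k nearest points other than x_i
        Z = X[nbrs] - X[i]                    # neighbours centred on x_i
        C = Z @ Z.T                           # local Gram matrix (k x k)
        # Regularise: C is singular whenever k exceeds the input dimension.
        C += np.eye(k) * reg * max(np.trace(C), 1e-12)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()              # weights reconstructing x_i sum to 1
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    # Drop the bottom eigenvector (the constant vector when the k-NN graph is connected).
    return vecs[:, 1:d + 1]
```

With `alpha >= 1` every between-class entry exceeds every within-class one, so each point's neighbours come from its own class whenever that class has more than `k` members. The price is that the k-NN graph then splits into one component per class, which makes the bottom eigenspace of `M` degenerate; a smaller `alpha` can keep some cross-class links.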
Source
Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》)
Indexed in: CAS; Peking University Core Journals (北大核心)
2012, No. 4, pp. 8-12 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (61070121)