Abstract
kNN is a simple and effective classification algorithm that is widely used in text classification. However, the value of k is usually fixed and set manually. To address this, the paper introduces reconstruction and locality preserving projections (LPP) into nearest neighbor classification, so that the choice of k is determined by the correlation between samples and by their topological structure. The algorithm uses l1-norm sparse coding to reconstruct each test sample from its k nearest neighbor samples, where k is not fixed, while LPP keeps the local structure among samples unchanged before and after reconstruction. This both solves the problem of choosing k and avoids the effect of a fixed k on classification. Experimental results show that the proposed method achieves better classification performance than the classical kNN algorithm.
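The idea of letting sparse reconstruction choose k per test sample can be illustrated with a small sketch. The abstract does not state the paper's objective function, so everything here is an assumption: the sketch solves a plain l1-regularized least-squares problem by coordinate descent, omits the LPP term entirely, and uses a weight-based vote. The names `sparse_code`, `classify`, `lam`, and `tol` are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: l1 sparse reconstruction selects a per-sample k.
# The LPP regularizer mentioned in the abstract is NOT included here.

def soft_threshold(v, t):
    """Proximal operator of t * |v| (shrinks v toward zero by t)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def sparse_code(train, x, lam=0.1, iters=200):
    """Coordinate descent for
    min_w 0.5 * ||x - sum_j w[j] * train[j]||^2 + lam * ||w||_1."""
    n, d = len(train), len(x)
    w = [0.0] * n
    residual = list(x)  # equals x - sum_j w[j] * train[j]; w starts at zero
    norms = [sum(a * a for a in row) for row in train]
    for _ in range(iters):
        for j in range(n):
            if norms[j] == 0.0:
                continue
            # Correlation of sample j with the residual that excludes sample j.
            rho = sum(train[j][i] * residual[i] for i in range(d)) + norms[j] * w[j]
            new_w = soft_threshold(rho, lam) / norms[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(d):
                    residual[i] -= delta * train[j][i]
                w[j] = new_w
    return w

def classify(train, labels, x, lam=0.1, tol=1e-8):
    """Return (predicted_label, k), where k is the number of training samples
    with a nonzero reconstruction weight. Voting by |weight| is an assumption."""
    w = sparse_code(train, x, lam)
    votes = {}
    k = 0
    for wj, label in zip(w, labels):
        if abs(wj) > tol:
            k += 1
            votes[label] = votes.get(label, 0.0) + abs(wj)
    if not votes:
        return None, 0
    return max(votes, key=votes.get), k

# Toy data: two classes in 2-D (hypothetical values, for illustration only).
train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
label, k = classify(train, labels, [0.95, 0.05])
print(label, k)
```

In this sketch the number of neighbors is whatever the l1 penalty leaves nonzero, so it can differ from one test sample to the next; a larger `lam` yields fewer neighbors.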
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
CAS
PKU Core (北大核心)
2016, Issue 1, pp. 52-58 (7 pages)
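Funding
National Natural Science Foundation of China (61573270, 61263035, 61363009)
National 973 Program of China (2013CB329404)
Guangxi Natural Science Foundation (2012GXNSFGA060004, 2015GXNSFCB139011)
China Postdoctoral Science Foundation (2015M570837)
Open Fund of the Guangxi Key Laboratory of Multi-source Information Mining and Security (MIMS13-08)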
Keywords
k-nearest neighbor (kNN)
locality preserving projections
reconstruction
sparse coding