Abstract
Affinity propagation (AP) is a fast and effective clustering method. Unlike traditional clustering algorithms, AP treats every data point as a candidate exemplar, so the clustering result is not constrained by the choice of initial exemplars. The algorithm also does not require the similarity matrix built from the dataset to be symmetric, and it runs quickly on large-scale, multi-class data. AP can therefore handle non-Euclidean spaces and large sparse matrices effectively. Because of these advantages, AP is widely applied in pattern recognition, web mining, biomedicine, and multi-target detection, and is becoming an essential method of data analysis. To determine the preference (bias) parameter of AP without prior knowledge, this paper uses the Silhouette clustering validity index. When AP clusters data with complex structure or high dimensionality, it suffers from overlapping information in the data. To address this problem, we propose an algorithm that combines the locality preserving projections (LPP) method with AP; it removes redundant information from the data space while effectively preserving the intrinsic nonlinear structure of the data. Simulation results verify the accuracy and effectiveness of the proposed algorithm and show that it outperforms the traditional AP algorithm.
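The abstract states that the Silhouette index is used to choose AP's preference parameter but gives no formula or procedure. The sketch below is one plausible reading, not the authors' implementation: it computes the standard mean silhouette width and picks the candidate preference whose clustering scores highest. The names `silhouette_index` and `pick_preference`, the toy points, the preference values, and the hand-written label vectors (which stand in for actual AP runs) are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_index(points, labels):
    """Mean silhouette width over all points (range [-1, 1], higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance from point i to the other members of its own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance from point i to the members of another cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def pick_preference(points, candidates):
    """Return the preference whose label vector has the highest silhouette index.

    `candidates` maps a preference value to the labels that a clustering run
    with that preference produced (hypothetical interface).
    """
    return max(candidates, key=lambda p: silhouette_index(points, candidates[p]))


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hand-written labelings standing in for AP output at two preference values.
candidates = {
    -50.0: [0, 0, 0, 1, 1, 1],  # two clusters matching the visible groups
    -5.0: [0, 0, 1, 2, 2, 3],   # over-segmented into four clusters
}

best = pick_preference(points, candidates)
```

In practice each label vector would come from running AP with the corresponding preference value; the selection step itself stays the same.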
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2015, No. 4, pp. 741-748 (8 pages)
Journal of Nanjing University (Natural Science)
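Funding
Jiangsu Province Industry-University-Research Joint Innovation Fund, Prospective Joint Research Project (BY2013015-33)
Natural Science Foundation of Jiangsu Province (BK20131107)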
Keywords
affinity propagation (AP)
locality preserving projections (LPP)
Silhouette index
neighborhood selection
manifold distance