Abstract
Label propagation is one of the most widely used semi-supervised classification methods. The consensus rate-based label propagation (CRLP) algorithm constructs a graph from the consensus rate, which is obtained by aggregating multiple clustering solutions so as to incorporate various properties of the data. However, like most graph-based semi-supervised classification methods, CRLP treats every labeled sample in the graph as equally important and improves performance mainly by optimizing the graph structure. In practice, samples are not necessarily evenly distributed, and different samples differ in their importance to the algorithm. Moreover, CRLP is sensitive to the number of clusters and to the choice of clustering methods, and it adapts poorly to low-dimensional data. To address these problems, this paper proposes a label propagation algorithm based on weighted samples and consensus rate (WSCRLP). WSCRLP first clusters the dataset multiple times to explore the structure of the samples and constructs a graph by combining the consensus rate with local information about the samples. It then assigns different weights to labeled samples with different distributions. Finally, it performs semi-supervised classification based on the constructed graph and the weighted samples. Experiments on real datasets show that WSCRLP's sample weighting and graph construction can significantly improve classification accuracy, outperforming the compared methods in 84% of the experiments. Compared with CRLP, WSCRLP not only performs better but is also robust to its input parameters.
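The pipeline summarized in the abstract (cluster several times, build a graph from the consensus rate, weight the labeled samples, propagate labels) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the consensus rate of two samples is the fraction of clusterings that place them in the same cluster. It omits the local-information term of the graph. The per-sample weights are hypothetical inputs, since the abstract does not state how they are computed. The function names `consensus_rate` and `propagate` are likewise illustrative.

```python
from typing import Dict, List, Sequence


def consensus_rate(clusterings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Assumed definition: share of clusterings in which i and j co-occur."""
    n, m = len(clusterings[0]), len(clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def propagate(graph: List[List[float]], labeled: Dict[int, int],
              weights: Dict[int, float], n_classes: int,
              n_iter: int = 50) -> List[int]:
    """Iterative propagation; labeled samples stay clamped to weight * one-hot."""
    n = len(graph)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in labeled.items():
        scores[i][c] = weights[i]  # hypothetical per-sample weight
    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            if i in labeled:
                new_scores.append(scores[i])  # keep labeled scores fixed
                continue
            total = sum(graph[i][j] for j in range(n) if j != i)
            row = [0.0] * n_classes
            if total > 0:
                for j in range(n):
                    if j != i:
                        for c in range(n_classes):
                            row[c] += graph[i][j] * scores[j][c] / total
            new_scores.append(row)
        scores = new_scores
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


# Three toy clusterings of six samples; samples 0 and 5 are labeled.
clusterings = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1]]
graph = consensus_rate(clusterings)
equal = propagate(graph, {0: 0, 5: 1}, {0: 1.0, 5: 1.0}, n_classes=2)
skewed = propagate(graph, {0: 0, 5: 1}, {0: 1.0, 5: 2.0}, n_classes=2)
print(equal, skewed)
```

In this toy example, doubling the weight of labeled sample 5 moves borderline sample 2 from class 0 to class 1. This is the kind of effect that sample weighting is meant to produce, although the weighting rule used in the paper may differ.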
Authors
储杰
张正军
汤鑫瑶
黄振生
CHU Jie; ZHANG Zheng-jun; TANG Xin-yao; HUANG Zhen-sheng (School of Science, Nanjing University of Science and Technology, Nanjing 210094, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2021, No. 3, pp. 214-219 (6 pages)
Computer Science
Funding
National Statistical Science Research Major Project (2018LD01).
Keywords
Weighted samples
Consensus rate
Label propagation
Semi-supervised classification