Abstract
With the development of multimedia technology, more and more information appears in the form of images. How to cluster massive numbers of unlabeled images is an active problem in machine learning, and image clustering plays an important role in applications such as face recognition and handwritten digit recognition. Because image data are usually stored as nonnegative matrices, nonnegative matrix factorization (NMF) has been widely applied to image clustering. However, NMF operates directly in the original data space, so the labels it produces are easily affected by noise and other adverse factors introduced during data acquisition. To address this problem, this paper proposes a hypergraph nonnegative matrix factorization algorithm based on pre-treatments (Nonnegative Matrix Factorization with Hypergraph Based on Pre-treatments, PHGNMF). PHGNMF introduces preprocessing and the idea of a hypergraph into NMF. In the preprocessing stage, grayscale normalization removes the influence of different illumination conditions, and wavelet analysis extracts the low-frequency sub-image of each picture, which also reduces the dimension of the matrix the algorithm processes. A hypergraph is constructed to further preserve the local structure of the data, which has an important influence on the clustering result. Finally, experiments on five mainstream data sets verify the effectiveness of PHGNMF relative to traditional algorithms: clustering accuracy improves by 2% to 7%, and normalized mutual information improves by 2% to 5% on some data sets.
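The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so every concrete choice here is an assumption. Grayscale normalization is taken to be min-max scaling, the wavelet step is a one-level Haar approximation sub-band, the hypergraph uses one k-nearest-neighbour hyperedge per sample with unit weights, and the factorization uses multiplicative updates of the kind commonly used for graph-regularized NMF. All function names and parameters (`lam`, `k`, `rank`) are hypothetical.

```python
import numpy as np

def normalize_gray(img):
    """Min-max scale a grayscale image to [0, 1] (assumed form of normalization)."""
    img = img.astype(float)
    span = img.max() - img.min()
    return (img - img.min()) / span if span > 0 else np.zeros_like(img)

def haar_low_frequency(img):
    """One-level 2D Haar approximation sub-band; halves each dimension."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # crop to even size
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2]
            + img[1::2, 0::2] + img[1::2, 1::2]) / 2.0

def build_data_matrix(images):
    """Preprocess equally sized images and stack them as columns (features x samples)."""
    return np.stack([haar_low_frequency(normalize_gray(im)).ravel()
                     for im in images], axis=1)

def knn_hypergraph(X, k=3):
    """One hyperedge per sample: the sample plus its k nearest neighbours.

    Returns S = H De^-1 H^T and the vertex degree matrix Dv (unit edge weights),
    so that Dv - S is an unnormalized hypergraph Laplacian.
    """
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(dist, -np.inf)  # guarantee each sample joins its own edge
    H = np.zeros((n, n))             # incidence matrix: vertices x hyperedges
    for e in range(n):
        H[np.argsort(dist[e])[:k + 1], e] = 1.0
    S = H @ np.diag(1.0 / H.sum(axis=0)) @ H.T
    Dv = np.diag(H.sum(axis=1))
    return S, Dv

def hypergraph_nmf(X, rank, lam=1.0, k=3, iters=200, seed=0, eps=1e-10):
    """Approximate X by U @ V.T with U, V >= 0 and a hypergraph penalty on V."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank)) + eps
    V = rng.random((n, rank)) + eps
    S, Dv = knn_hypergraph(X, k)
    for _ in range(iters):
        # Multiplicative updates keep both factors nonnegative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * S @ V) / (V @ (U.T @ U) + lam * Dv @ V + eps)
    return U, V
```

Under these assumptions, a cluster label for each image could be read off as `V.argmax(axis=1)` after calling `hypergraph_nmf(build_data_matrix(images), rank=n_clusters)`, where `n_clusters` is the desired number of clusters.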
Authors
LI Xiang-li; JIA Meng-xue (School of Mathematics & Computing Science, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Cryptography and Information Security, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Automatic Testing Technology and Instrument, Guilin, Guangxi 541004, China; Guangxi University Key Laboratory of Data Analysis and Calculation, Guilin, Guangxi 541004, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals
2020, Issue 7, pp. 71-77 (7 pages)
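Funding
National Natural Science Foundation of China (11961010, 61967004)
Guangxi Natural Science Foundation (2018GXNSFAA138169)
Research Project of the Guangxi Key Laboratory of Cryptography and Information Security (GCIS201708)
Foundation of the Guangxi Key Laboratory of Automatic Testing Technology and Instrument (YQ19111)
Innovation Project of Graduate Education, Guilin University of Electronic Technology (2020YCXS087)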
Keywords
Image clustering
Nonnegative matrix factorization
Grayscale normalization
Wavelet analysis
Hypergraph