Abstract
To retrieve and filter online multimedia information effectively, a co-clustering algorithm based on the fusion of text similarity and image similarity is proposed for multimedia structured documents. Text similarity and image similarity are first computed separately. The resulting text similarity matrix and image similarity matrix are then concatenated horizontally into a fused similarity matrix, which is reduced by singular value decomposition and co-clustered with k-means, so that the clustering result reflects both text and image information. The algorithm was tested on multimedia structured documents with two information sources, text and image. The results show that the F-Measure of every cluster obtained by the fused co-clustering algorithm is clearly higher than that obtained by image-only co-clustering, and that the F-Measure of clusters 1, 2, 3 and 7 is also higher than that obtained by text-only co-clustering.
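The pipeline described in the abstract (similarity computation, horizontal concatenation, singular value decomposition, k-means) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not state which similarity measure, how many singular vectors, or which k-means initialisation was used, so cosine similarity, the `rank` parameter, the random initialisation, and the toy feature vectors below are all assumptions.

```python
import numpy as np

def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X (assumed measure)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    return Xn @ Xn.T

def fuse_and_embed(S_text, S_image, rank):
    """Concatenate two n x n similarity matrices side by side (n x 2n),
    then keep the top `rank` left singular vectors scaled by their
    singular values as a low-dimensional document embedding."""
    fused = np.hstack([S_text, S_image])
    U, s, _ = np.linalg.svd(fused, full_matrices=False)
    return U[:, :rank] * s[:rank]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centres (assumed variant)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy data: six documents, each with a text feature vector
# and an image feature vector; documents 0-2 and 3-5 form two groups.
text_feats = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0]])
image_feats = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
                        [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]])

embedding = fuse_and_embed(cosine_similarity(text_feats),
                           cosine_similarity(image_feats), rank=2)
labels = kmeans(embedding, k=2)
print(labels)
```

On this toy input the first three documents should receive one label and the last three the other; which numeric label each group gets depends on the random initialisation.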
Source
《中南大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal
2010, No. 5, pp. 1871-1876 (6 pages)
Journal of Central South University: Science and Technology
Funding
Project of the Education Department of Hunan Province (09c647)
Keywords
co-clustering
similarity fusion
structured document