
Cross-Modal Social Image Clustering (cited by: 4)
Abstract: Social images carry two modalities of information: visual content and social tags. Most researchers in cross-modal learning concentrate on learning a shared feature space for multimodal information and therefore tend to overlook the features unique to each modality. This paper investigates how to use both the shared and the modality-specific information for cross-modal image clustering. It treats shared feature space learning as a coupled dictionary learning (CDL) problem and sparsifies each modality's dictionary with an L1,∞ norm regularizer; this structured sparsity constraint preserves the features unique to each modality. The paper also proposes a simple framework for measuring semantic similarity. Using WordNet, a lexical database rich in semantic relations, it measures the conceptual distance and gloss similarity between tags, attaching semantic relations to the tags so that the semantic similarity between samples can be measured. Experiments show that the "shared & private" cross-modal learning approach outperforms methods that use only shared features on the clustering task.

With growing industrial demand, cross-modal learning has attracted increasing attention. Thanks to the popularity of social media websites, people can tag social images according to their social or cultural backgrounds, personal expertise, and perception. With the exponential growth of tagged social images, it has become increasingly attractive to develop new algorithms that organize and summarize large-scale social image collections more effectively. In general, social images contain two modalities of information: visual information and keyword information. Combining them can yield a comprehensive description of the social images. However, most researchers in cross-modal learning focus on learning a shared latent space and ignore the private information of each modality. In this paper, we use a novel approach to find a latent space in which the information is correctly factorized into shared and private parts. First, we treat latent space learning as a coupled dictionary learning problem, which generates homogeneous dictionaries for the different modalities by associating and jointly updating their shared coefficients. Second, we add structured sparseness constraints on the dictionaries to allow a latent dimension to be associated with a single modality. Specifically, for each modality's dictionary matrix we add an L1,∞ norm regularizer that encourages some dictionary entries to be zeroed out.
By imposing such structured sparseness constraints, some latent dimensions are explained by a single modality rather than by both modalities. We use an alternating optimization method that optimizes the objective function with respect to the dictionary matrices and the shared coefficient matrix in turn. For the sub-problems involving the dictionary matrices, we use an efficient optimization algorithm based on the composite gradient mapping method, which has been proven to converge very fast. For the sub-problem involving the shared coefficient matrix, a multiplicative update algorithm is used. In addition, it is important to extract sufficient semantic relations from a limited number of social keywords. To this end, we propose a framework for semantic similarity measurement based on an external lexical database (such as WordNet) that contains rich semantic relationships. First, a common sense determination algorithm is used to detect the common sense of each keyword. Then, we compute the semantic similarities between social keywords by measuring the conceptual distance and gloss similarity between their common senses. Finally, image-level semantic similarities are computed to describe the semantic relations among the social images; these form the semantic feature matrix that is fed to the cross-modal learning algorithms tested in this paper. In the experiments, two real-world datasets were used to quantitatively test the performance of the "shared & private" approach (S&P) on the social image clustering task. To show the effectiveness of S&P, we compared it with four baseline methods, including Canonical Correlation Analysis, which is widely used as a workhorse tool in cross-modal learning tasks. The experiments demonstrate that the S&P approach achieves better performance than the baselines. We also investigated the influence of S&P's parameters on its performance by varying one parameter at a time while fixing the other. This investigation can guide practical applications in industry.
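To make the structured sparsity idea in the abstract concrete, the following minimal Python sketch computes an L1,∞ norm of a dictionary and labels each latent dimension as shared or private. It is an illustration under stated assumptions, not the paper's implementation: it assumes each dictionary is stored as a list of rows with one column per latent dimension, and the function names, the tolerance, and the toy matrices are hypothetical.

```python
from typing import List

# Assumed layout: rows are feature dimensions, columns are latent dimensions.
Matrix = List[List[float]]


def column_peaks(dictionary: Matrix) -> List[float]:
    """Largest absolute entry of each column (latent dimension)."""
    n_latent = len(dictionary[0])
    return [max(abs(row[k]) for row in dictionary) for k in range(n_latent)]


def l1_inf_norm(dictionary: Matrix) -> float:
    """L1,inf norm: sum over latent dimensions of the per-column maximum."""
    return sum(column_peaks(dictionary))


def split_dimensions(visual: Matrix, textual: Matrix,
                     tol: float = 1e-8) -> List[str]:
    """Label each latent dimension by which dictionaries actually use it."""
    labels = []
    for v_peak, t_peak in zip(column_peaks(visual), column_peaks(textual)):
        used_v, used_t = v_peak > tol, t_peak > tol
        if used_v and used_t:
            labels.append("shared")
        elif used_v:
            labels.append("visual-only")
        elif used_t:
            labels.append("text-only")
        else:
            labels.append("unused")
    return labels


# Toy dictionaries (hypothetical values): column 1 is zeroed out in the
# visual dictionary, column 2 is zeroed out in the textual dictionary.
visual_dict = [[0.5, 0.0, -1.0],
               [0.2, 0.0, 0.3]]
textual_dict = [[0.4, 0.7, 0.0],
                [-0.1, 0.2, 0.0]]

if __name__ == "__main__":
    print(l1_inf_norm(visual_dict))
    print(split_dimensions(visual_dict, textual_dict))
```

Because the norm charges each latent dimension only for its largest entry, driving a whole column to zero removes that dimension's cost entirely, which is why a regularizer of this kind tends to leave some dimensions private to one modality.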
Source: Chinese Journal of Computers (计算机学报), indexed in EI, CSCD, and the Peking University Core Journals list, 2018, No. 1, pp. 98-111 (14 pages)
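Funding: Supported by the National Natural Science Foundation of China (61379106), the Shandong Province Award Fund for Young and Middle-Aged Scientists (BS2010DX037), the Natural Science Foundation of Shandong Province (ZR2009GL014, ZR2013FM036, ZR2015FM011), and the Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1315).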
Keywords: cross-modal learning; coupled dictionary learning; WordNet; image clustering; social images; semantic similarity measurement