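Abstract
Social images carry two modalities of information: visual information and social tag information. Most researchers in cross-modal learning concentrate on learning a shared feature space for the multimodal information and therefore tend to overlook the features private to each modality. This paper investigates how to use both the shared and the private information of the two modalities for cross-modal image clustering. It treats shared feature space learning as a Coupled Dictionary Learning (CDL) problem and sparsifies each modality's dictionary with an L1,∞-norm regularizer; this structured sparsity constraint allows the features private to each modality to be preserved. In addition, the paper proposes a simple framework for measuring semantic similarity. Drawing on WordNet, a lexical database rich in semantic relations, it measures the conceptual distance and gloss similarity between tags, thereby attaching semantic relations to the tags so that the semantic similarity between samples can be measured. Experiments show that the proposed "shared & private" cross-modal learning approach outperforms methods that use only shared features on the clustering task.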
With growing industrial demand, cross-modal learning has attracted increasing attention. Owing to the popularity of social media websites, people can tag social images according to their social or cultural backgrounds, personal expertise and perception. With the exponential growth of tagged social images, it has become increasingly attractive to develop new algorithms for more effective organization and summarization of large-scale social image collections. In general, social images contain two modalities of information: visual information and keyword information. Combining them may lead to a comprehensive description of the social images. However, most research on cross-modal learning focuses on learning a shared latent space and ignores the private information of each modality. In this paper, we present an approach that finds a latent space in which the information is correctly factorized into shared and private parts. First, we treat latent space learning as a coupled dictionary learning problem, which can generate homogeneous dictionaries for different modalities by associating and jointly updating their shared coefficients. Second, we add structured sparseness constraints on the dictionaries to allow a latent dimension to be associated with a single modality. Specifically, for each modality's dictionary matrix we add an L1,∞-norm regularizer to encourage some dictionary entries to be zeroed out. Under such structured sparseness constraints, some latent dimensions are explained by a single modality rather than by both modalities. We use an optimization method that alternately optimizes the objective function with respect to the dictionary matrices and the shared coefficient matrix. For the sub-problems involving the dictionary matrices, we use an efficient optimization algorithm based on the composite gradient mapping method, which has been proved to converge very fast.
For the sub-problem involving the shared coefficient matrix, a multiplicative update algorithm is used. In addition, it is important to extract sufficient semantic relations from a limited number of social keywords. To this end, based on an external lexical database (such as WordNet) that contains rich semantic relationships, we propose a framework for semantic similarity measurement. First, a common sense determination algorithm is used to detect the common sense (word sense) of each keyword. Then, we compute the semantic similarities between social keywords by measuring the conceptual distance and gloss similarity between their common senses. Finally, image-level semantic similarities are computed to describe the semantic relations among the social images; these form the semantic feature matrix that is fed to the cross-modal learning algorithms tested in this paper. In the experiments, two real-world datasets were used to quantitatively test the performance of the "shared & private" approach (S&P) on the social image clustering task. To show the effectiveness of S&P, we compared it with four baseline methods, including the Canonical Correlation Analysis algorithm, a workhorse tool in many cross-modal learning tasks. The experiments demonstrate that the S&P approach achieves better performance than the baselines. We also investigated the influence of S&P's parameters on its performance by varying one parameter at a time while fixing the others. This investigation can guide practical industrial applications.
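The structured sparseness idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each row of a dictionary matrix corresponds to one latent dimension, takes the L1,∞ norm to be the sum over rows of the largest absolute entry in each row, and uses made-up toy matrices. The names `l1_inf_norm`, `split_latent_dims`, `D_visual` and `D_text` are hypothetical.

```python
import numpy as np


def l1_inf_norm(D):
    """L1,inf norm under the assumed convention: sum over rows of max |entry|."""
    return float(np.abs(D).max(axis=1).sum())


def split_latent_dims(D_visual, D_text, tol=1e-8):
    """Label each latent dimension (row) by which modality's dictionary uses it.

    A row that is entirely (near) zero in one dictionary means that latent
    dimension does not contribute to that modality.
    """
    used_v = np.abs(D_visual).max(axis=1) > tol
    used_t = np.abs(D_text).max(axis=1) > tol
    labels = []
    for v, t in zip(used_v, used_t):
        if v and t:
            labels.append("shared")
        elif v:
            labels.append("visual-only")
        elif t:
            labels.append("text-only")
        else:
            labels.append("unused")
    return labels


# Toy dictionaries (hypothetical values): 4 latent dimensions as rows.
D_visual = np.array([[0.5, -1.0, 0.2],
                     [0.0, 0.0, 0.0],
                     [0.3, 0.1, -0.4],
                     [0.0, 0.0, 0.0]])
D_text = np.array([[0.7, 0.1],
                   [0.2, -0.9],
                   [0.0, 0.0],
                   [0.0, 0.0]])

print(l1_inf_norm(D_visual), l1_inf_norm(D_text))
print(split_latent_dims(D_visual, D_text))
```

Because the penalty charges each row only for its largest entry, shrinking a whole row to zero is the cheapest way to reduce it, which is why a regularizer of this form tends to switch entire latent dimensions off for one modality and leave them private to the other.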
Source
Chinese Journal of Computers (《计算机学报》)
Indexed in: EI, CSCD, PKU Core
2018, Issue 1, pp. 98-111 (14 pages)
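Funding
National Natural Science Foundation of China (61379106)
Shandong Province Young and Middle-Aged Scientists Research Awards Fund (BS2010DX037)
Natural Science Foundation of Shandong Province (ZR2009GL014, ZR2013FM036, ZR2015FM011)
Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1315)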