
Multimodal Multimedia Retrieval Model Based on Probabilistic Latent Semantic Analysis

Times cited: 5
Abstract: Multimedia information on the Internet is growing rapidly, and a single multimedia document usually contains content in several modalities (for example, text and images) that tend to carry similar meanings. Multimodal retrieval has therefore become a hot topic in multimedia retrieval research. This paper proposes a multimodal multimedia retrieval model based on probabilistic latent semantic analysis (pLSA). Two hypotheses are made: (1) the different modal contents (text and image) of one document are different representations of that document, so they express similar meanings; and (2) textual words and visual features are generated independently of each other. pLSA is used to model the generative processes of the texts and of the images in the training documents separately, and the latent topics of each pLSA model are learned with the expectation-maximization (EM) algorithm. Multivariate linear regression is then used to analyze the correlation between the textual and visual representations, and the regression matrix is estimated by ordinary least squares (OLS); this matrix is used to transform data between the textual and visual modalities. Extensive experiments demonstrate the effectiveness and efficiency of the proposed model.
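The abstract outlines a three-stage pipeline: a separate pLSA model per modality fitted by EM, an OLS-estimated regression matrix linking the text-topic and image-topic spaces, and cross-modal matching through that matrix. The pure-Python code below is a minimal sketch of such a pipeline under stated assumptions, not the authors' implementation. It assumes images are represented as bag-of-visual-words counts, that an unseen query's topic vector is inferred by standard pLSA folding-in (EM with P(w|z) held fixed), that the regression has no intercept, and that candidates are ranked by cosine similarity. All function names (`plsa`, `ols`, `rank_images_for_text`, and the helpers) are hypothetical.

```python
# Hypothetical sketch (not the paper's code). Assumptions: images are bag-of-visual-words
# counts, queries are folded in with P(w|z) fixed, no regression intercept, cosine ranking.
import random


def normalize(row):
    """Scale non-negative values to sum to 1; an all-zero row is returned unchanged."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else list(row)


def plsa(counts, n_topics, n_iter=50, seed=0, fixed_w_z=None):
    """Fit pLSA by EM on a document-by-term count matrix (list of lists).

    Returns (p_z_d, p_w_z), i.e. P(z|d) per document and P(w|z) per topic.
    If fixed_w_z is given, P(w|z) is held fixed and only P(z|d) is re-estimated
    (assumed "folding-in" of unseen documents such as queries).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    if fixed_w_z is not None:
        p_w_z = fixed_w_z
    else:
        p_w_z = [normalize([rng.random() for _ in range(n_terms)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        # Expected counts n(d,w) * P(z|d,w), accumulated for the M-step.
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_terms):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][k] * p_w_z[k][w] for k in range(n_topics)])
                for k in range(n_topics):
                    new_w_z[k][w] += counts[d][w] * post[k]
                    new_z_d[d][k] += counts[d][w] * post[k]
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        if fixed_w_z is None:
            p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("X^T X is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(x, y):
    """OLS estimate of B in Y ≈ X B via the normal equations (X^T X) B = X^T Y."""
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm > 0 else 0.0


def rank_images_for_text(text_topics, b, image_topics):
    """Map a text topic vector into image-topic space with B and rank images by cosine."""
    mapped = matmul([text_topics], b)[0]
    scores = [cosine(mapped, img) for img in image_topics]
    return sorted(range(len(image_topics)), key=lambda i: -scores[i])
```

In this sketch, one would fit `plsa` separately to the text counts and to the visual-word counts of the training documents, call `ols(text_topics, image_topics)` to estimate a text-to-image matrix, and pass a folded-in query vector to `rank_images_for_text`. Image-to-text retrieval would use a second matrix estimated with the two arguments of `ols` swapped. No intercept is used because each row of P(z|d) sums to 1, so a constant column would be collinear with the topic columns and make X^T X singular.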
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 8, pp. 1665-1670 (6 pages). Indexed in CSCD; Peking University core journal.
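Funding: Supported by the National Natural Science Foundation of China (61025007, 61328202, 61100024), the National Basic Research Program of China ("973" Program) (2011CB302200-G), the National High Technology Research and Development Program of China ("863" Program) (2012AA011004), and the Fundamental Research Funds for the Central Universities (N130504006).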
Keywords: multimodal; multimedia; retrieval; probabilistic latent semantic analysis (pLSA)