
Current Research Status and Prospects on Multimedia Content Understanding
Times cited: 32
Abstract: With the rapid development of multimedia and Internet technologies, massive amounts of multimedia data, such as images, videos, text, and audio, are rapidly emerging. Data of different media types come from multiple sources and are heterogeneous in form but correlated in semantics. Research in cognitive science indicates that the physiological organization of the human brain determines that humans perceive and cognize the external world by fusing information across multiple senses. How to perform semantic analysis and correlation modeling on data of different media types, so as to achieve multimedia content understanding, has therefore become a key problem in both research and applications, drawing wide attention from academia and industry. This paper selects five of the latest hot research topics in multimedia content understanding: fine-grained image classification and retrieval, video classification and object detection, cross-media retrieval, visual description and generation, and visual question answering. For each topic, it presents the basic concepts, representative methods, and research status. It further discusses the major challenges facing multimedia content understanding and outlines future development trends. The goal is to help readers gain a comprehensive understanding of the research status of multimedia content understanding, attract more researchers to related topics, provide them with technical references, and promote further development of the field.
Authors: Peng Yuxin (彭宇新), Qi Jinwei (綦金玮), Huang Xin (黄鑫); Institute of Computer Science and Technology, Peking University, Beijing 100871
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2019, No. 1, pp. 183-208 (26 pages); indexed in EI, CSCD, and the Peking University Core Journals list
Funding: National Natural Science Foundation of China (61771025, 61532005)
Keywords: multimedia content understanding; fine-grained image classification and retrieval; video classification and object detection; cross-media retrieval; visual description and generation; visual question answering