The Semantic Knowledge Embedded Deep Representation Learning and Its Applications on Visual Understanding

Cited by: 4
Abstract: With the rapid development of deep learning techniques and large-scale visual datasets, traditional computer vision tasks have seen unprecedented improvement. How to integrate domain knowledge from traditional vision research into deep neural networks, so that deep models represent visual patterns better and can handle increasingly complex vision tasks, has become a widely discussed topic in both academia and industry. This work explores deep representation learning that combines semantic knowledge with feature learning. The main contributions can be summarized as follows: 1) We integrate a single type of semantic information, the category similarity of visual data, into the deep feature learning process, and propose a deep hashing method with regularized semantic-association embedding, named bit-scalable deep hashing, to address visual similarity comparison. The model achieves substantial performance gains on image retrieval and person identification. 2) We integrate multiple types of semantic context into the feature learning process, and propose a high-order graph LSTM (HG-LSTM) network for scene context learning, applied to geometric attribute analysis of complex scenes. Extensive experiments show that the model predicts rich scene geometric attributes and outperforms several state-of-the-art methods by large margins. 3) We integrate the structured semantic configuration of visual data into the feature learning process, and propose a novel deep architecture that incorporates grammar knowledge to investigate a fundamental problem of scene understanding: how to parse a scene image into a structured configuration. Extensive experiments show that the model produces meaningful, structured scene configurations and achieves more favorable scene labeling results on two challenging datasets than other state-of-the-art weakly-supervised deep learning methods.
Authors: Zhang Ruimao (张瑞茂); Peng Jiefeng (彭杰锋); Wu Yang (吴恙); Lin Liang (林倞) (School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006)
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, No. 6, pp. 1251-1266 (16 pages)
Funding: National Natural Science Foundation of China, Excellent Young Scientists Fund (6162200366)
Keywords: deep learning; neural networks; semantic embedding; scene parsing; similarity search