Abstract
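In recent years, as deep learning techniques have matured, traditional computer vision tasks have advanced at an unprecedented pace. A question of wide interest in the research community is how to incorporate the domain knowledge of traditional vision research into deep models. Doing so would strengthen their visual representation ability and let them tackle more complex vision tasks. Motivated by this, we carry out a series of studies centered on deep representation learning that incorporates semantic knowledge. The main contributions are threefold: 1) We study how to incorporate a single type of semantic information (category similarity) into deep feature learning and propose a deep hashing method that embeds regularized semantic correlations. We apply it to image similarity comparison and retrieval, where it obtains substantial performance gains. 2) We study how to incorporate multiple types of information (multiple contextual cues) into deep feature learning and propose a scene-context learning method based on long short-term memory (LSTM) networks. We apply it to geometric attribute analysis of complex scenes. 3) We study how to incorporate the structured semantic configuration of visual data into deep representation learning and propose a representation learning method that incorporates grammar knowledge. We apply it to general content parsing in complex scenes. Experimental results show that the proposed method can effectively predict the structured configuration of a scene.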
With the rapid development of deep learning techniques and large-scale visual datasets, traditional computer vision tasks have achieved unprecedented improvement. Handling increasingly complex vision tasks raises the question of how to integrate domain knowledge into deep neural networks and enhance the ability of deep models to represent visual patterns. This has become a widely discussed topic in both academia and industry. This paper explores effective deep models that combine semantic knowledge with feature learning. The main contributions can be summarized as follows: 1) We integrate the semantic similarity of visual data into the deep feature learning process and propose a deep similarity comparison model, named bit-scalable deep hashing, to address visual similarity comparison. The model achieves strong performance on image search and person identification. 2) We propose high-order graph LSTM (HG-LSTM) networks for geometric attribute analysis, which integrate multiple semantic contexts into the feature learning process. Extensive experiments show that our model predicts rich scene geometric attributes and outperforms several state-of-the-art methods by large margins. 3) We integrate the structured semantic information of visual data into the feature learning process and propose a novel deep architecture to investigate a fundamental problem of scene understanding: how to parse a scene image into a structured configuration. Extensive experiments show that our model produces meaningful, structured scene configurations. On two challenging datasets, it also achieves more favorable scene labeling results than other state-of-the-art weakly supervised deep learning methods.
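The retrieval side of the hashing contribution can be illustrated with a minimal sketch. It makes two hypothetical assumptions. First, each image has already been mapped to a binary code. Second, every bit carries a non-negative importance weight, which is one reading of the "bit-scalable" label. The sketch does not reproduce the authors' network or training objective. Images are ranked by weighted Hamming distance, and the code can be shortened by keeping only the most heavily weighted bits. The function names and toy values are illustrative only.

```python
from typing import List, Optional, Sequence


def weighted_hamming(a: Sequence[int], b: Sequence[int], w: Sequence[float]) -> float:
    """Sum of the weights at bit positions where codes a and b differ."""
    return sum(wi for ai, bi, wi in zip(a, b, w) if ai != bi)


def top_k_bits(w: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest weights, returned in their original bit order."""
    ranked = sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k]
    return sorted(ranked)


def rank_database(
    query: Sequence[int],
    database: Sequence[Sequence[int]],
    w: Sequence[float],
    k: Optional[int] = None,
) -> List[int]:
    """Return database indices sorted by weighted Hamming distance to the query.

    If k is given, only the k most heavily weighted bits are compared
    (a hypothetical way to "scale" the code length). Ties are broken by index.
    """
    keep = list(range(len(w))) if k is None else top_k_bits(w, k)
    q = [query[i] for i in keep]
    ws = [w[i] for i in keep]
    scored = [
        (weighted_hamming(q, [code[i] for i in keep], ws), j)
        for j, code in enumerate(database)
    ]
    return [j for _, j in sorted(scored)]


if __name__ == "__main__":
    # Toy 4-bit codes and weights, made up purely for illustration.
    weights = [0.9, 0.1, 0.5, 0.3]
    db = [[1, 1, 1, 1], [0, 0, 1, 1], [1, 0, 0, 0]]
    print(rank_database([1, 0, 1, 1], db, weights))       # prints [0, 2, 1]
    print(rank_database([1, 0, 1, 1], db, weights, k=2))  # prints [0, 2, 1]
```

Truncating to the top-weighted bits trades some ranking precision for shorter codes. In this sketch, the choice of which bits to keep depends only on the assumed weights.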
Authors
张瑞茂
彭杰锋
吴恙
林倞
Zhang Ruimao; Peng Jiefeng; Wu Yang; Lin Liang (School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006)
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal
2017, Issue 6, pp. 1251-1266 (16 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China, Excellent Young Scientists Fund (6162200366)
Keywords
deep learning
neural networks
semantic embedding
scene parsing
similarity search