Abstract
In scene recognition, to exploit the complementary information contained in depth images and RGB images even when only RGB images are available at test time, this paper treats the depth image as privileged information and proposes an end-to-end trainable deep neural network model that combines privileged information with an attention mechanism. The model follows an image-encoding, feature-decoding, image-encoding architecture, which establishes a mapping from the RGB image to the depth image and then to the high-level semantic features of the depth image. An attention mechanism fuses the high-level semantic features of the RGB image with the corresponding high-level semantic features of the depth image, and the fused features are fed into a classification network to produce the final prediction. At test time only the RGB image is required as input, and the depth-image privileged information captured by the model improves scene recognition performance. Extensive experiments show that the proposed method achieves recognition accuracies of 51.5% on the SUN RGB-D scene recognition database and 65.4% on the NYUD2 database, which verifies its effectiveness.
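The test-time data flow described in the abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not specify the form of the attention mechanism, so the softmax over one scalar score per modality, the scoring vectors `w_rgb` and `w_depth`, and all function names here are assumptions rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(rgb_feat, depth_feat, w_rgb, w_depth):
    """Fuse two equal-length feature vectors with one attention weight per modality.

    w_rgb and w_depth are hypothetical learned scoring vectors (an assumption;
    the abstract does not describe how the attention weights are computed).
    """
    s_rgb = sum(w * x for w, x in zip(w_rgb, rgb_feat))
    s_depth = sum(w * x for w, x in zip(w_depth, depth_feat))
    a_rgb, a_depth = softmax([s_rgb, s_depth])
    fused = [a_rgb * r + a_depth * d for r, d in zip(rgb_feat, depth_feat)]
    return fused, (a_rgb, a_depth)


def predict_from_rgb(rgb, rgb_encoder, depth_generator, depth_encoder,
                     classifier, w_rgb, w_depth):
    """Test-time path: only the RGB input is needed.

    All four callables are placeholders for trained sub-networks.
    """
    rgb_feat = rgb_encoder(rgb)               # high-level RGB features
    pseudo_depth = depth_generator(rgb)       # encode-decode: RGB -> depth image
    depth_feat = depth_encoder(pseudo_depth)  # high-level depth features
    fused, _ = attention_fuse(rgb_feat, depth_feat, w_rgb, w_depth)
    return classifier(fused)
```

In a real model each callable would be a convolutional sub-network trained jointly, with ground-truth depth images supervising `depth_generator` during training only.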
Authors
孙宁
王龙玉
刘佶鑫
韩光
SUN Ning; WANG Longyu; LIU Jixin; HAN Guang (Engineering Research Center of Wideband Wireless Communication Technology of Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
《郑州大学学报(工学版)》
CAS
PKU Core (Peking University Core Journals)
2021, Issue 1, pp. 42-49 (8 pages)
Journal of Zhengzhou University (Engineering Science)
Funding
National Natural Science Foundation of China (61471206, 61871445)
Jiangsu Province Excellent Young Scholars Fund (BK20180088)
Keywords
scene recognition
privileged information
attention mechanism
convolutional neural network