
Salient Object Detection Based on Deep Fusion of Hand-Crafted Features (Cited by: 34)
Abstract: Natural images often contain a variety of complex content, and saliency detection algorithms based on a single feature can hardly extract salient objects consistent with human vision from complex scenes. Although fusing multiple saliency maps can compensate for or correct the detection defects caused by a single feature, an ill-designed fusion scheme may further degrade detection performance. To solve the problem of effectively fusing multiple saliency maps, the authors propose a deep fusion model for feature maps based on a deep convolutional neural network. The algorithm takes four low-level saliency maps as the network input and learns the salient objects of an image with a dual-channel convolutional network consisting of an anterior fusion channel and a posterior fusion channel. The anterior fusion channel uses a multi-layer fully convolutional network to generate a saliency map that is sensitive to object boundaries, while the posterior fusion channel uses a weight-shared shallow network to obtain four high-level semantic saliency maps that preserve object locations. The feature maps from the two channels are then refined by a four-layer fully convolutional network to produce the final saliency map. Extensive experiments on public datasets demonstrate the effectiveness of the proposed deep fusion algorithm.

Visual saliency detection is an important and fundamental research problem in neuroscience and psychology, which investigates the mechanism by which human visual systems select regions of interest from complex scenes. Recently it has also been an active topic in computer vision, due to its applications in object detection, video summarization, image editing, image retrieval, face detection, and fine-grained visual categorization. Saliency detection is commonly interpreted as a two-stage process: (1) detecting the most salient regions in accordance with human visual attention, and (2) segmenting the accurate boundaries of those regions. In general, saliency detection methods can be categorized as either bottom-up or top-down. The former focuses on the stimulus-driven stage of attention, which is of main interest in the computer vision community; in contrast, top-down approaches usually require supervised learning with manually labeled ground truth. However, natural images often contain a variety of complex content, and methods based on a single visual feature can hardly extract salient objects consistent with the human visual system from complex scenes. Although the fusion of multiple saliency maps can compensate for or correct the defects of a single visual feature, an irrational fusion scheme may further degrade performance.

To solve the problem of effectively fusing saliency maps, we propose a deep fusion model based on a deep convolutional neural network. We use four hand-crafted feature maps, namely Local Contrast (LC), Global Contrast (GC), Spatial Variance (SV) and Center Variance (CV), as the input of the network and learn the saliency values in a dual-estimation process. The anterior fusion estimation is fed with the fused maps to learn saliency values with precise boundaries, while the posterior fusion estimation is fed with each feature map separately to learn object-level semantic saliency. The four feature maps are first exploited independently to detect salient regions, together with six fusion schemes: HF, HF_p and HF_a, proposed in this paper, as well as CRF, CSVM and WA. The evaluations show that HF achieves the highest performance, which demonstrates the benefit of combining the anterior and posterior estimations. HF is therefore selected as the fusion scheme: the two features are concatenated and integrated into a jointly optimized network for final saliency detection.

Extensive experiments are carried out on four benchmark datasets: ASD, PASCAL-S, ECSSD and HKU-IS. Two measurements, MAE and F-score, are used to compare the proposed method with existing methods including HS, GMR, DSR, DRFI, LEGS, MDF, MCDL and ELD, among which HS, GMR, DSR and DRFI are traditional methods based on low-level features, while LEGS, MDF, MCDL and ELD are based on deep learning. The precision-recall curves (PRCs) of the methods are also used to illustrate their performance. The results show that the proposed method gains significant and consistent improvements over representative deep-learning-based saliency detection methods.
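As a concrete illustration of the dual-channel architecture described above, here is a minimal PyTorch sketch. The layer counts in the anterior and posterior channels, the channel widths, and the kernel sizes are illustrative assumptions; the abstract only fixes the four-map input, the weight sharing in the posterior channel, and the four-layer fully convolutional refiner. This is a structural sketch, not the authors' exact network.

```python
# Structural sketch of the anterior/posterior deep fusion model.
# Assumed: layer widths/depths other than the four-layer refiner.
import torch
import torch.nn as nn

class AnteriorFusion(nn.Module):
    """Anterior channel: a multi-layer fully convolutional network fed with
    the four low-level maps stacked together, producing one boundary-sensitive
    saliency map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, maps):            # maps: (B, 4, H, W) = LC/GC/SV/CV
        return self.net(maps)           # (B, 1, H, W)

class PosteriorFusion(nn.Module):
    """Posterior channel: one weight-shared shallow network applied to each
    of the four maps separately, yielding four location-preserving
    high-level semantic saliency maps."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, maps):            # maps: (B, 4, H, W)
        # The same (weight-shared) network scores each map independently.
        outs = [self.shared(maps[:, i:i + 1]) for i in range(4)]
        return torch.cat(outs, dim=1)   # (B, 4, H, W)

class DeepFusionNet(nn.Module):
    """Concatenate both channels and refine with a four-layer FCN."""
    def __init__(self):
        super().__init__()
        self.anterior = AnteriorFusion()
        self.posterior = PosteriorFusion()
        self.refine = nn.Sequential(    # four-layer fully convolutional refiner
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, maps):
        a = self.anterior(maps)                   # (B, 1, H, W)
        p = self.posterior(maps)                  # (B, 4, H, W)
        fused = torch.cat([a, p], dim=1)          # (B, 5, H, W)
        return torch.sigmoid(self.refine(fused))  # final saliency map

# Usage: four hand-crafted maps (LC, GC, SV, CV) stacked as channels.
x = torch.rand(1, 4, 224, 224)
print(DeepFusionNet()(x).shape)  # torch.Size([1, 1, 224, 224])
```

The weight sharing in the posterior channel is what lets all four maps be scored by one learned notion of object-level saliency while keeping each map's spatial layout intact.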
We believe the proposed method's success comes from the following factors: (1) the four hand-crafted feature maps are derived from features at different levels and are therefore complementary; (2) the deep network is effective at capturing the correlations among the feature maps; (3) compared with shallow fusion models such as SVM and CRF, HF can fuse feature maps from different levels and achieves better performance.
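For reference, the two reported measurements can be computed as in the following sketch. It assumes the definitions conventional in saliency benchmarks (pixel-wise mean absolute error, and the F-measure with beta^2 = 0.3 on a thresholded map); the abstract itself does not spell these out.

```python
# Generic sketch of the MAE and F-score measures used in saliency
# benchmarks; beta2 = 0.3 is the conventional weighting, assumed here.
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0,1] saliency map and a binary mask."""
    return np.abs(pred - gt).mean()

def f_score(pred, gt, beta2=0.3, thresh=0.5):
    """F-measure of the thresholded saliency map against the binary mask."""
    binary = pred >= thresh
    positives = gt > 0.5
    tp = np.logical_and(binary, positives).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(positives.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

pred = np.random.rand(224, 224)        # predicted saliency map in [0, 1]
gt = np.random.rand(224, 224) > 0.5    # binary ground-truth mask
print(mae(pred, gt), f_score(pred, gt))
```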
Authors: ZHANG Dong-Ming; JIN Guo-Qing; DAI Feng; YUAN Qing-Sheng; BAO Xiu-Guo; ZHANG Yong-Dong (National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029; Intelligent Information Processing Key Lab, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026)
Source: Chinese Journal of Computers (计算机学报; indexed in EI, CSCD, Peking University Core), 2019, Issue 9, pp. 2076-2086 (11 pages)
Funding: Supported by the National Key Research and Development Program of China (2018YFB0804202) and the National Natural Science Foundation of China (61672495, 61771458, 61525206)
Keywords: salient object detection; hand-crafted features; deep fusion; deep learning; saliency map
