
基于邻域像素注意力机制的光场深度估计方法

Depth Estimation Method of Light Field Based on Attention Mechanism of Neighborhood Pixel
Abstract By exploiting the strong correlation between depth information and neighboring pixels in sub-aperture images, a light field depth estimation method based on a neighborhood pixel attention mechanism is proposed. First, a neighborhood pixel attention mechanism is designed according to the data characteristics of light field images; it accounts for the epipolar geometry relating the same neighborhood across different sub-aperture images and strengthens the network's perception of occluded pixels. Second, a feature extraction module for light field sub-aperture image sequences is built on this attention mechanism; it encodes features of adjacent images in the sequence into feature maps through three-dimensional convolutions and uses the attention mechanism to improve the network's ability to learn the epipolar geometry features of light field images. Finally, the neighborhood pixel attention mechanism and the feature extraction module are combined into a multi-branch fully convolutional neural network that can estimate depth from only part of the light field sub-aperture image sequence. Experimental results show that the proposed method outperforms other state-of-the-art methods overall on the mean square error (MSE) and average bad pixel rate (BP) metrics and, owing to the efficient attention mechanism, runs faster than all compared methods.

Objective Accurate acquisition of depth information has always been a research hotspot in computer vision. Traditional cameras capture only light intensity within a certain time period and lose other information, such as the incident light angle, that is helpful for depth estimation. The emergence of light field cameras provides a new solution for depth estimation: compared with traditional cameras, light field cameras capture four-dimensional light field information, and micro-lens array light field cameras also avoid the large size and poor portability of camera arrays. Therefore, employing light field cameras to estimate scene depth has broad research prospects. However, existing research suffers from inaccurate depth estimation, high computational complexity, and occlusions in multi-view scenarios. Occlusion has always been challenging in light field depth estimation. For scenes without occlusions, most existing methods yield good depth estimation results, but this requires the pixels to satisfy the color consistency principle. When occluded pixels exist in the scene, this principle no longer holds across different views; the accuracy of the depth map obtained by existing methods then decreases significantly, with more errors in occluded areas and at edges. Thus, we propose a light field depth estimation method based on a neighborhood pixel attention mechanism. By exploiting the high correlation between depth information and neighboring pixels in sub-aperture images, the network's performance in estimating the depth of light field images is improved.

Methods First, after analyzing the characteristics of the sub-aperture image sequence, we exploit the correlation between the depth information of a pixel in the light field image and a limited neighborhood of surrounding pixels to propose a neighborhood pixel attention mechanism, Mix Attention. This mechanism efficiently models the relationship between feature maps and depth by combining spatial and channel attention, thereby improving the accuracy of light field depth estimation and providing the network with a degree of occlusion robustness. Next, based on Mix Attention, a sequential image feature extraction module is proposed. It employs three-dimensional convolutions to encode the spatial and angular information contained in the sub-aperture image sequence into feature maps and adopts Mix Attention to adjust their weights, enhancing the representation power of the network by incorporating both spatial and angular information. Finally, a multi-branch depth estimation network is proposed that takes part of the sub-aperture images of the light field as input and achieves fast end-to-end depth estimation for light field images of arbitrary input size. This network leverages the proposed attention mechanism and the sequential image feature extraction module to estimate depth from the light field image effectively. Overall, by leveraging the correlation between neighboring pixels and incorporating attention mechanisms, the proposed approach improves depth estimation accuracy and enhances the network's ability to handle occlusions, enabling efficient and robust depth estimation for light field images.
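The abstract does not include implementation details, so the following PyTorch sketch only illustrates one plausible arrangement of the two ideas described above: a mixed channel and spatial attention block applied to features produced by 3D convolutions over a sub-aperture image sequence. All module names, channel counts, and kernel sizes here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions: PyTorch, illustrative channel counts and kernels).
import torch
import torch.nn as nn


class MixAttention(nn.Module):
    """Channel attention followed by spatial attention over a 2D feature map.

    Hypothetical stand-in for the paper's neighborhood pixel attention:
    channel weights come from global pooling, spatial weights from a small
    convolution over per-pixel channel statistics.
    """

    def __init__(self, channels: int, reduction: int = 4, kernel_size: int = 7):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)                # re-weight channels
        avg_map = x.mean(dim=1, keepdim=True)      # per-pixel channel mean
        max_map, _ = x.max(dim=1, keepdim=True)    # per-pixel channel max
        x = x * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x


class SequenceFeatureExtractor(nn.Module):
    """3D convolutions over a sub-aperture sequence, then Mix Attention.

    Input:  (batch, 1, views, H, W) -- one branch's sub-aperture sequence.
    Output: (batch, out_channels, H, W) 2D feature map.
    """

    def __init__(self, views: int, out_channels: int = 32):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, out_channels, kernel_size=(3, 3, 3), padding=(1, 1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_channels, out_channels, kernel_size=(views, 3, 3),
                      padding=(0, 1, 1)),          # collapse the view axis
            nn.ReLU(inplace=True),
        )
        self.attention = MixAttention(out_channels)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        feat = self.conv3d(seq).squeeze(2)         # (B, C, 1, H, W) -> (B, C, H, W)
        return self.attention(feat)


if __name__ == "__main__":
    # One branch of 9 sub-aperture views on a 64x64 crop.
    branch = SequenceFeatureExtractor(views=9)
    out = branch(torch.randn(2, 1, 9, 64, 64))
    print(out.shape)  # torch.Size([2, 32, 64, 64])
```

In the multi-branch network described above, several such extractors (for example one per sub-aperture view sequence, such as a horizontal and a vertical row of views) would presumably run in parallel, with their feature maps concatenated and passed to further 2D convolutions that regress the disparity map; the exact branch layout is not specified in the abstract.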
Results and Discussions In the quantitative analysis, mean square error (MSE) and bad pixel rate are chosen as evaluation metrics. The proposed method demonstrates stable performance, with an average bad pixel rate of 3.091% and an average MSE of 1.126 (Tables 1 and 2). In most scenes, the method achieves optimal (bold) or suboptimal (underlined) depth estimation results, and ablation experiments (Table 3) further demonstrate the effectiveness of the proposed attention mechanism (Mix Attention). Qualitative analysis (Figs. 7 and 8) shows that the proposed method exhibits strong robustness in depth-discontinuous regions (the hanging lamp in the Sideboard scene), high accuracy in texture-rich and depth-continuous regions (the Cotton and Pyramids scenes), reduced prediction errors in areas with reflections (the shoes on the floor in the Sideboard scene), and high smoothness at depth edges (the edges in the Backgammon scene). Overall, the proposed method yields more desirable disparity estimation results, and the experiments indicate that its overall performance surpasses that of the other algorithms. Therefore, as indicated by the selected evaluation metrics, quantitative results, and qualitative analysis, the proposed method delivers stable and superior depth estimation performance.

Conclusions Aiming at the characteristics of the light field depth estimation task and of light field data, we propose a neighborhood pixel attention mechanism called Mix Attention. This mechanism captures the correlation between a pixel and its limited neighborhood in the light field and depth features; by calculating attention from the feature maps of the neighborhood, different feature maps in the network are selectively attended to, improving the utilization efficiency of light field images. Additionally, by analyzing the pixel displacement between different sub-aperture images of the light field, a fast end-to-end multi-stream light field depth estimation network is introduced that employs three-dimensional convolutional kernels to extract sequential image features. Tests on the New HCI light field dataset demonstrate that the proposed network outperforms existing methods on three performance metrics: the 0.07 bad pixel rate, MSE, and computational time. It effectively enhances depth prediction performance and exhibits robustness in occluded scenes such as Boxes. Ablation experiments show that the proposed mechanism fully exploits the correlation between neighboring pixels in different channels, improving the depth prediction performance of the light field depth estimation network. However, the performance of the proposed method remains unsatisfactory in regions lacking texture information. In the future, we will focus on techniques such as spatial pyramids to enhance the network's ability to extract multi-scale features, smooth the depth results in textureless regions, and further improve the reliability of depth estimation.
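For reference, the two quantitative metrics reported above can be computed as in the following NumPy sketch. The x100 scaling of MSE and the 0.07 bad-pixel threshold follow the common convention of the HCI 4D light field benchmark and are assumptions about the paper's exact evaluation protocol.

```python
# Sketch of the reported evaluation metrics (NumPy; conventions assumed as noted above).
import numpy as np


def mse_x100(pred, gt, mask=None):
    """Mean squared disparity error, scaled by 100."""
    if mask is None:
        mask = np.ones_like(gt, dtype=bool)
    diff = pred[mask] - gt[mask]
    return float(np.mean(diff ** 2) * 100.0)


def bad_pixel_rate(pred, gt, threshold=0.07, mask=None):
    """Percentage of pixels whose absolute disparity error exceeds `threshold`."""
    if mask is None:
        mask = np.ones_like(gt, dtype=bool)
    err = np.abs(pred[mask] - gt[mask])
    return float(np.mean(err > threshold) * 100.0)
```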
Authors: Lin Xi (林曦); Guo Yang (郭阳); Zhao Yongqiang (赵永强); Yao Naifu (姚乃夫) (School of Automation, Northwestern Polytechnical University, Xi'an 710129, Shaanxi, China)
Source: Acta Optica Sinica (《光学学报》), 2023, Issue 21, pp. 217-228 (12 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list (北大核心).
Keywords: light field image; depth estimation; neighborhood pixel; attention mechanism; neural network