Abstract
Compressive light field displays offer a simple, compact structure and high spatial display resolution, but the iterative algorithms used to solve for their display patterns are computationally expensive. With the development of artificial intelligence, image generation algorithms based on deep learning have also been applied to 3D display. We propose using U-Net, a network model designed for image segmentation tasks in computer vision, to optimize compressive light field display patterns. For a given viewing angle, several data-augmented target light field datasets are generated as the training sets of the U-Net; after the U-Net converges, the trained network is used to generate the display patterns that reconstruct the target light fields in the test set. Training and testing results show that, compared with methods based on stacked CNNs and iterative algorithms, the proposed U-Net-based method for generating compressive light field display patterns achieves higher reconstruction quality with fewer computing resources.
Objective 3D display technology is the gateway to an immersive metaverse for tabletop, portable, and near-eye electronic devices. True 3D displays fall mainly into light field displays and holographic displays, and light field displays can be further subdivided into integral-imaging displays, directional light field displays, and compressive light field displays. Compressive light field displays exploit the scattering characteristics of display panels and the correlation between viewpoint images of a 3D scene. With its compact structure, moderate viewing angle, and high spatial resolution, the compressive light field display is a candidate for portable 3D display. However, the computational resources of portable electronic devices are limited by battery-life requirements, while the iterative algorithms that solve for compressive light field display patterns are computationally heavy, preventing compressive light field displays from becoming a practical solution for portable dynamic 3D display. With the development of artificial intelligence, image generation algorithms based on deep learning are increasingly applied to 3D displays. Deep neural networks can be trained to fit the iterative process, and fast display image synthesis can be realized through the rapid forward propagation of artificial neural networks. Researchers previously proposed a stacked-CNN-based method to generate images for compressive light field displays, but that method suffers from convergence and over-fitting problems. U-Net was originally employed for image segmentation in computed tomography, taking slice data as input and outputting organ cancer probabilities. The skip connections in the U-Net architecture significantly improve its convergence compared with a stacked CNN model, and light field data are quite similar to the slice data of computed tomography. We therefore introduce U-Net as the network model for optimizing compressive light field display patterns, aiming at better convergence and generalization. Given a specific viewing angle, several augmented target light field datasets are generated as the training sets of the U-Net. After the U-Net converges, the trained network synthesizes the display patterns that reconstruct the target light fields for testing. The training and testing results prove that, compared with the stacked-CNN-based method and iterative algorithms, the proposed U-Net-based pattern generation method for compressive light field displays achieves higher reconstruction quality with fewer computing resources.
Methods The training procedure of an artificial neural network can be split into forward and backward propagation. The forward propagation comprises the following steps: the target light field for training is input into the network, display images are output, and the light field is then reconstructed by simulated perspective projection. The backward propagation updates the network's parameters using the loss function and regular terms. This procedure is repeated for every epoch and batch. Once training is finished, the target light field for testing is input into the network and display images are synthesized; this is called the inference procedure. The datasets, network architecture, and hyper-parameters are carefully designed to fit the characteristics of compressive light field displays. The datasets contain 1260 pairs of image blocks cropped from seven scenes. The ReLU function serves as the activation function of the U-Net model, which is initialized uniformly with Kaiming initialization. The loss function is the mean square error between the target and reconstructed light fields, and the regular term constrains the effective range of image pixel values.
Results and Discussions The performances of the proposed U-Net-based method, the stacked-CNN-based method, and iterative algorithms are compared fairly for the multiplicative (Fig. 8), additive (Fig. 9), polarized (Fig. 10), and hybrid (Fig. 11) types of compressive light field displays. The training and testing results (Figs. 17‒20) prove that the light field reconstruction quality of the proposed method is consistently 2 dB higher than that of the stacked-CNN-based method, because the U-Net-based method utilizes the value range of image pixels more effectively. Additionally, for additive-type compressive light field displays, the proposed method takes less time than iterative algorithms to reach the same reconstruction quality (Fig. 21).
Conclusions To improve the image quality, uniformity, and computational performance of compressive light field displays, we apply an elaborately designed U-Net model to synthesize display images. The proposed method is compared with the stacked-CNN method and iterative algorithms by simulating the perspective projection of the display images, with the same target light field as input. For the additive-type compressive light field display, the trained U-Net's inference is much faster than the iterative algorithm at the same reconstruction quality. However, the trained U-Net's generalization performance still needs improvement for multiplicative- and polarized-type compressive light field displays; possible improvements include changing the activation functions and increasing the network's depth.
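The training pipeline described in the Methods section can be illustrated with a minimal sketch. This is not the authors' code: the tiny one-level U-Net, the two-layer multiplicative display model, the `torch.roll`-based view shearing, and all tensor sizes are illustrative assumptions, and the sigmoid output stands in for the paper's pixel-value-range regular term.

```python
# Sketch of one training step for a U-Net-based pattern generator (assumed
# setup: two-layer multiplicative display, 3 target views, toy shift model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Toy stand-in for the paper's U-Net: one down/up level plus a skip connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.out = nn.Conv2d(32, out_ch, 3, padding=1)  # 32 = 16 (skip) + 16 (up)

    def forward(self, x):
        d = self.down(x)
        m = self.up(self.mid(self.pool(d)))
        # Sigmoid keeps pattern pixels in [0, 1], standing in for the
        # pixel-value-range regular term mentioned in the Methods section.
        return torch.sigmoid(self.out(torch.cat([d, m], dim=1)))

def reconstruct(layers, shifts):
    """Multiplicative reconstruction: each view is the pixel-wise product of
    the two layer patterns, sheared against each other by the view's disparity."""
    front, rear = layers[:, :1], layers[:, 1:]
    views = [front * torch.roll(rear, s, dims=-1) for s in shifts]
    return torch.cat(views, dim=1)

shifts = [-1, 0, 1]                     # assumed disparities of 3 target views
net = TinyUNet(in_ch=len(shifts), out_ch=2)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

target = torch.rand(4, len(shifts), 32, 32)   # batch of target light fields
patterns = net(target)                        # forward: synthesize layer patterns
recon = reconstruct(patterns, shifts)         # simulated perspective projection
loss = F.mse_loss(recon, target)              # MSE between target and reconstruction
opt.zero_grad(); loss.backward(); opt.step()  # backward: update network parameters
```

At inference time only the forward pass (`net(target)`) is executed, which is why the trained network can outrun the per-scene iterative solvers.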
Authors
Gao Chen
Tan Xiaodi
Li Haifeng
Liu Xu
Gao Chen; Tan Xiaodi; Li Haifeng; Liu Xu (College of Photonic and Electronic Engineering, Fujian Normal University, Fuzhou 350117, Fujian, China; Fujian Provincial Key Laboratory of Photonics Technology, Fuzhou 350117, Fujian, China; Key Laboratory of Optoelectronic Science and Technology for Medicine of Ministry of Education, Fuzhou 350117, Fujian, China; Fujian Provincial Engineering Technology Research Center of Photoelectric Sensing Application, Fuzhou 350117, Fujian, China; Information Photonics Research Center, Fujian Normal University, Fuzhou 350117, Fujian, China; College of Optical Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China)
Source
《光学学报》
EI
CAS
CSCD
Peking University Core Journal
2024, No. 10, pp. 366‒384 (19 pages)
Acta Optica Sinica
Funding
National Natural Science Foundation of China (U22A2080)
National Key Research and Development Program of China (2018YFA0701800)
Fujian Province Major Science and Technology Project (2020HZ01012)
Keywords
physical optics
imaging system
compressive light field display
light field rendering
deep learning