Deep learning-based foveated rendering in 3D space: a review (三维场景注视点渲染深度学习方法综述)
Abstract Achieving real-time, photorealistic rendering on large high-resolution displays and head-mounted displays remains one of the major challenges in computer graphics. Foveated rendering exploits the limitations of the human visual system and adjusts image rendering quality according to the gaze point, greatly increasing rendering speed without reducing the quality the user perceives. As deep learning has been widely applied to rendering, many new deep learning-based foveated rendering methods have emerged. This paper reviews the latest methods in foveated rendering from a deep learning perspective. First, it outlines the background knowledge of human visual perception. Next, it briefly introduces the most representative non-deep-learning methods in foveated rendering, including adaptive resolution, geometric simplification, shading simplification, and hardware implementation, and summarizes their advantages and disadvantages. It then describes the criteria used to evaluate the deep learning methods, including common evaluation metrics for foveated images and for gaze prediction. After that, it subdivides the deep learning methods in foveated rendering into super-resolution, denoising, image completion, image synthesis, gaze prediction, and image application, and gives a detailed overview and summary of each. Finally, it presents the problems and challenges that deep learning methods currently face. This discussion shows the research prospects and development directions of deep learning in foveated rendering in more detail, and offers a reference for later researchers in choosing research directions and designing network architectures.

The widespread adoption of virtual reality (VR) and augmented reality technologies across various sectors, including healthcare, education, military, and entertainment, has propelled head-mounted displays with high resolution and wide fields of view to the forefront of display devices. However, attaining a satisfactory level of immersion and interactivity is a primary challenge in VR, and latency can cause user discomfort in the form of dizziness and nausea. Multiple studies have underscored that a highly realistic VR experience that maintains user comfort requires raising the screen's image refresh rate to 1800 Hz and keeping latency below 3 to 40 ms. Achieving real-time, photorealistic rendering at high resolution and low latency is a formidable objective. Foveated rendering is an effective approach to these issues: it adjusts the rendering quality across the image based on gaze position, maintaining high quality in the foveal area while reducing quality in the periphery. This technique yields substantial computational savings and improved rendering speed without a perceptible loss in visual quality. Previous reviews have examined technical approaches to foveated rendering, but they focused more on categorizing the implementation techniques, and a comprehensive review from the machine learning perspective is still lacking. With the ongoing advances of machine learning in the rendering field, combining machine learning and foveated rendering is considered a promising research area, especially in postprocessing, where machine learning methods have great potential. Non-machine-learning methods inevitably introduce artifacts. By contrast, machine learning methods have a wide range of applications in rendering postprocessing, where they can optimize and improve foveated rendering results and enhance the realism and immersion of foveated images in a manner unattainable through non-machine-learning approaches. Therefore, this work presents a comprehensive overview of foveated rendering from a machine learning perspective. We first provide an overview of the background knowledge of human visual perception, including the human visual system, contrast sensitivity functions, visual acuity models, and visual crowding. Subsequently, we briefly describe the most representative non-machine-learning methods for foveated rendering, including adaptive resolution, geometric simplification, shading simplification, and hardware implementation, and summarize their features, advantages, and disadvantages. Additionally, we describe the criteria employed for method evaluation in this review, including evaluation metrics for foveated images and for gaze-point prediction. Next, we subdivide machine learning methods into super-resolution, denoising, image reconstruction, image synthesis, gaze prediction, and image application, and summarize them in detail in terms of four aspects: result quality, network speed, user experience, and the ability to handle objects. Among them, super-resolution methods commonly use more neural blocks in the foveal region and fewer in the peripheral region, so super-resolution quality varies by region. Similarly, foveated denoising usually performs fine denoising in the fovea and coarse denoising in the periphery, but denoising has yet to receive extensive attention. The initial attempt to integrate image reconstruction with gaze utilized generative adversarial networks (GANs), yielding promising outcomes. Later, some researchers combined direct prediction and kernel prediction for image reconstruction, which is also the state of the art in this field. Gaze prediction is a key development direction for future VR rendering and is mostly combined with saliency detection to predict the location of the viewpoint. Substantial work remains in the field, and unfortunately only a small portion of the existing work can run in real time. Finally, we present the current problems and challenges that machine learning methods face. Our review of machine learning approaches in foveated rendering not only elucidates the research prospects and development directions but also provides insights for future researchers in choosing research directions and designing network architectures.
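The core idea in the abstract, varying rendering quality with distance from the gaze point, can be illustrated with a minimal sketch. It is not taken from the reviewed paper: the function names, the three-tier scheme, the 5 and 15 degree thresholds, the tile size, and the pixels-per-degree value are all hypothetical choices made only for illustration.

```python
import math


def eccentricity_deg(px, py, gaze_x, gaze_y, pixels_per_degree):
    """Approximate angular distance (degrees) from a pixel to the gaze point.

    Assumes a constant pixels-per-degree factor, which is a simplification.
    """
    return math.hypot(px - gaze_x, py - gaze_y) / pixels_per_degree


def quality_level(ecc_deg, fovea_deg=5.0, mid_deg=15.0):
    """Map eccentricity to a coarse tier: 0 = full, 1 = reduced, 2 = lowest.

    The thresholds are illustrative assumptions, not measured values.
    """
    if ecc_deg <= fovea_deg:
        return 0
    if ecc_deg <= mid_deg:
        return 1
    return 2


def quality_map(width, height, gaze_x, gaze_y, pixels_per_degree=10.0, tile=16):
    """Return per-tile quality tiers, evaluated at each tile center."""
    rows = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            cx, cy = tx + tile / 2, ty + tile / 2
            ecc = eccentricity_deg(cx, cy, gaze_x, gaze_y, pixels_per_degree)
            row.append(quality_level(ecc))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # A 64x32 frame with the gaze at the center of the top-left tile.
    for row in quality_map(64, 32, gaze_x=8, gaze_y=8, pixels_per_degree=2.0):
        print(row)
```

A renderer could use such a map to pick a shading rate or resolution per tile; the learning-based methods surveyed here would then reconstruct or enhance the lower-quality regions.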
Authors Li Yingqun (李英群); Hu Xiao (胡啸); Xu Xiang (徐翔); Xu Yanning (徐延宁); Wang Lu (王璐) (School of Software, Shandong University, Jinan 250101, China; Shandong Key Laboratory of Blockchain Finance, Shandong University of Finance and Economics, Jinan 250014, China)
Source Journal of Image and Graphics (《中国图象图形学报》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 10, pp. 2955-2978 (24 pages)
Funding National Key Research and Development Program of China (2022YFB3303203); National Natural Science Foundation of China (62272275).
Keywords foveated rendering; deep learning; real-time rendering; eye fixations prediction; image reconstruction; super-resolution; ray tracing denoising
