Funding: Project (No. 511568) supported by the European Commission within Framework Program 6 with the acronym 3DTV.
Abstract: The authors propose a novel method for transporting multi-view video that aims to keep the bandwidth requirements of both end users and servers as low as possible. The method is based on application-layer multicast, where each endpoint receives only the subset of views required to render video from its current viewpoint at any given time. The set of selected views changes in real time as the user's viewpoint changes because of head or eye movements. Techniques for reducing blackouts during fast viewpoint changes were investigated, and the performance of the approach was studied through network experiments.
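The abstract does not say how the view subset is chosen. As an illustration only, the sketch below assumes a horizontal linear camera rig with known camera positions, one multicast group per view, and a renderer that interpolates between the two views bracketing the viewer. The names `select_views` and `update_subscriptions` and the `margin` parameter are hypothetical. Receiving `margin` extra neighboring views is one plausible way to reduce blackouts during fast viewpoint changes; it is not necessarily the technique the authors evaluated.

```python
from bisect import bisect_left


def select_views(camera_x: list[float], viewer_x: float, margin: int = 1) -> list[int]:
    """Return the indices of the camera views a peer should receive.

    Hypothetical model: cameras sit on a horizontal line at the sorted
    positions in `camera_x`, and the renderer interpolates between the two
    views that bracket `viewer_x`. `margin` extra views on each side are
    also received so that a fast viewpoint change is less likely to need a
    view that is not yet arriving.
    """
    n = len(camera_x)
    if n < 2:
        raise ValueError("need at least two cameras to bracket the viewpoint")
    # First camera at or to the right of the viewer, clamped so that the
    # bracketing pair (right - 1, right) always lies inside the rig.
    right = min(max(bisect_left(camera_x, viewer_x), 1), n - 1)
    left = right - 1
    lo = max(0, left - margin)
    hi = min(n - 1, right + margin)
    return list(range(lo, hi + 1))


def update_subscriptions(current: set[int], wanted: list[int]) -> tuple[set[int], set[int]]:
    """Return (views to join, views to leave), assuming one multicast group per view."""
    wanted_set = set(wanted)
    return wanted_set - current, current - wanted_set
```

For cameras at positions 0 through 5 and a viewer at 2.4, `select_views` returns views 1 to 4. Moving the viewer to 3.6 joins view 5 and leaves view 1. A larger margin trades extra bandwidth for a lower chance of a blackout.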
Funding: supported by the National Natural Science Foundation of China (Grant No. 60832003), the Science and Technology Commission of Shanghai Municipality (Grant No. 10510500500), and the Key Laboratory of Advanced Display and System Applications (Shanghai University), Ministry of Education, China (Grant No. P200801).
Abstract: This paper analyzes the technical characteristics of three-dimensional television (3DTV) display systems and several core technical problems that remain unsolved. It outlines ways to address these problems and presents an effective solution to the discomfort of watching 3D television.
Funding: supported by the Program of Entrepreneurship and Innovation Ph.D. in Jiangsu Province (JSSCBS20211175), the School Ph.D. Talent Funding (Z301B2055), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (21KJB520002).
Abstract: 3D human pose estimation is a major focus area in computer vision and plays an important role in practical applications. This article summarizes the framework and research progress of 3D human pose estimation from monocular RGB images and videos. An overall perspective of methods integrated with deep learning is introduced, and the analysis framework is organized by input type: image-based and video-based. From this viewpoint, common problems are discussed. The diversity of human postures usually leads to problems such as occlusion and ambiguity, and the lack of training data often results in poor generalization ability of the model. Regression methods are crucial for solving such problems. For image-based input, multi-view methods are commonly used to resolve occlusion, and they are analyzed comprehensively here. For video-based input, prior knowledge of the constraints on human motion is used to predict human postures. In addition, structural constraints are widely used as prior knowledge. Furthermore, weakly supervised learning methods for both types of input are studied and discussed as a means of improving the model's generalization ability. The problem of insufficient training data must also be considered, especially because 3D datasets are usually biased and limited. Finally, emerging and popular datasets and evaluation indicators are discussed, and the characteristics of the datasets and the relationships among the indicators are explained. This article can therefore be useful and instructive for researchers who are new to the field and find it confusing. By providing an overview of 3D human pose estimation, it organizes and refines recent studies, describes the core problems and commonly used methods, and discusses directions for further research.
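The abstract mentions evaluation indicators without naming them. One indicator widely used for 3D pose estimation is the Mean Per Joint Position Error (MPJPE): the Euclidean distance between predicted and ground-truth joint positions, averaged over joints and frames. The sketch below assumes poses stored as NumPy arrays of shape (..., joints, 3) in a common unit; the function name `mpjpe` and the optional root-relative alignment are illustrative choices, not taken from the survey.

```python
from typing import Optional

import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray, root: Optional[int] = None) -> float:
    """Mean Per Joint Position Error between predicted and ground-truth poses.

    pred, gt: arrays of shape (..., joints, 3) in the same unit (often mm).
    If `root` is given, both poses are first translated so that joint `root`
    is at the origin (root-relative MPJPE).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape or pred.shape[-1] != 3:
        raise ValueError("pred and gt must both have shape (..., joints, 3)")
    if root is not None:
        # Keep the joint axis (root:root + 1) so the subtraction broadcasts.
        pred = pred - pred[..., root:root + 1, :]
        gt = gt - gt[..., root:root + 1, :]
    # Euclidean error per joint, averaged over every joint (and frame).
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

Root alignment removes global translation errors, so a prediction that is correct up to a constant offset scores zero in the root-relative variant but not in the absolute one.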
Funding: supported by the 2023 Guangdong Basic and Applied Basic Research Fund Regional Joint Fund Key Project under Grant No. 2023B1515120017, the 2023 Key Project of Guangdong Provincial Department of Education for General Universities under Grant No. 2023ZDZX3024, and the ZTE Industry-University-Institute Cooperation Funds under Grant No. K2133Z167.
Abstract: Vision-based measurement technology benefits high-quality manufacturing through improved dimensional precision, tighter geometric tolerances, and increased product yield. The monocular 3D structured light visual sensing method is popular for online part inspection because it can reach micrometer-level depth accuracy. However, a single-viewpoint vision system requires line of sight, and it often fails when parts of the object are hidden (occluded) by its own surface structure, such as edges, slopes, and holes. To address this issue, this paper proposes a multi-view 3D structured light vision system that achieves high accuracy, i.e., Z-direction repeatability, and reduces the probability of occlusion during mechanical dimension measurement. The main contributions of this paper are as follows. Industrial cameras with high resolution and high frame rates are used to achieve high-precision 3D reconstruction. A multi-wavelength (heterodyne) phase unwrapping method is employed for high-precision phase calculation. By leveraging multiple industrial cameras, the system overcomes field-of-view occlusions and broadens the 3D reconstruction field of view. Finally, the system achieves a Z-axis repeatability of 0.48 µm.
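The abstract names the multi-wavelength (heterodyne) phase unwrapping method without giving details. The sketch below shows only the textbook two-wavelength case and is not the authors' implementation. It assumes wrapped phase maps in [0, 2π) (for example from phase shifting) for fringe periods lam1 < lam2 measured in pixels. The beat phase (phi1 - phi2) mod 2π has the equivalent period lam12 = lam1 * lam2 / (lam2 - lam1), so it is unambiguous across a field narrower than lam12. It is then scaled to lam1 to pick the fringe order of phi1. The names `wrap` and `heterodyne_unwrap` are hypothetical.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def wrap(phase: np.ndarray) -> np.ndarray:
    """Wrap a phase map into [0, 2*pi)."""
    return np.mod(phase, TWO_PI)


def heterodyne_unwrap(phi1, phi2, lam1: float, lam2: float) -> np.ndarray:
    """Two-wavelength heterodyne phase unwrapping.

    phi1, phi2: wrapped phase maps in [0, 2*pi) for fringe periods lam1 < lam2
    (in pixels). Returns the absolute phase for lam1, valid where the
    equivalent period lam12 = lam1 * lam2 / (lam2 - lam1) spans the field.
    """
    if not 0 < lam1 < lam2:
        raise ValueError("expected 0 < lam1 < lam2")
    phi1 = np.asarray(phi1, dtype=float)
    phi2 = np.asarray(phi2, dtype=float)
    lam12 = lam1 * lam2 / (lam2 - lam1)
    # Beat phase: one fringe over lam12, hence unambiguous inside that range.
    phi12 = wrap(phi1 - phi2)
    # Scale the coarse phase to lam1 and pick the nearest integer fringe order.
    k = np.round((phi12 * lam12 / lam1 - phi1) / TWO_PI)
    return phi1 + TWO_PI * k
```

With periods of 20 and 22 pixels, the equivalent period is 220 pixels, so a 200-pixel-wide field is unwrapped without ambiguity. Because the beat phase is scaled by lam12 / lam1, its noise is amplified. This is one reason systems often chain three or more wavelengths. The abstract does not state how many wavelengths this system uses.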
Abstract: Mastering quality of experience (QoE) is key to the widespread adoption of stereoscopic 3DTV (S-3DTV). However, assessing the QoE of S-3DTV is not straightforward: methods for determining observer experience need to be clearly defined and sufficiently robust. In this paper, we present the state of the art in subjective QoE assessment for S-3DTV. We present the conventional standardized ITU recommendations for evaluating picture quality and discuss new ITU activities in the area of S-3DTV assessment. We also present and discuss exploratory studies from the literature. We then introduce ways of using conventional quality assessment for S-3DTV QoE assessment. In discussing our proposal, we focus mainly on QoE indicators and common features of subjective assessment. Multidimensional QoE indicators need to be used in S-3DTV to highlight advantages and reveal problems. In the second part of our proposal, we present the requirements for adapting ITU-R BT.500, a conventional subjective QoE assessment method, to assessing the QoE of S-3DTV.
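Subjective methods in the BT.500 family summarize each test condition by a mean score over N observers and a 95% confidence interval with half-width delta = 1.96 * S / sqrt(N), where S is the sample standard deviation. The minimal sketch below applies that computation separately to each QoE dimension, in keeping with the call for multidimensional indicators. The function names and the dimension labels in the usage note are hypothetical. Observer screening and other parts of the recommendation are omitted, so the current edition of BT.500 should be checked for the exact procedure.

```python
import math
from statistics import mean, stdev


def mos_with_ci(scores: list[float]) -> tuple[float, float]:
    """Mean opinion score and 95% confidence-interval half-width.

    u = mean of the N observers' scores, S = sample standard deviation
    (N - 1 denominator), delta = 1.96 * S / sqrt(N).
    The interval is [u - delta, u + delta].
    """
    if len(scores) < 2:
        raise ValueError("need at least two observers")
    u = float(mean(scores))
    delta = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return u, delta


def summarize(ratings: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Apply mos_with_ci to each QoE dimension separately."""
    return {dim: mos_with_ci(scores) for dim, scores in ratings.items()}
```

For example, `summarize({"picture quality": [4, 5, 3, 4, 4], "visual comfort": [2, 2, 2]})` reports a mean of 4.0 with a half-width of about 0.62 for the first dimension. Keeping the dimensions apart lets a depth-related gain show up alongside a comfort-related problem instead of averaging out.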