Funding: Project (61072087) supported by the National Natural Science Foundation of China; Project (2010011020-1) supported by the Natural Scientific Foundation of Shanxi Province, China; Project (20093010) supported by the Graduate Innovation Foundation of Shanxi Province, China.
Abstract: Based on an auditory model, the zero-crossings with maximal Teager energy operator (ZCMT) feature extraction approach was described and then applied to speech and emotion recognition. Three kinds of experiments were carried out. The first kind consists of isolated-word recognition experiments on neutral (non-emotional) speech. The results show that the ZCMT approach improves recognition accuracy by 3.47% on average compared with the Teager energy operator (TEO). Thus, the ZCMT feature can be considered a noise-robust feature for speech recognition. The second kind consists of monolingual emotion recognition experiments using the Taiyuan University of Technology (TYUT) and Berlin databases. The average recognition rate of the ZCMT approach is 82.19%, which indicates that ZCMT features characterize speech emotions effectively. The third kind consists of cross-lingual experiments with three languages. Because the accuracy of the ZCMT approach dropped by only 1.45%, the results indicate that ZCMT features characterize emotions in a language-independent way.
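To make the two ingredients concrete, the sketch below computes the discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]x[n+1], and a ZCMT-style histogram for a single channel. It assumes a ZCPA-like construction: each interval between successive upward zero crossings gives a frequency estimate fs / interval_length, and the maximal Teager energy inside that interval is added to the histogram bin for that frequency. The auditory filterbank, the bin layout, and the names `teager_energy` and `zcmt_histogram` are illustrative assumptions, not the authors' exact method.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples are skipped because they lack a neighbour,
    so the result has len(x) - 2 values; psi[n - 1] belongs to sample x[n].
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def zcmt_histogram(x, fs, bin_edges):
    """Hypothetical single-channel ZCMT-style frequency histogram.

    Each interval between successive upward zero crossings yields a frequency
    estimate fs / interval_length; the maximal Teager energy inside that
    interval is accumulated into the histogram bin containing the estimate.
    """
    psi = teager_energy(x)
    # Upward zero crossings: indices n with x[n-1] < 0 <= x[n].
    ups = [n for n in range(1, len(x)) if x[n - 1] < 0 <= x[n]]
    hist = [0.0] * (len(bin_edges) - 1)
    for start, end in zip(ups, ups[1:]):
        freq = fs / (end - start)
        # start >= 1 and end <= len(x) - 1, so psi[n - 1] exists for every n.
        peak = max(psi[n - 1] for n in range(start, end))
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                hist[b] += peak
                break
    return hist


if __name__ == "__main__":
    # A 1 kHz tone sampled at 8 kHz; the 0.3 rad phase avoids exact zeros.
    fs = 8000
    x = [math.sin(2 * math.pi * 1000 * n / fs + 0.3) for n in range(80)]
    print(teager_energy(x)[:3])  # each value is close to sin(pi/4)**2 = 0.5
    print(zcmt_histogram(x, fs, [0, 500, 1500, 4000]))
```

For a pure tone A sin(Omega n + phi), the operator returns exactly A^2 sin^2(Omega), so it tracks both amplitude and frequency; weighting each zero-crossing interval by its peak Teager energy is what distinguishes this kind of feature from a plain zero-crossing rate.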
Abstract: The evolution of the human mind owes much to its companion phenomena of intelligence, sapience, wisdom, awareness and consciousness. In this paper we take the concepts of intelligence and sapience as the starting point of a route towards elucidating the conscious mind. There is much disagreement and confusion associated with the word intelligence. Much of this results from its use in diverse contexts, where it is called upon to represent different ideas and to justify different arguments. Adding the word sapience to the mix merely complicates matters, unless we can relate both words to different concepts in a way that acceptably crosses contextual boundaries. We have established a connection between information processing and processor “architecture” which provides just such a linguistic separation, and which is applicable, in either a computational or a conceptual form, to any context. This paper reports the argumentation leading up to a distinction between intelligence and sapience, and relates this distinction to human “cognitive” activities. Information is always contextual. Information processing in a system always takes place between “architectural” scales: intelligence is the “tool” which permits an “overview” of the relevance of individual items of information. System unity presumes a degree of coherence across all the scales of a system: sapience is the “tool” which permits an evaluation of the relevance of both individual items and individual scales of information to a common purpose. This hyperscalar coherence is created through mutual inter-scalar observation, whose recursive nature generates the independence of high-level consciousness, making humans human. We conclude that intelligence and sapience are distinct and necessary properties of all information processing systems, and that the degree of their availability controls a system’s or a human’s cognitive capacity, if not its application. This establishes intelligence and sapience as prime ancestors of the conscious mind. However, to our knowledge, there is no current mathematical approach that can satisfactorily deal with the native irrationalities of information integration across multiple scales, and therefore none that can formally model the mind.
Funding: Partially supported by Innoviris (3-DLicornea project) and FWO (project G.0256.15); supported by the National Natural Science Foundation of China (Nos. 61272226 and 61373069), the Research Grant of Beijing Higher Institution Engineering Research Center, the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and the Tsinghua University Initiative Scientific Research Program.
Abstract: Multiview video can provide more immersive perception than traditional single-view 2-D video. It enables both interactive free-navigation applications and high-end autostereoscopic displays on which multiple users can perceive genuine 3-D content without glasses. The multiview format also contains much more visual information than classical 2-D or stereo 3-D content, which makes it possible to perform various editing operations at both the pixel level and the object level. This survey provides a comprehensive review of existing multiview video synthesis and editing algorithms and applications. For each topic, the related technologies in classical 2-D image and video processing are reviewed first. We then discuss recent advanced techniques for multiview video virtual view synthesis and various interactive editing applications. Given the ongoing progress in multiview video synthesis and editing, we foresee that more and more immersive 3-D video applications will appear in the future.
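Virtual view synthesis is often introduced through depth-image-based rendering (DIBR). The sketch below is a generic textbook version on a single scanline, not a method taken from the survey: it assumes rectified cameras, a virtual camera shifted horizontally so pixels move left by disparity = round(focal_baseline / depth), a z-buffer to resolve overlaps, and a simple "fill holes from the farther neighbour" heuristic. The function names and the sign convention are assumptions.

```python
import math


def warp_scanline(colors, depths, focal_baseline):
    """Forward-warp one scanline of a reference view to a virtual view.

    Assumes rectified cameras and a horizontally shifted virtual camera, so a
    pixel at column x moves to x - disparity, with
    disparity = round(focal_baseline / depth) (focal length times baseline,
    in pixel units). A z-buffer keeps the nearest surface when several
    source pixels land on the same target column.
    """
    width = len(colors)
    out = [None] * width          # None marks a disocclusion hole
    zbuf = [math.inf] * width
    for x, (color, z) in enumerate(zip(colors, depths)):
        xt = x - round(focal_baseline / z)
        if 0 <= xt < width and z < zbuf[xt]:
            out[xt] = color
            zbuf[xt] = z
    return out, zbuf


def fill_holes(out, zbuf):
    """Fill each run of holes from whichever neighbour is farther away.

    Disoccluded regions usually belong to the background, so the farther
    (larger-depth) neighbour is preferred over the foreground one.
    """
    filled = list(out)
    width = len(filled)
    x = 0
    while x < width:
        if filled[x] is not None:
            x += 1
            continue
        start = x
        while x < width and filled[x] is None:
            x += 1
        # Neighbours of the hole run [start, x) that lie inside the row.
        sides = [i for i in (start - 1, x) if 0 <= i < width]
        if not sides:
            break  # the whole row is empty; nothing to copy from
        src = max(sides, key=lambda i: zbuf[i])
        for i in range(start, x):
            filled[i] = filled[src]
    return filled


if __name__ == "__main__":
    colors = list("abcdef")
    depths = [10, 10, 2, 2, 10, 10]   # "c" and "d" are a near foreground object
    warped, zbuf = warp_scanline(colors, depths, focal_baseline=4)
    print(warped)                     # ['c', 'd', None, None, 'e', 'f']
    print(fill_holes(warped, zbuf))   # ['c', 'd', 'e', 'e', 'e', 'f']
```

The near object shifts further than the background, uncovering a gap behind it; filling that gap from the background side rather than the foreground side is the usual reason DIBR pipelines keep the z-buffer around after warping.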
Funding: Supported by the iMinds visualization research program (HIVIZ).
Abstract: In this paper, we present an interactive static image composition approach, namely color retargeting, to flexibly represent time-varying color editing effects based on time-lapse video sequences. Instead of relying on precise image matting or blending techniques, our approach treats color composition as a pixel-level resampling problem. To both satisfy the user's editing requirements and avoid visual artifacts, we construct a globally optimized interpolation field. This field defines the input video frames from which each output pixel should be resampled. Our proposed resampling solution ensures that (i) the global color transition in the output image is as smooth as possible, (ii) the desired colors/objects specified by the user from different video frames are well preserved, and (iii) additional local color transition directions in image space assigned by the user are also satisfied. Various examples demonstrate that our efficient solution enables the user to easily create time-varying color image compositions.
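The idea of an interpolation field that says which frame each output pixel comes from can be sketched as follows. This is a simplified stand-in, not the paper's optimization: it assumes the field is a per-pixel fractional frame index, obtains it by harmonic interpolation (Gauss-Seidel averaging of 4-neighbours) with user-pinned pixels as hard constraints, which addresses goals (i) and (ii) but omits the directional constraints of (iii), and resamples by linearly blending the two bracketing frames. `smooth_field` and `resample` are hypothetical names.

```python
import math


def smooth_field(h, w, constraints, iters=500):
    """Harmonic interpolation of a frame-index field t[r][c].

    Pixels in `constraints` (a non-empty dict {(row, col): frame_index}) are
    pinned to user-chosen values; every other pixel is repeatedly set to the
    mean of its 4-neighbours (Gauss-Seidel), which minimizes the sum of
    squared neighbour differences.
    """
    start = sum(constraints.values()) / len(constraints)
    t = [[constraints.get((r, c), start) for c in range(w)] for r in range(h)]
    for _ in range(iters):
        for r in range(h):
            for c in range(w):
                if (r, c) in constraints:
                    continue
                nbrs = [t[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < h and 0 <= cc < w]
                t[r][c] = sum(nbrs) / len(nbrs)
    return t


def resample(frames, t):
    """Blend the two frames that bracket each (fractional) frame index."""
    last = len(frames) - 1
    out = []
    for r, row in enumerate(t):
        out_row = []
        for c, tc in enumerate(row):
            tc = min(max(tc, 0.0), float(last))   # clamp to the valid frame range
            k0 = math.floor(tc)
            k1 = min(k0 + 1, last)
            a = tc - k0
            out_row.append((1 - a) * frames[k0][r][c] + a * frames[k1][r][c])
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Hypothetical 1x5 image and three constant grayscale frames: 0, 10, 20.
    frames = [[[10.0 * k] * 5] for k in range(3)]
    t = smooth_field(1, 5, {(0, 0): 0.0, (0, 4): 2.0})
    print([round(v, 6) for v in t[0]])                     # [0.0, 0.5, 1.0, 1.5, 2.0]
    print([round(v, 6) for v in resample(frames, t)[0]])   # [0.0, 5.0, 10.0, 15.0, 20.0]
```

Pinning the left pixel to frame 0 and the right pixel to frame 2 produces a linear ramp of frame indices in between, so the output fades smoothly from the first frame's colors to the last frame's; a real implementation would use a sparse linear solver instead of plain iteration.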
Funding: This research was funded by the Research Foundation-Flanders (FWO), Junior postdoctoral fellowship (12ZQ220N); the joint JSPS-FWO scientific cooperation program (VS07820N); and the Japan Society for the Promotion of Science (19H04132 and JPJSBP120202302).
Abstract: Holographic displays promise to be the ultimate 3D display technology, able to account for all visual cues. Recent advances in photonics and electronics have given rise to high-resolution holographic display prototypes, indicating that such displays may become widely available in the near future. One major challenge in driving these display systems is computational: computer-generated holography (CGH) consists of numerically simulating diffraction, which is very computationally intensive. Our goal in this paper is to give a broad overview of the state of the art in CGH. We classify modern CGH algorithms, describe different algorithmic CGH acceleration techniques, discuss the latest dedicated hardware solutions and indicate how to evaluate the perceptual quality of CGH. We summarize our findings, discuss remaining challenges and make projections about the future of CGH.
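To show why simulating diffraction is expensive, the sketch below implements the classic point-source (point-cloud) CGH method: every scene point emits a spherical wave amplitude * exp(i k r) / r, and each hologram pixel sums the contributions of all points. This is one well-known baseline, not a summary of the paper's classification; the hologram is assumed to lie in the plane z = 0, and the function name and parameter choices are illustrative.

```python
import cmath
import math


def point_source_hologram(points, nx, ny, pitch, wavelength):
    """Naive point-source CGH: complex field on an nx-by-ny hologram at z = 0.

    points: list of (x, y, z, amplitude) in metres; each point contributes a
    spherical wave amplitude * exp(i * k * r) / r. The triple loop makes the
    cost O(len(points) * nx * ny), which is what acceleration methods target.
    """
    k = 2 * math.pi / wavelength
    field = [[0j] * nx for _ in range(ny)]
    for iy in range(ny):
        y = (iy - ny / 2) * pitch          # pixel centres, hologram centred at 0
        for ix in range(nx):
            x = (ix - nx / 2) * pitch
            acc = 0j
            for px, py, pz, amp in points:
                r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                acc += amp * cmath.exp(1j * k * r) / r
            field[iy][ix] = acc
    return field


if __name__ == "__main__":
    # One point 10 cm in front of a tiny 8x8 hologram with 8 um pixels, 532 nm light.
    field = point_source_hologram([(0.0, 0.0, 0.1, 1.0)], 8, 8, 8e-6, 532e-9)
    # A phase-only hologram keeps just the argument of each complex value.
    phase = [[cmath.phase(v) for v in row] for row in field]
    print(abs(field[4][4]))  # 1 / r at the pixel directly below the point, about 10.0
```

The cost grows with the product of scene points and hologram pixels: for a hypothetical 1920 x 1080 hologram and 100,000 points, that is roughly 2 x 10^11 complex exponentials per frame, which is why look-up tables, wavefront-recording planes, FFT-based layer methods and dedicated hardware matter.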