Journal Articles
8 articles found
1. EM-Gaze: eye context correlation and metric learning for gaze estimation
Authors: Jinchao Zhou, Guoan Li, Feng Shi, Xiaoyan Guo, Pengfei Wan, Miao Wang. Visual Computing for Industry, Biomedicine, and Art (EI), 2023, No. 1, pp. 97-108 (12 pages)
In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gazes from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze point regression error while significantly improving gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression performance. Subsequently, through a unified perspective for gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both gaze point regression and quadrant classification performances are improved. The experiments demonstrate that the proposed method outperforms existing gaze-estimation methods on the GazeCapture and MPIIFaceGaze datasets.
Keywords: computer vision, gaze estimation, metric learning, attention, multi-task learning
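The quadrant classification that the abstract uses as additional supervision can be made concrete. This is an illustrative sketch, not the paper's code: the function name and the display-size parameters are assumptions; it simply maps a predicted 2D gaze point to one of the four quadrants of the screen.

```python
def gaze_quadrant(x, y, width, height):
    """Return the quadrant index 0-3 for a gaze point (x, y) on a display
    of the given size. Quadrants are numbered row-major:
    0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right."""
    col = 1 if x >= width / 2 else 0
    row = 1 if y >= height / 2 else 0
    return 2 * row + col
```

In the paper's setting, these quadrant labels would serve as the classification targets for the metric-learning branch, alongside the 2D gaze point regression.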
2. Review of light field technologies (cited 1 time)
Authors: Shuyao Zhou, Tianqian Zhu, Kanle Shi, Yazi Li, Wen Zheng, Junhai Yong. Visual Computing for Industry, Biomedicine, and Art (EI), 2021, No. 1, pp. 295-307 (13 pages)
Light fields are vector functions that map the geometry of light rays to the corresponding plenoptic attributes. They describe the holographic information of scenes by representing the amount of light flowing in every direction through every point in space. The physical concept of light fields was first proposed in 1936, and light fields are becoming increasingly important in the field of computer graphics, especially with the fast growth of computing capacity as well as network bandwidth. In this article, light field imaging is reviewed from the following aspects, with an emphasis on the achievements of the past five years: (1) depth estimation, (2) content editing, (3) image quality, (4) scene reconstruction and view synthesis, and (5) industrial products, because light field technologies also intersect with industrial applications. State-of-the-art research has focused on light field acquisition, manipulation, and display. In addition, the research has extended from the laboratory to industry. Given these achievements and challenges, in the near future the applications of light fields could offer more portability, accessibility, compatibility, and the ability to visualize the world.
Keywords: light field imaging, holographics, human-machine graphic interaction
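A common way to make the light-field function concrete is the two-plane parameterization, where each ray is indexed by its intersections (u, v) and (s, t) with two parallel planes. The sketch below is illustrative only (the plane positions z=0 and z=1 are assumptions, and the review itself does not prescribe this code); it converts a ray given as origin and direction into 4D light-field coordinates.

```python
def ray_to_uvst(origin, direction, z_uv=0.0, z_st=1.0):
    """Two-plane light field parameterization: intersect a ray with the
    planes z = z_uv and z = z_st and return (u, v, s, t)."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0:
        raise ValueError("ray is parallel to the parameter planes")
    t_uv = (z_uv - oz) / dz  # parameter at the first plane
    t_st = (z_st - oz) / dz  # parameter at the second plane
    return (ox + t_uv * dx, oy + t_uv * dy,
            ox + t_st * dx, oy + t_st * dy)
```

With this parameterization, the radiance along a ray is the value L(u, v, s, t) of the 4D light field, which is what light field cameras sample and view-synthesis methods interpolate.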
3. Point cloud completion via structured feature maps using a feedback network
Authors: Zejia Su, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu. Computational Visual Media (SCIE, EI, CSCD), 2023, No. 1, pp. 71-85 (15 pages)
In this paper, we tackle the challenging problem of point cloud completion from the perspective of feature learning. Our key observation is that to recover the underlying structures as well as surface details given partial input, a fundamental component is a good feature representation that can capture both global structure and local geometric details. We accordingly first propose FSNet, a feature structuring module that can adaptively aggregate point-wise features into a 2D structured feature map by learning multiple latent patterns from local regions. We then integrate FSNet into a coarse-to-fine pipeline for point cloud completion. Specifically, a 2D convolutional neural network is adopted to decode feature maps from FSNet into a coarse and complete point cloud. Next, a point cloud upsampling network is used to generate a dense point cloud from the partial input and the coarse intermediate output. To efficiently exploit local structures and enhance point distribution uniformity, we propose IFNet, a point upsampling module with a self-correction mechanism that can progressively refine details of the generated dense point cloud. We have conducted qualitative and quantitative experiments on the ShapeNet, MVP, and KITTI datasets, which demonstrate that our method outperforms state-of-the-art point cloud completion approaches.
Keywords: 3D point clouds, shape completion, geometry processing, deep learning
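The core idea of FSNet, aggregating unordered point-wise features into a structured 2D map via latent patterns, can be sketched in a drastically simplified form. This is not the paper's learned module: here the "patterns" are fixed vectors and points are hard-assigned to their nearest pattern and averaged, whereas FSNet learns its patterns and aggregation end to end.

```python
def structure_features(point_feats, patterns):
    """Toy stand-in for feature structuring: build one row per latent
    pattern by averaging the features of the points assigned (by nearest
    pattern) to it, yielding a fixed-size 2D map from an unordered set."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    dim = len(point_feats[0])
    rows = [[0.0] * dim for _ in patterns]
    counts = [0] * len(patterns)
    for f in point_feats:
        k = min(range(len(patterns)), key=lambda i: sqdist(f, patterns[i]))
        counts[k] += 1
        rows[k] = [r + x for r, x in zip(rows[k], f)]
    # average each row; rows with no assigned points stay zero
    return [[r / c if c else 0.0 for r in row]
            for row, c in zip(rows, counts)]
```

The payoff of such structuring is that the resulting fixed-size 2D map can be fed to an ordinary 2D convolutional decoder, as the coarse-to-fine pipeline in the abstract does.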
4. HDR-Net-Fusion: Real-time 3D dynamic scene reconstruction with a hierarchical deep reinforcement network (cited 1 time)
Authors: Hao-Xuan Song, Jiahui Huang, Yan-Pei Cao, Tai-Jiang Mu. Computational Visual Media (EI, CSCD), 2021, No. 4, pp. 419-435 (17 pages)
Reconstructing dynamic scenes with commodity depth cameras has many applications in computer graphics, computer vision, and robotics. However, due to the presence of noise and erroneous observations from data capturing devices and the inherently ill-posed nature of non-rigid registration with insufficient information, traditional approaches often produce low-quality geometry with holes, bumps, and misalignments. We propose a novel 3D dynamic reconstruction system, named HDR-Net-Fusion, which learns to simultaneously reconstruct and refine the geometry on the fly with a sparse embedded deformation graph of surfels, using a hierarchical deep reinforcement (HDR) network. The latter comprises two parts: a global HDR-Net, which rapidly detects local regions with large geometric errors, and a local HDR-Net, which serves as a local patch refinement operator to promptly complete and enhance such regions. Training the global HDR-Net is formulated as a novel reinforcement learning problem to implicitly learn the region selection strategy, with the goal of improving the overall reconstruction quality. The applicability and efficiency of our approach are demonstrated using a large-scale dynamic reconstruction dataset. Our method can reconstruct geometry with higher quality than traditional methods.
Keywords: dynamic 3D scene reconstruction, deep reinforcement learning, point cloud completion, deep neural networks
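The region-selection step that the global HDR-Net learns implicitly can be contrasted with the obvious hand-coded baseline: always refine the regions with the largest current geometric error. The sketch below shows only that baseline heuristic, as an assumption for illustration; the paper's contribution is precisely to replace it with a learned reinforcement-learning policy.

```python
def select_regions(errors, k):
    """Greedy baseline for region selection: return the indices of the k
    regions with the largest geometric error, largest first. A learned
    RL policy (as in the paper) would replace this heuristic."""
    return sorted(range(len(errors)), key=lambda i: -errors[i])[:k]
```

In the full system, the selected regions would then be handed to the local refinement operator, and the improvement in overall reconstruction quality serves as the training reward.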
5. Emotion-Aware Music Driven Movie Montage
Authors: Wu-Qin Liu, Min-Xuan Lin, Hai-Bin Huang, Chong-Yang Ma, Yu Song, Wei-Ming Dong, Chang-Sheng Xu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, No. 3, pp. 540-553 (14 pages)
In this paper, we present Emotion-Aware Music Driven Movie Montage, a novel paradigm for the challenging task of generating movie montages. Specifically, given a movie and a piece of music as the guidance, our method aims to generate a montage out of the movie that is emotionally consistent with the music. Unlike previous work such as video summarization, this task requires not only video content understanding, but also emotion analysis of both the input movie and music. To this end, we propose a two-stage framework, including a learning-based module for the prediction of emotion similarity and an optimization-based module for the selection and composition of candidate movie shots. The core of our method is to align and estimate emotional similarity between music clips and movie shots in a multi-modal latent space via contrastive learning. Subsequently, the montage generation is modeled as a joint optimization of emotion similarity and additional constraints such as scene-level story completeness and shot-level rhythm synchronization. We conduct both qualitative and quantitative evaluations to demonstrate that our method can generate emotionally consistent montages and outperforms alternative baselines.
Keywords: movie montage, emotion analysis, audio-visual modality, contrastive learning
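Once music clips and movie shots are embedded in a shared latent space, the emotion-similarity term reduces to a similarity lookup. This is a minimal sketch of only that matching step, assuming cosine similarity and toy 2D embeddings; the paper's learned contrastive encoder and the story/rhythm constraints of the joint optimization are omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_shots(music_clips, shots):
    """For each music-clip embedding, return the index of the movie shot
    whose embedding is most similar: the emotion-similarity term of the
    montage objective, without the story and rhythm constraints."""
    return [max(range(len(shots)), key=lambda j: cosine(c, shots[j]))
            for c in music_clips]
```

In the full framework this greedy per-clip matching would be replaced by a joint optimization that also scores scene-level story completeness and shot-level rhythm synchronization.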
6. Learning to assess visual aesthetics of food images (cited 3 times)
Authors: Kekai Sheng, Weiming Dong, Haibin Huang, Menglei Chai, Yong Zhang, Chongyang Ma, Bao-Gang Hu. Computational Visual Media (EI, CSCD), 2021, No. 1, pp. 139-152 (14 pages)
Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and ranking systems related to food. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of related food image datasets and practical knowledge. Thus, we present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with corresponding binary aesthetic labels, covering a large variety of foods and scenes. We also provide a non-stationary regularization method to combat over-fitting and enhance the ability of tuned models to generalize. Quantitative results from extensive experiments, including a generalization ability test, verify that neural networks trained on the GPD achieve performance comparable to human experts on the task of aesthetic assessment. We reveal several valuable findings to support further research and applications related to visual aesthetic analysis of food images. To encourage further research, we have made the GPD publicly available at https://github.com/Openning07/GPA.
Keywords: image aesthetic assessment, food image analysis, dataset, regularization
7. A novel robotic visual perception framework for underwater operation (cited 1 time)
Authors: Yue Lu, Xingyu Chen, Zhengxing Wu, Junzhi Yu, Li Wen. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, No. 11, pp. 1602-1619 (18 pages)
Underwater robotic operation usually requires visual perception (e.g., object detection and tracking), but underwater scenes have poor visual quality and represent a special domain that can affect the accuracy of visual perception. In addition, detection continuity and stability are important for robotic perception, but the commonly used static accuracy-based evaluation (i.e., average precision) is insufficient to reflect detector performance across time. In response to these two problems, we present a novel robotic visual perception framework. First, we investigate the relationship between a quality-diverse data domain and visual restoration in detection performance. We find that although domain quality has a negligible effect on within-domain detection accuracy, visual restoration is beneficial to detection in real sea scenarios by reducing the domain shift. Moreover, non-reference assessments are proposed for detection continuity and stability based on object tracklets. Further, online tracklet refinement is developed to improve the temporal performance of detectors. Finally, combined with visual restoration, an accurate and stable underwater robotic visual perception framework is established. Small-overlap suppression is proposed to extend video object detection (VID) methods to a single-object tracking task, providing the flexibility to switch between detection and tracking. Extensive experiments were conducted on the ImageNet VID dataset and real-world robotic tasks to verify the correctness of our analysis and the superiority of our proposed approaches. The code is available at https://github.com/yrqs/VisPerception.
Keywords: underwater operation, robotic perception, visual restoration, video object detection
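The "small-overlap suppression" idea, discarding detections that barely overlap the currently tracked target so a frame-level detector behaves like a single-object tracker, can be sketched with a plain IoU filter. This is an illustrative reading of the abstract, not the paper's implementation, and the 0.5 threshold is an assumption.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def small_overlap_suppression(target, detections, thresh=0.5):
    """Keep only detections that sufficiently overlap the tracked target
    box, suppressing candidates with small overlap."""
    return [d for d in detections if iou(target, d) >= thresh]
```

Filtering against the previous target box in each frame restricts the detector's output to the tracked object, which is what gives the framework its flexibility to switch between detection and tracking.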
8. A review of feature fusion-based media popularity prediction methods
Authors: An-An Liu, Xiaowen Wang, Ning Xu, Junbo Guo, Guoqing Jin, Quan Zhang, Yejun Tang, Shenyuan Zhang. Visual Informatics (EI), 2022, No. 4, pp. 78-89 (12 pages)
With the popularization of social media, the way information is transmitted has changed, and the prediction of information popularity based on social media platforms has attracted extensive attention. Feature fusion-based media popularity prediction methods focus on the multi-modal features of social media, aiming to explore the key factors affecting media popularity. Meanwhile, these methods make up for the deficiency in feature utilization of traditional methods based on information propagation processes. In this paper, we review feature fusion-based media popularity prediction methods from the perspectives of feature extraction and predictive model construction. Before that, we analyze the influencing factors of media popularity to provide an intuitive understanding. We further discuss the advantages and disadvantages of existing methods and datasets to highlight future directions. Finally, we discuss the applications of popularity prediction. To the best of our knowledge, this is the first survey of feature fusion-based media popularity prediction methods.
Keywords: social media, popularity prediction, multi-modal analysis
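The simplest feature-fusion scheme the survey's title refers to is early fusion: concatenating per-modality feature vectors (e.g., text, image, and user features) before feeding a predictor. The sketch below is a generic illustration with hypothetical weights, not a method from the survey; real systems would learn the fusion and the predictor jointly.

```python
def fuse_features(modalities):
    """Early (concatenation) fusion: flatten a list of per-modality
    feature vectors into a single fused feature vector."""
    fused = []
    for feats in modalities:
        fused.extend(feats)
    return fused

def predict_popularity(fused, weights, bias=0.0):
    """A linear score on the fused vector; the weights are a stand-in
    for any learned regression model."""
    return bias + sum(w * x for w, x in zip(weights, fused))
```

Late fusion (combining per-modality predictions instead of features) is the usual alternative, and the choice between the two is one axis along which such surveys organize the literature.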