Journal Articles

The following articles were found for your query (396 in total; the first 20 are listed below):

1. Temporally consistent video colorization with deep feature propagation and self-regularization learning (cited 2 times)
Authors: Yihao Liu, Hengyuan Zhao, Kelvin C.K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, Chao Dong
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 2, pp. 375-395 (21 pages)
Abstract: Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single image colorization, there is relatively less research effort on video colorization, and existing methods always suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences in predictions obtained using different time steps. SRL does not require any ground-truth color videos for training and can further improve temporal consistency. Experiments demonstrate that our method not only provides visually pleasing colorized video, but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, and code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
Keywords: video colorization; temporal consistency; feature propagation; self-regularization
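
A minimal sketch of the self-regularization idea described above, assuming `colorize` stands in for the TCVC network and that "different time steps" means running the same clip at two temporal strides; this is an illustration of the consistency term, not the authors' exact loss:

```python
import torch
import torch.nn.functional as F

def self_regularization_loss(colorize, gray_frames):
    """Hedged sketch of an SRL-style consistency term: run the same
    colorization network over the clip at two temporal strides and
    penalize disagreement on the frames both passes predict.
    `colorize` (a stand-in for the TCVC network) maps a sequence of
    gray frames to a tensor of predicted color channels, one per frame.
    Note that no ground-truth color video is needed."""
    pred_dense = colorize(gray_frames)          # stride 1: every frame
    pred_sparse = colorize(gray_frames[::2])    # stride 2: every other frame
    # Compare predictions on the shared (even-indexed) frames only.
    return F.l1_loss(pred_dense[::2], pred_sparse)
```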

2. Multi-modal visual tracking: Review and experimental comparison (cited 1 time)
Authors: Pengyu Zhang, Dong Wang, Huchuan Lu
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 2, pp. 193-214 (22 pages)
Abstract: Visual object tracking has been drawing increasing attention in recent years, as a fundamental task in computer vision. To extend the range of tracking applications, researchers have been introducing information from multiple modalities to handle specific scenes, with promising research prospects for emerging methods and benchmarks. To provide a thorough review of multi-modal tracking, different aspects of multi-modal tracking algorithms are summarized under a unified taxonomy, with specific focus on visible-depth (RGB-D) and visible-thermal (RGB-T) tracking. Subsequently, a detailed description of the related benchmarks and challenges is provided. Extensive experiments were conducted to analyze the effectiveness of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT. Finally, various future directions, including model design and dataset construction, are discussed from different perspectives for further research.
Keywords: visual tracking; object tracking; multi-modal fusion; RGB-T tracking; RGB-D tracking

3. Hierarchical vectorization for facial images
Authors: Qian Fu, Linlin Liu, Fei Hou, Ying He
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 1, pp. 97-118 (22 pages)
Abstract: The explosive growth of social media means portrait editing and retouching are in high demand. While portraits are commonly captured and stored as raster images, editing raster images is non-trivial and requires the user to be highly skilled. Aiming at developing intuitive and easy-to-use portrait editing tools, we propose a novel vectorization method that can automatically convert raster images into a 3-tier hierarchical representation. The base layer consists of a set of sparse diffusion curves (DCs) which characterize salient geometric features and low-frequency colors, providing a means for semantic color transfer and facial expression editing. The middle level encodes specular highlights and shadows as large, editable Poisson regions (PRs) and allows the user to directly adjust illumination by tuning the strength and changing the shapes of PRs. The top level contains two types of pixel-sized PRs for high-frequency residuals and fine details such as pimples and pigmentation. We train a deep generative model that can produce high-frequency residuals automatically. Thanks to the inherent meaning in vector primitives, editing portraits becomes easy and intuitive. In particular, our method supports color transfer, facial expression editing, highlight and shadow editing, and automatic retouching. To quantitatively evaluate the results, we extend the commonly used FLIP metric (which measures color and feature differences between two images) to consider illumination. The new metric, illumination-sensitive FLIP, can effectively capture salient changes in color transfer results, and is more consistent with human perception than FLIP and other quality measures for portrait images. We evaluate our method on the FFHQR dataset and show it to be effective for common portrait editing tasks, such as retouching, light editing, color transfer, and expression editing.
Keywords: face editing; vectorization; Poisson editing; color transfer; illumination editing; expression editing
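
The Poisson regions above encode highlights and shadows whose strength the user can tune. As a hedged illustration of the underlying machinery (not the paper's actual representation), the sketch below builds a smooth, strength-scaled luminance bump by solving a Poisson problem inside a mask with Jacobi iteration; `mask`, `strength`, and the iteration budget are assumed inputs:

```python
import numpy as np

def highlight_bump(mask, strength, iters=2000):
    """Solve the Poisson problem  laplacian(u) = -strength  inside `mask`
    (bool HxW), with u = 0 outside, via Jacobi relaxation on a unit grid.
    The result is a smooth bump whose height scales with `strength`;
    adding it to the luminance channel brightens the masked region with
    soft falloff, one way to realize a strength-tunable editable region
    in the spirit of the paper's PRs."""
    u = np.zeros(mask.shape)
    for _ in range(iters):
        nb = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
              np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = np.where(mask, 0.25 * (nb + strength), 0.0)  # relax interior
    return u

# Usage: luminance = luminance + highlight_bump(mask, strength=0.2)
```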

4. Foundation models meet visualizations: Challenges and opportunities
Authors: Weikai Yang, Mengchen Liu, Zheng Wang, Shixia Liu
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 3, pp. 399-424 (26 pages)
Abstract: Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.
Keywords: visualization; artificial intelligence (AI); machine learning; foundation models; visualization for foundation model (VIS4FM); foundation model for visualization (FM4VIS)

5. Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset
Authors: Mingye Xu, Zhipeng Zhou, Yali Wang, Yu Qiao
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 1, pp. 27-43 (17 pages)
Abstract: Robustness and generalization are two challenging problems for learning point cloud representation. To tackle these problems, we first design a novel geometry coding model, which can effectively use an invariant eigengraph to group points with similar geometric information, even when such points are far from each other. We also introduce a large-scale point cloud dataset, PCNet184. It consists of 184 categories and 51,915 synthetic objects, which brings new challenges for point cloud classification, and provides a new benchmark to assess point cloud cross-domain generalization. Finally, we perform extensive experiments on point cloud classification, using ModelNet40, ScanObjectNN, and our PCNet184, and segmentation, using ShapeNetPart and S3DIS. Our method achieves comparable performance to state-of-the-art methods on these datasets, for both supervised and unsupervised learning. Code and our dataset are available at https://github.com/MingyeXu/PCNet184.
Keywords: geometry coding; self-supervised learning; point cloud classification; segmentation; 3D analysis

6. Benchmarking visual SLAM methods in mirror environments
Authors: Peter Herbert, Jing Wu, Ze Ji, Yu-Kun Lai
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 2, pp. 215-241 (27 pages)
Abstract: Visual simultaneous localisation and mapping (vSLAM) finds applications in indoor and outdoor navigation that routinely subject it to visual complexities, particularly mirror reflections. The effect of mirror presence (time visible and its average size in the frame) was hypothesised to impact localisation and mapping performance, with systems using direct techniques expected to perform worse. Thus, a dataset, MirrEnv, of image sequences recorded in mirror environments, was collected, and used to evaluate the performance of existing representative methods. RGBD ORB-SLAM3 and BundleFusion appear to show moderate degradation of absolute trajectory error with increasing mirror duration, whilst the remaining results did not show significantly degraded localisation performance. The mesh maps generated proved to be very inaccurate, with real and virtual reflections colliding in the reconstructions. A discussion is given of the likely sources of error and robustness in mirror environments, outlining future directions for validating and improving vSLAM performance in the presence of planar mirrors. The MirrEnv dataset is available at https://doi.org/10.17035/d.2023.0292477898.
Keywords: visual simultaneous localisation and mapping (vSLAM); mirror; localisation; mapping; reflection; dataset
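
The absolute trajectory error reported above is conventionally computed after a closed-form alignment of the estimated trajectory to ground truth. A sketch under the assumption of time-synchronized Nx3 position arrays, using Umeyama's similarity alignment (drop the scale `s` for the rigid SE(3) variant often used in RGB-D evaluation):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error between estimated and ground-truth
    positions (both Nx3): align with the closed-form similarity
    transform (Umeyama, 1991), then report the RMSE of the residuals."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    E, G = est - mu_e, gt - mu_g                 # centered trajectories
    U, S, Vt = np.linalg.svd(G.T @ E / len(est)) # cross-covariance SVD
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                           # keep R a proper rotation
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (E ** 2).mean(0).sum()  # optimal scale
    t = mu_g - s * R @ mu_e
    aligned = (s * (R @ est.T)).T + t
    return np.sqrt(((aligned - gt) ** 2).sum(1).mean())
```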

7. Dynamic ocean inverse modeling based on differentiable rendering
Authors: Xueguang Xie, Yang Gao, Fei Hou, Aimin Hao, Hong Qin
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 2, pp. 279-294 (16 pages)
Abstract: Learning and inferring underlying motion patterns of captured 2D scenes and then re-creating dynamic evolution consistent with the real-world natural phenomena have high appeal for graphics and animation. To bridge the technical gap between virtual and real environments, we focus on the inverse modeling and reconstruction of visually consistent and property-verifiable oceans, taking advantage of deep learning and differentiable physics to learn geometry and constitute waves in a self-supervised manner. First, we infer hierarchical geometry using two networks, which are optimized via the differentiable renderer. We extract wave components from the sequence of inferred geometry through a network equipped with a differentiable ocean model. Then, ocean dynamics can be evolved using the reconstructed wave components. Through extensive experiments, we verify that our new method yields satisfactory results for both geometry reconstruction and wave estimation. Moreover, the new framework has the inverse modeling potential to facilitate a host of graphics applications, such as the rapid production of physically accurate scene animation and editing guided by real ocean scenes.
Keywords: inverse modeling; surface reconstruction; wave modeling; ocean waves; differentiable rendering (DR)
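
Once per-wave components (amplitude, direction, wavelength, phase) have been extracted as described above, the dynamics can be evolved analytically. Below is a hedged stand-in for such an ocean model, a sum of sinusoids driven by the deep-water dispersion relation; the function name and parameterization are assumptions, not the paper's differentiable model:

```python
import numpy as np

def ocean_height(xy, t, amps, dirs, wavelens, phases, g=9.81):
    """Evaluate a sum-of-sinusoids ocean heightfield at Nx2 points `xy`
    and time `t`. Each wave component is (amplitude, unit direction,
    wavelength, phase); temporal evolution follows the deep-water
    dispersion relation omega = sqrt(g * k)."""
    h = np.zeros(len(xy))
    for a, d, lam, phi in zip(amps, dirs, wavelens, phases):
        k = 2 * np.pi / lam                 # wavenumber
        omega = np.sqrt(g * k)              # deep-water dispersion
        h += a * np.sin(k * (xy @ d) - omega * t + phi)
    return h
```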

8. Multi-granularity sequence generation for hierarchical image classification
Authors: Xinda Liu, Lili Wang
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 2, pp. 243-260 (18 pages)
Abstract: Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and also insufficiently consider relationships between the hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree and organize the multi-granularity labels into sequences, and vectorize them and add positional information. The proposed multi-granularity sequence generation method builds a decoder that takes visual representation sequences and semantic label embeddings as inputs, and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to the semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgs.
Keywords: hierarchical multi-granularity classification; vision and text transformer; sequence generation; fine-grained image recognition; cross-modality attention
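
A hedged sketch of the decoding scheme the abstract describes: visual tokens act as the decoder memory, and a causal mask lets each granularity level attend only to the coarser labels already emitted. Sizes, module names, and the greedy loop are illustrative assumptions (positional encodings and training omitted):

```python
import torch
import torch.nn as nn

d_model, vocab, max_granularity = 256, 1000, 3   # assumed sizes
layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=4)
embed = nn.Embedding(vocab, d_model)             # label-semantics embedding
head = nn.Linear(d_model, vocab)

def decode_labels(visual_tokens, bos_id=0):
    """visual_tokens: (B, N, d_model) from the image encoder.
    Greedily emits one label per granularity level, coarse to fine;
    the subsequent (causal) mask makes each step condition on the
    coarser labels already predicted."""
    labels = torch.full((visual_tokens.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_granularity):
        tgt = embed(labels)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = decoder(tgt, visual_tokens, tgt_mask=mask)  # cross-attends to image
        nxt = head(out[:, -1]).argmax(-1, keepdim=True)   # next-level label
        labels = torch.cat([labels, nxt], dim=1)
    return labels[:, 1:]   # drop the BOS token
```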

9. CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network
Authors: Fan Zhang, Gongguan Chen, Hua Wang, Caiming Zhang
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 3, pp. 593-608 (16 pages)
Abstract: Recently, facial-expression recognition (FER) has primarily focused on images in the wild, including factors such as face occlusion and image blurring, rather than laboratory images. Complex field environments have introduced new challenges to FER. To address these challenges, this study proposes a cross-fusion dual-attention network. The network comprises three parts: (1) a cross-fusion grouped dual-attention mechanism to refine local features and obtain global information; (2) a proposed C2 activation function construction method, which is a piecewise cubic polynomial with three degrees of freedom, requiring less computation with improved flexibility and recognition abilities, which can better address slow running speeds and neuron inactivation problems; and (3) a closed-loop operation between the self-attention distillation process and residual connections to suppress redundant information and improve the generalization ability of the model. The recognition accuracies on the RAF-DB, FERPlus, and AffectNet datasets were 92.78%, 92.02%, and 63.58%, respectively. Experiments show that this model can provide more effective solutions for FER tasks.
Keywords: facial-expression recognition (FER); cubic polynomial activation function; dual-attention mechanism; interactive learning; self-attention distillation

10. FilterGNN: Image feature matching with cascaded outlier filters and linear attention
Authors: Jun-Xiong Cai, Tai-Jiang Mu, Yu-Kun Lai
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 5, pp. 873-884 (12 pages)
Abstract: The cross-view matching of local image features is a fundamental task in visual localization and 3D reconstruction. This study proposes FilterGNN, a transformer-based graph neural network (GNN), aiming to improve the matching efficiency and accuracy of visual descriptors. Based on high matching sparseness and coarse-to-fine covisible area detection, FilterGNN utilizes cascaded optimal graph-matching filter modules to dynamically reject outlier matches. Moreover, we successfully adapted linear attention in FilterGNN with post-instance normalization support, which significantly reduces the complexity of complete graph learning from O(N²) to O(N). Experiments show that FilterGNN requires only 6% of the time cost and 33.3% of the memory cost compared with SuperGlue under a large-scale input size, and achieves competitive performance in various tasks, such as pose estimation, visual localization, and sparse 3D reconstruction.
Keywords: image matching; transformer; linear attention; visual localization; sparse reconstruction
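
The O(N²) to O(N) reduction comes from kernelized (linear) attention: replacing the softmax with a feature map so the N×N score matrix is never formed. A generic sketch of that idea (Katharopoulos et al., 2020) follows; FilterGNN additionally applies post-instance normalization, which is not shown here:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention in O(N): softmax(QK^T)V is approximated by
    phi(Q)(phi(K)^T V) with phi(x) = elu(x) + 1, so only DxD summaries
    are materialized instead of the NxN score matrix.
    q, k, v: (B, N, D)."""
    phi_q = F.elu(q) + 1
    phi_k = F.elu(k) + 1
    kv = torch.einsum('bnd,bne->bde', phi_k, v)            # (B, D, D), O(N*D^2)
    z = 1.0 / (torch.einsum('bnd,bd->bn', phi_q, phi_k.sum(1)) + eps)
    return torch.einsum('bnd,bde,bn->bne', phi_q, kv, z)   # normalized output
```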

11. A unified multi-view multi-person tracking framework
Authors: Fan Yang, Shigeyuki Odashima, Sosuke Yamao, Hiroaki Fujimoto, Shoichi Masui, Shan Jiang
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 1, pp. 137-160 (24 pages)
Abstract: Despite significant developments in 3D multi-view multi-person (3D MM) tracking, current frameworks separately target footprint tracking, or pose tracking. Frameworks designed for the former cannot be used for the latter, because they directly obtain 3D positions on the ground plane via a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be sufficiently robust for footprint tracking, which utilizes fewer key points than pose tracking, weakening multi-view association cues in a single frame. This study presents a unified multi-view multi-person tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can adopt monocular 2D bounding boxes and 2D poses as its input to produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve association and triangulation. Our framework is shown to provide state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, with comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
Keywords: multi-camera; multi-person tracking; pose tracking; footprint tracking; triangulation; spatiotemporal clustering
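
The triangulation step such frameworks rely on is classically solved with the direct linear transform (DLT). A minimal sketch, assuming known 3x4 projection matrices and matched pixel coordinates across views:

```python
import numpy as np

def triangulate(points_2d, proj_mats):
    """Multi-view DLT triangulation: each camera with projection matrix
    P (3x4) observing pixel (u, v) contributes two rows, u*P[2] - P[0]
    and v*P[2] - P[1], to a homogeneous system A X = 0, which is solved
    by SVD. Two or more views lift a matched 2D detection to 3D."""
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]               # null-space vector = homogeneous 3D point
    return X[:3] / X[3]      # dehomogenize
```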

12. 3D hand pose and shape estimation from monocular RGB via efficient 2D cues
Authors: Fenghao Zhang, Lin Zhao, Shengling Li, Wanjuan Su, Liman Liu, Wenbing Tao
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 1, pp. 79-96 (18 pages)
Abstract: Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
Keywords: hand; 3D reconstruction; deep learning; image features; 3D mesh
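
Joint heatmaps like those above are commonly converted to coordinates with a differentiable soft-argmax; a minimal sketch of that generic practice (not necessarily this paper's exact readout):

```python
import torch

def soft_argmax_2d(heatmaps):
    """Differentiable 2D joint locations from heatmaps (B, J, H, W):
    softmax over all pixels of each map, then the expected (x, y)
    coordinate under that distribution. Gradients flow back into the
    heatmap branch, unlike a hard argmax."""
    B, J, H, W = heatmaps.shape
    probs = torch.softmax(heatmaps.reshape(B, J, -1), dim=-1).reshape(B, J, H, W)
    ys = torch.arange(H, dtype=probs.dtype).view(1, 1, H, 1)
    xs = torch.arange(W, dtype=probs.dtype).view(1, 1, 1, W)
    x = (probs * xs).sum(dim=(2, 3))    # expected column index
    y = (probs * ys).sum(dim=(2, 3))    # expected row index
    return torch.stack([x, y], dim=-1)  # (B, J, 2)
```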

13. Joint training with local soft attention and dual cross-neighbor label smoothing for unsupervised person re-identification
Authors: Qing Han, Longfei Li, Weidong Min, Qi Wang, Qingpeng Zeng, Shimiao Cui, Jiongjin Chen
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 3, pp. 543-558 (16 pages)
Abstract: Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances and different identities being assigned the same label after clustering. The identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, whereby a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the ability of the re-identification model in identifying a person's local significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. The DCLS uses global and local similarity metrics to semantically align the global and local regions of the person and further determines the proximity association between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.
Keywords: person re-identification (Re-ID); unsupervised learning (USL); local soft attention; joint training; dual cross-neighbor label smoothing (DCLS)
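
For flavor, a hedged sketch of neighbor-driven label smoothing: the smoothing mass goes to the pseudo-labels of each sample's feature-space neighbors instead of being spread uniformly. The actual DCLS additionally uses cross information between global and local regions; `features`, `labels`, and all parameter names here are assumed inputs:

```python
import numpy as np

def neighbor_smoothed_targets(labels, features, num_classes, k=5, eps=0.1):
    """Build soft targets for noisy pseudo-labels: keep 1 - eps on each
    sample's own cluster label and distribute eps over the labels of
    its k nearest neighbors in feature space.
    features: (N, D), assumed L2-normalized; labels: (N,) int."""
    sim = features @ features.T              # cosine similarity
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches
    nn_idx = np.argsort(-sim, axis=1)[:, :k] # k nearest neighbors
    targets = np.zeros((len(labels), num_classes))
    targets[np.arange(len(labels)), labels] = 1.0 - eps
    for i in range(len(labels)):
        for j in nn_idx[i]:
            targets[i, labels[j]] += eps / k # spread eps over neighbor labels
    return targets
```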

14. Real-time distance field acceleration based free-viewpoint video synthesis for large sports fields
Authors: Yanran Dai, Jing Li, Yuqi Jiang, Haidong Qin, Bang Liang, Shikuan Hong, Haozhe Pan, Tao Yang
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 2, pp. 331-353 (23 pages)
Abstract: Free-viewpoint video allows the user to view objects from any virtual perspective, creating an immersive visual experience. This technology enhances the interactivity and freedom of multimedia performances. However, many free-viewpoint video synthesis methods hardly satisfy the requirement to work in real time with high precision, particularly for sports fields having large areas and numerous moving objects. To address these issues, we propose a free-viewpoint video synthesis method based on distance field acceleration. The central idea is to fuse multi-view distance field information and use it to adjust the search step size adaptively. Adaptive step size search is used in two ways: for fast estimation of multi-object three-dimensional surfaces, and synthetic view rendering based on global occlusion judgement. We have implemented our ideas using parallel computing for interactive display, using CUDA and OpenGL frameworks, and have used real-world and simulated experimental datasets for evaluation. The results show that the proposed method can render free-viewpoint videos with multiple objects on large sports fields at 25 fps. Furthermore, the visual quality of our synthetic novel viewpoint images exceeds that of state-of-the-art neural-rendering-based methods.
Keywords: free-viewpoint video; view synthesis; camera array; distance field; sports video
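
The adaptive step-size search at the heart of the method exploits the defining property of a distance field: its value at a point bounds how far a ray can advance without crossing a surface. A minimal sphere-tracing-style sketch, with `dist` a hypothetical distance-field query:

```python
import numpy as np

def march_ray(origin, direction, dist, t_max=100.0, tol=1e-3):
    """Adaptive-step ray search over a distance field: the field value
    at the current point is a safe jump distance, so steps are large in
    empty space and shrink near surfaces. `dist(p)` returns the distance
    from point p (3-vector) to the nearest object surface."""
    t = 0.0
    while t < t_max:
        d = dist(origin + t * direction)
        if d < tol:                      # close enough: surface hit
            return origin + t * direction
        t += d                           # adaptive step: jump by the bound
    return None                          # ray escaped the scene
```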

15. Symmetrization of quasi-regular patterns with periodic tiling of regular polygons
Authors: Zhengzheng Yin, Yao Jin, Zhijian Fang, Yun Zhang, Huaxiong Zhang, Jiu Zhou, Lili He
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 3, pp. 559-576 (18 pages)
Abstract: Computer-generated aesthetic patterns are widely used as design materials in various fields. The most common methods use fractals or dynamical systems as basic tools to create various patterns. To enhance aesthetics and controllability, some researchers have introduced symmetric layouts along with these tools. One popular strategy employs dynamical systems compatible with symmetries that construct functions with the desired symmetries. However, these are typically confined to simple planar symmetries. The other generates symmetrical patterns under the constraints of tilings. Although it is slightly more flexible, it is restricted to small ranges of tilings and lacks textural variations. Thus, we proposed a new approach for generating aesthetic patterns by symmetrizing quasi-regular patterns using general k-uniform tilings. We adopted a unified strategy to construct invariant mappings for k-uniform tilings that can eliminate texture seams across the tiling edges. Furthermore, we constructed three types of symmetries associated with the patterns: dihedral, rotational, and reflection symmetries. The proposed method can be easily implemented using GPU shaders and is highly efficient and suitable for complicated tilings with regular polygons. Experiments demonstrated the advantages of our method over state-of-the-art methods in terms of flexibility in controlling the generation of patterns with various parameters as well as the diversity of textures and styles.
Keywords: quasi-regular patterns (QRP); k-uniform tilings; invariant mappings; symmetry; aesthetic patterns
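
The textbook route to an invariant mapping is group averaging: compose a pattern function with every element of a symmetry group and average. A minimal sketch for n-fold rotational symmetry in the complex plane (the paper extends this idea to k-uniform tilings and dihedral/reflection symmetries):

```python
import numpy as np

def symmetrize(f, z, n=6):
    """Group-average a planar pattern f (complex -> real) over the
    cyclic group of n-fold rotations: F(z) = (1/n) * sum_k f(r_k * z)
    with r_k = exp(2*pi*i*k/n). By construction F(r_1 * z) = F(z), so
    the resulting pattern is invariant under rotation by 2*pi/n."""
    rot = np.exp(2j * np.pi * np.arange(n) / n)   # the n rotations
    return sum(f(r * z) for r in rot) / n

# Usage: on a complex grid z, symmetrize(lambda w: np.sin(w.real) *
# np.cos(3 * w.imag), z) yields a 6-fold rotationally symmetric pattern.
```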

16. A causal convolutional neural network for multi-subject motion modeling and generation
Authors: Shuaiying Hou, Congyi Wang, Wenlin Zhuang, Yu Chen, Yangang Wang, Hujun Bao, Jinxiang Chai, Weiwei Xu
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 1, pp. 45-59 (15 pages)
Abstract: Inspired by the success of WaveNet in multi-subject speech synthesis, we propose a novel neural network based on causal convolutions for multi-subject motion modeling and generation. The network can capture the intrinsic characteristics of the motion of different subjects, such as the influence of skeleton scale variation on motion style. Moreover, after fine-tuning the network using a small motion dataset for a novel skeleton that is not included in the training dataset, it is able to synthesize high-quality motions with a personalized style for the novel skeleton. The experimental results demonstrate that our network can model the intrinsic characteristics of motions well and can be applied to various motion modeling and synthesis tasks.
Keywords: deep learning; optimization; motion generation; motion denoising; motion control
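
The WaveNet-style building block the abstract invokes is the dilated causal convolution: left-padding by (kernel_size - 1) * dilation ensures each output frame depends only on current and past frames. A minimal PyTorch sketch with illustrative sizes, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Dilated causal 1D convolution over a motion (or audio) sequence:
    padding is applied only on the past side, so output frame t never
    sees inputs later than t."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                            # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))   # left-pad the time axis

# Stacking layers with dilations 1, 2, 4, ... grows the temporal
# receptive field exponentially while keeping generation causal.
```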

17. A visual modeling method for spatiotemporal and multidimensional features in epidemiological analysis: Applied COVID-19 aggregated datasets
Authors: Yu Dong, Christy Jie Liang, Yi Chen, Jie Hua
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 1, pp. 161-186 (26 pages)
Abstract: The visual modeling method enables flexible interactions with rich graphical depictions of data and supports the exploration of the complexities of epidemiological analysis. However, most epidemiology visualizations do not support the combined analysis of objective factors that might influence the transmission situation, resulting in a lack of quantitative and qualitative evidence. To address this issue, we developed a portrait-based visual modeling method called +msRNAer. This method considers the spatiotemporal features of virus transmission patterns and multidimensional features of objective risk factors in communities, enabling portrait-based exploration and comparison in epidemiological analysis. We applied +msRNAer to aggregated COVID-19-related datasets in New South Wales, Australia, combining COVID-19 case number trends, geo-information, intervention events, and expert-supervised risk factors extracted from local-government-area-based censuses. We perfected the +msRNAer workflow with collaborative views and evaluated its feasibility, effectiveness, and usefulness through one user study and three subject-driven case studies. Positive feedback from experts indicates that +msRNAer provides a general understanding for epidemiological analysis: it not only compares relationships between time-varying cases and risk factors through portraits, but also supports navigation across fundamental geographical, timeline, and other factor comparisons. By adopting interactions, experts discovered functional and practical implications of potential patterns in long-standing community factors regarding the vulnerability faced during the pandemic. Experts confirmed that +msRNAer is expected to deliver visual modeling benefits with spatiotemporal and multidimensional features in other epidemiological analysis scenarios.
Keywords: visual modeling; epidemiological analysis; spatiotemporal; multidimensional; COVID-19

18. MusicFace: Music-driven expressive singing face synthesis
Authors: Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 1, pp. 119-136 (18 pages)
Abstract: It remains an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music. In this paper, we present a method for this task with natural motions for the lips, facial expression, head pose, and eyes. Due to the coupling of mixed information for the human voice and backing music in common music audio signals, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a backing music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions, and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we decompose head movement generation in terms of speed and direction, and decompose eye state generation into short-term blinking and long-term eye closing, modeling them separately. We have also built a novel dataset, SingingFace, to support training and evaluation of models for this task, including future work on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing vivid singing faces, qualitatively and quantitatively better than the prior state-of-the-art.
Keywords: face synthesis; singing; music; generative adversarial network

19. Learning accurate template matching with differentiable coarse-to-fine correspondence refinement
Authors: Zhirui Gao, Renjiao Yi, Zheng Qin, Yunfan Ye, Chenyang Zhu, Kai Xu
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 2, pp. 309-330 (22 pages)
Abstract: Template matching is a fundamental task in computer vision and has been studied for decades. It plays an essential role in the manufacturing industry for estimating the poses of different parts, facilitating downstream tasks such as robotic grasping. Existing methods fail when the template and source images have different modalities, cluttered backgrounds, or weak textures. They also rarely consider geometric transformations via homographies, which commonly exist even for planar industrial parts. To tackle these challenges, we propose an accurate template matching method based on differentiable coarse-to-fine correspondence refinement. We use an edge-aware module to overcome the domain gap between the mask template and the grayscale image, allowing robust matching. An initial warp is estimated using coarse correspondences based on novel structure-aware information provided by transformers. This initial alignment is passed to a refinement network using references and aligned images to obtain sub-pixel level correspondences, which are used to give the final geometric transformation. Extensive evaluation shows our method to be significantly better than state-of-the-art methods and baselines, providing good generalization ability and visually plausible results even on unseen real data.
Keywords: template matching; differentiable homography; structure-awareness; transformers
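
The homography that the refined correspondences finally produce is classically estimated with the direct linear transform. A minimal least-squares sketch from four or more correspondences (the paper solves this step inside a differentiable pipeline; plain DLT is shown here for illustration):

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: estimate the 3x3 homography H mapping
    src -> dst from >= 4 point correspondences (each Nx2). Each pair
    contributes two linear constraints on the 9 entries of H; the
    least-squares solution is the smallest right singular vector."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]       # fix the scale ambiguity
```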

20. Shape embedding and retrieval in multi-flow deformation
Authors: Baiqiang Leng, Jingwei Huang, Guanlin Shen, Bin Wang
Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 3, pp. 439-451 (13 pages)
Abstract: We propose a unified 3D flow framework for joint learning of shape embedding and deformation for different categories. Our goal is to recover shapes from imperfect point clouds by fitting the best shape template in a shape repository after deformation. Accordingly, we learn a shape embedding for template retrieval and a flow-based network for robust deformation. We note that the deformation flow can be quite different for different shape categories. Therefore, we introduce a novel multi-hub module to learn multiple modes of deformation to incorporate such variation, providing a network which can handle a wide range of objects from different categories. The shape embedding is designed to retrieve the best-fit template as the nearest neighbor in a latent space. We replace the standard fully connected layer with a tiny structure in the embedding that significantly reduces network complexity and further improves deformation quality. Experiments show the superiority of our method to existing state-of-the-art methods via qualitative and quantitative comparisons. Finally, our method provides efficient and flexible deformation that can further be used for novel shape design.
Keywords: deformation; shape retrieval; embedding; reconstruction