Journal Articles
2 articles found
1. Causal Reasoning Meets Visual Representation Learning: A Prospective Study (Cited: 4)
Authors: Yang Liu, Yu-Shen Wei, Hong Yan, Guan-Bin Li, Liang Lin. Machine Intelligence Research (EI, CSCD), 2022, Issue 6, pp. 485-511 (27 pages).
Visual representation learning is ubiquitous in real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. With the emergence of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization has become a challenge for existing visual models. Most existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, so there is no unified guidance or analysis of why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, and bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable visual representation learning and related real-world applications.
Keywords: causal reasoning, visual representation learning, reliable artificial intelligence, spatial-temporal data, multi-modal analysis
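One causal reasoning paradigm commonly surveyed in this literature is backdoor adjustment, which replaces the correlational P(y|x) with the interventional P(y|do(x)) = Σ_z P(y|x,z)P(z) to cut spurious paths through a confounder z. The sketch below is a minimal, hedged illustration of that idea, not a method from the paper itself: the confounder-prototype dictionary, uniform prior, and all dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BackdoorAdjustedClassifier(nn.Module):
    """Toy backdoor adjustment: P(y|do(x)) ~= sum_k P(y|x, z_k) * P(z_k).
    The unobserved confounder z is approximated by a fixed dictionary of
    K context prototypes (e.g., per-class mean features) with prior P(z).
    All of this is an illustrative assumption, not the surveyed method."""

    def __init__(self, feat_dim: int, n_classes: int,
                 prototypes: torch.Tensor, prior: torch.Tensor):
        super().__init__()
        self.register_buffer("z", prototypes)      # (K, feat_dim)
        self.register_buffer("pz", prior)          # (K,), sums to 1
        self.cls = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, K = x.size(0), self.z.size(0)
        # Pair every image feature with every confounder prototype.
        xz = torch.cat([x.unsqueeze(1).expand(-1, K, -1),
                        self.z.unsqueeze(0).expand(B, -1, -1)], dim=-1)
        cond = self.cls(xz).softmax(dim=-1)        # (B, K, C) = P(y|x, z_k)
        # Marginalize the confounder out with its fixed prior instead of
        # letting the data decide how often each context co-occurs with x.
        return (cond * self.pz.view(1, K, 1)).sum(dim=1)

# Illustrative usage: 4 prototypes, uniform prior, 256-d features, 10 classes.
protos = torch.randn(4, 256)
model = BackdoorAdjustedClassifier(256, 10, protos, torch.full((4,), 0.25))
p_y_do_x = model(torch.randn(8, 256))              # (8, 10); rows sum to 1
```

Because the prior P(z) is fixed rather than estimated from co-occurrence statistics, a classifier trained this way cannot shortcut through context-label correlations, which is the debiasing effect such surveys attribute to intervention-based methods.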
2. Masked Autoencoders as Single Object Tracking Learners
Authors: Chunjuan Bo, Xin Chen, Junxing Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 7, pp. 1105-1122 (18 pages).
Significant advancements have been witnessed in visual tracking applications in recent years, mainly due to the formidable modeling capabilities of the Vision Transformer (ViT). However, the strong performance of such trackers heavily relies on ViT models pretrained for long periods, limiting more flexible model designs for tracking tasks. To address this issue, we propose an efficient unsupervised ViT pretraining method for the tracking task based on masked autoencoders, called TrackMAE. During pretraining, we employ two shared-parameter ViTs, serving as the appearance encoder and motion encoder, respectively. The appearance encoder encodes randomly masked image data, while the motion encoder encodes randomly masked pairs of video frames. Subsequently, an appearance decoder and a motion decoder separately reconstruct the original image data and video frame data at the pixel level. In this way, the ViT learns to understand both the appearance of images and the motion between video frames simultaneously. Experimental results demonstrate that ViT-Base and ViT-Large models pretrained with TrackMAE and combined with a simple tracking head achieve state-of-the-art (SOTA) performance without additional design. Moreover, compared with currently popular MAE pretraining methods, TrackMAE consumes only 1/5 of the training time, which will facilitate the customization of diverse models for tracking. For instance, we additionally customize a lightweight ViT-XS, which achieves SOTA efficient tracking performance.
Keywords: visual object tracking, vision transformer, masked autoencoder, visual representation learning
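The pretraining recipe the abstract describes (one shared-weight ViT playing both the appearance and motion encoder roles, with a pixel-level decoder per branch) can be sketched compactly. The PyTorch sketch below is an assumption-laden illustration of that idea, not the authors' released code: the tiny module sizes, the 0.75 mask ratio, and all names (TinyEncoder, PixelDecoder, etc.) are invented for the example.

```python
import torch
import torch.nn as nn

PATCH, DIM, SIDE = 16, 192, 128              # assumed toy sizes
N = (SIDE // PATCH) ** 2                     # patch tokens per frame

def random_mask(tokens, ratio=0.75):
    """Randomly split patch tokens into visible and masked subsets."""
    B, n, D = tokens.shape
    keep = int(n * (1 - ratio))
    order = torch.rand(B, n, device=tokens.device).argsort(dim=1)
    keep_idx, mask_idx = order[:, :keep], order[:, keep:]
    vis = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return vis, keep_idx, mask_idx

class TinyEncoder(nn.Module):
    """Stand-in ViT: patch embedding + transformer over visible tokens only."""
    def __init__(self):
        super().__init__()
        self.patch = nn.Conv2d(3, DIM, PATCH, PATCH)
        self.pos = nn.Parameter(torch.zeros(1, N, DIM))
        blk = nn.TransformerEncoderLayer(DIM, 4, 4 * DIM, batch_first=True)
        self.blocks = nn.TransformerEncoder(blk, num_layers=4)

    def tokens(self, img):                   # (B,3,H,W) -> (B,N,DIM)
        return self.patch(img).flatten(2).transpose(1, 2) + self.pos

    def forward(self, visible):
        return self.blocks(visible)

class PixelDecoder(nn.Module):
    """Re-inserts mask tokens and regresses the raw pixels of every patch."""
    def __init__(self, n_tokens):
        super().__init__()
        self.n = n_tokens
        self.mask_tok = nn.Parameter(torch.zeros(1, 1, DIM))
        blk = nn.TransformerEncoderLayer(DIM, 4, 4 * DIM, batch_first=True)
        self.blocks = nn.TransformerEncoder(blk, num_layers=2)
        self.head = nn.Linear(DIM, 3 * PATCH * PATCH)

    def forward(self, enc, keep_idx):
        B, D = enc.size(0), enc.size(-1)
        full = self.mask_tok.expand(B, self.n, D).clone()
        full.scatter_(1, keep_idx.unsqueeze(-1).expand(-1, -1, D), enc)
        return self.head(self.blocks(full))  # (B, n_tokens, 3*PATCH*PATCH)

def patchify(img):
    """(B,3,H,W) -> (B,N,3*PATCH*PATCH) ground-truth pixels per patch."""
    p = img.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)
    return p.permute(0, 2, 3, 1, 4, 5).reshape(img.size(0), -1, 3 * PATCH ** 2)

def masked_mse(pred, target, mask_idx):
    """Reconstruction loss on masked patches only, as in MAE."""
    idx = mask_idx.unsqueeze(-1).expand(-1, -1, pred.size(-1))
    return ((pred.gather(1, idx) - target.gather(1, idx)) ** 2).mean()

encoder = TinyEncoder()                      # one set of weights, two roles
app_dec, mot_dec = PixelDecoder(N), PixelDecoder(2 * N)

frame_a = torch.randn(8, 3, SIDE, SIDE)      # a single image
frame_t = torch.randn(8, 3, SIDE, SIDE)      # frame t of a video pair
frame_u = torch.randn(8, 3, SIDE, SIDE)      # frame t+1 of the same video

# Appearance branch: mask and reconstruct one image at the pixel level.
vis, keep, masked = random_mask(encoder.tokens(frame_a))
loss_app = masked_mse(app_dec(encoder(vis), keep), patchify(frame_a), masked)

# Motion branch: mask and reconstruct a concatenated frame pair.
pair = torch.cat([encoder.tokens(frame_t), encoder.tokens(frame_u)], dim=1)
vis, keep, masked = random_mask(pair)
target = torch.cat([patchify(frame_t), patchify(frame_u)], dim=1)
loss_mot = masked_mse(mot_dec(encoder(vis), keep), target, masked)

loss = loss_app + loss_mot                   # joint pretraining objective
```

Because both branches backpropagate into the same encoder weights, every gradient step teaches the one ViT both single-image appearance and inter-frame motion, which is the property the abstract credits for reaching SOTA tracking with a simple head at a fraction of the usual pretraining cost.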