Funding: Supported in part by the National Natural Science Foundation of China (No. 62176041) and in part by the Excellent Science and Technique Talent Foundation of Dalian (No. 2022RY21).
Abstract: In recent years, visual tracking has advanced significantly by leveraging the Vision Transformer (ViT), mainly owing to its formidable modeling capabilities. However, the strong performance of such trackers relies heavily on ViT models pretrained for long periods, which limits more flexible model designs for tracking tasks. To address this issue, we propose TrackMAE, an efficient unsupervised ViT pretraining method for tracking based on masked autoencoders. During pretraining, we employ two ViTs with shared parameters, serving as the appearance encoder and the motion encoder, respectively. The appearance encoder encodes randomly masked image data, while the motion encoder encodes randomly masked pairs of video frames. An appearance decoder and a motion decoder then separately reconstruct the original image data and video frame data at the pixel level. In this way, the ViT learns to understand both the appearance of images and the motion between video frames simultaneously. Experimental results demonstrate that ViT-Base and ViT-Large models pretrained with TrackMAE and combined with a simple tracking head achieve state-of-the-art (SOTA) performance without additional design. Moreover, compared with currently popular MAE pretraining methods, TrackMAE requires only 1/5 of the training time, which facilitates customizing diverse models for tracking. For instance, we additionally customize a lightweight ViT-XS, which achieves SOTA efficient tracking performance.
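The masking-and-encoding scheme described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the TrackMAE implementation: the 0.75 mask ratio (the default of the original MAE work), the 196-patch grid, joint masking of the concatenated frame pair, the one-parameter toy encoder, the masked-patch-only loss, and all function names are assumptions. It shows two points the abstract stresses: a single encoder instance (shared parameters) serves both the appearance and motion branches, and only visible patches are encoded.

```python
import random
from typing import List, Tuple

Patch = List[float]  # a flattened image patch (toy representation)


def random_mask(num_patches: int, mask_ratio: float,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked) index lists, MAE-style."""
    num_keep = int(num_patches * (1.0 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[:num_keep]), sorted(order[num_keep:])


class ToySharedEncoder:
    """Hypothetical stand-in for the shared ViT: a single scale parameter."""

    def __init__(self, scale: float = 0.5) -> None:
        self.scale = scale  # shared by the appearance and motion branches

    def __call__(self, patches: List[Patch]) -> List[Patch]:
        return [[self.scale * v for v in p] for p in patches]


def encode_branch(encoder: ToySharedEncoder, patches: List[Patch],
                  mask_ratio: float, rng: random.Random):
    """Encode only the visible patches; return (encodings, masked indices)."""
    visible, masked = random_mask(len(patches), mask_ratio, rng)
    return encoder([patches[i] for i in visible]), masked


def masked_mse(pred: List[Patch], target: List[Patch], masked: List[int]) -> float:
    """Pixel-level MSE over masked patches only (MAE convention, assumed here)."""
    errs = [(pred[i][k] - target[i][k]) ** 2
            for i in masked for k in range(len(target[i]))]
    return sum(errs) / len(errs)


rng = random.Random(0)
encoder = ToySharedEncoder()  # ONE instance, so both branches share parameters
image = [[float(i)] * 4 for i in range(196)]  # 14x14 patches, 4 values each
frame_a = [[1.0] * 4 for _ in range(196)]
frame_b = [[2.0] * 4 for _ in range(196)]

# Appearance branch: a single masked image.
app_tokens, app_masked = encode_branch(encoder, image, 0.75, rng)
# Motion branch: patches of both frames concatenated, then masked jointly.
mot_tokens, mot_masked = encode_branch(encoder, frame_a + frame_b, 0.75, rng)

print(len(app_tokens), len(app_masked))  # 49 147
print(len(mot_tokens), len(mot_masked))  # 98 294
```

A real decoder would predict pixel values at the masked indices and pass them to `masked_mse`; a perfect prediction such as `masked_mse(image, image, app_masked)` gives a loss of 0.0.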
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62002395, 61976250 and U1811463), the National Key R&D Program of China (No. 2021ZD0111601), and the Guangdong Basic and Applied Basic Research Foundation, China (Nos. 2021A15150123 and 2020B1515020048).
Abstract: Visual representation learning is ubiquitous in real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. With the emergence of huge amounts of multi-modal, heterogeneous spatial, temporal, and spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization has become a key challenge for existing visual models. Most existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind multi-modal knowledge; moreover, there is no unified guidance or analysis explaining why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. We also discuss the limitations of current methods and datasets. Moreover, we propose prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable visual representation learning and related real-world applications more efficiently.
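The contrast between fitting data distributions and modeling causal relations can be illustrated with a small worked example of backdoor adjustment, one standard causal-inference tool. This is a hypothetical sketch: the variables (a confounder Z such as image background, a feature X, a label Y) and all probabilities are invented for illustration and are not taken from the reviewed paper. Y depends only on Z, yet the observational estimate P(Y=1 | X=x) suggests X strongly predicts Y, a simple instance of data bias; adjusting for Z recovers the true (zero) effect of X.

```python
# Hypothetical toy model: Z confounds feature X and label Y.
P_Z1 = 0.5                                   # P(Z=1)
P_X1_GIVEN_Z = {1: 0.9, 0: 0.1}              # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(1, 1): 0.8, (0, 1): 0.8,   # P(Y=1 | X=x, Z=z);
                 (1, 0): 0.2, (0, 0): 0.2}   # Y depends on Z only


def p_z(z: int) -> float:
    return P_Z1 if z == 1 else 1.0 - P_Z1


def p_x_given_z(x: int, z: int) -> float:
    return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]


def observational(x: int) -> float:
    """P(Y=1 | X=x): what a model that only fits the data distribution sees."""
    joint = {z: p_x_given_z(x, z) * p_z(z) for z in (0, 1)}  # P(X=x, Z=z)
    p_x = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / p_x for z in (0, 1))


def interventional(x: int) -> float:
    """P(Y=1 | do(X=x)) via backdoor adjustment: sum_z P(Y=1 | x, z) P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_z(z) for z in (0, 1))


print(round(observational(1), 2), round(observational(0), 2))    # 0.74 0.26
print(round(interventional(1), 2), round(interventional(0), 2))  # 0.5 0.5
```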
Abstract: This study investigates Chinese elementary, middle, and high school students' L2 perceptual learning styles, imagination, ideal L2 self, and motivated L2 behavior. A questionnaire on perceptual learning styles and L2 learning motivation was administered to 1,667 students from elementary to high school. Statistical results revealed that these students preferred a visual learning style over auditory and kinesthetic learning styles. This visual orientation was significantly correlated with their ideal L2 self and motivated L2 behavior. Sequential regression analysis indicated that the ideal L2 self and the visual learning style were the most meaningful predictors of Chinese students' motivated L2 behavior. This study suggests that teachers should help students create and maintain their ideal L2 self and foster their L2 learning motivation by providing more adequate visual teaching materials.
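Sequential (hierarchical) regression enters predictors in blocks and asks how much each block adds to the explained variance (the change in R²). The following is a minimal plain-Python sketch under stated assumptions: the data are a synthetic toy set of six hypothetical students, not the study's data, and which variables enter in which block is hypothetical. Ordinary least squares is solved via the normal equations with Gauss-Jordan elimination.

```python
from typing import List, Tuple


def ols_r_squared(X: List[List[float]], y: List[float]) -> float:
    """Fit y = b0 + X b by ordinary least squares and return R^2."""
    rows = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    preds = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def sequential_regression(blocks: List[List[List[float]]],
                          y: List[float]) -> List[Tuple[float, float]]:
    """Enter predictor blocks one at a time; return (R^2, delta R^2) per step."""
    cols: List[List[float]] = [[] for _ in y]
    results, prev = [], 0.0
    for block in blocks:
        cols = [c + b for c, b in zip(cols, block)]  # add this block's predictors
        r2 = ols_r_squared(cols, y)
        results.append((r2, r2 - prev))
        prev = r2
    return results


# Synthetic toy data for six hypothetical students (NOT the study's data).
block1 = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]  # e.g. an ideal-L2-self score
block2 = [[2.0], [1.0], [4.0], [3.0], [6.0], [5.0]]  # e.g. a visual-style score
y = [1 + 0.5 * a[0] + 2 * b[0] for a, b in zip(block1, block2)]

for step, (r2, delta) in enumerate(sequential_regression([block1, block2], y), 1):
    print(f"step {step}: R^2 = {r2:.3f}, delta R^2 = {delta:.3f}")
```

Because the toy outcome is an exact linear function of both blocks, R² reaches 1 at step 2; with real questionnaire data, the size of each ΔR² (and its significance test, omitted here) is what indicates which block is a meaningful predictor.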
Abstract: Recent developments have made it possible to establish virtual-reality (VR) setups for studying multiple aspects of visual learning in honey bees under controlled experimental conditions. Here, we adopted a VR environment to investigate visual learning in the buff-tailed bumble bee Bombus terrestris. Based on responses to the appetitive and aversive reinforcements used for conditioning, we show that bumble bees had the appetitive motivation needed to engage in the VR experiments and that they efficiently learned elemental color discriminations. In doing so, they reduced their choice latency, increased the proportion of direct paths toward the virtual stimuli, and walked faster toward them. In a short-term retention test, bumble bees chose the correct stimulus and fixated on it longer in the absence of reinforcement. Body size and weight, although variable across individuals, did not affect cognitive performance and had a mild impact on motor performance. Overall, we show that bumble bees are suitable subjects for experiments on visual learning under VR conditions, which opens important perspectives for invasive studies on the neural and molecular bases of such learning, given the robustness of these insects and the accessibility of their brain.
Funding: Supported by the National Basic Research Program of China (the 973 Program) (Grant No. 2009CB918702), the National Natural Science Foundation of China (Grant Nos. 30921064, 30625022, 31030037 and 31070944), and the External Cooperation Program of the Chinese Academy of Sciences (Grant No. GJHZ1005).
Abstract: The fruit fly, Drosophila melanogaster, can discriminate visual landmarks and form visual long-term memory in a flight simulator. Studies of the molecular mechanisms of long-term memory have shown that memory formation requires mRNA transcription and protein synthesis. However, little is known about the molecular mechanisms underlying the visual learning paradigm. The present study demonstrated that both the spaced training procedure (STP) and the consecutive training procedure (CTP) induced long-term memory 12 h after training, and that STP produced significantly higher 12-h memory scores than CTP. Label-free quantification by liquid chromatography-tandem mass spectrometry (LC-MS/MS) and microarray analysis were used to analyze proteomic and transcriptomic differences between the STP and CTP groups. Proteomic analysis revealed 30 up-regulated and 27 down-regulated proteins; transcriptomic analysis revealed 145 up-regulated and 129 down-regulated genes. Among them, five candidate genes were verified by quantitative PCR, which gave results similar to those of the microarray. These results provide insight into the molecular components influencing visual long-term memory and facilitate further studies on the roles of the identified genes in memory formation.
Funding: This study was funded by a DFG grant (SCHL, 1919/4-1) to V.S.
Abstract: Sorting objects and events into categories and concepts is an important cognitive prerequisite that spares an individual from having to learn every object or situation encountered in its daily life. Accordingly, specific items are classified into general groups that allow fast responses to novel situations. The present study assessed whether bamboo sharks Chiloscyllium griseum and Malawi cichlids Pseudotropheus zebra can distinguish sets of stimuli (each stimulus consisting of two abstract, geometric objects) that meet two conceptual preconditions, i.e., (1) "sameness" versus "difference" and (2) a certain spatial arrangement of both objects. In two-alternative forced-choice experiments, individuals were first trained to choose two different, vertically arranged objects over two different but horizontally arranged ones. Pair discriminations were followed by extensive transfer tests. Transfer tests using stimuli consisting of (a) black and gray circles and (b) squares with novel geometric patterns provided conflicting information with respect to the learnt rule "choose two different, vertically arranged objects", thereby investigating (1) the individuals' ability to transfer previously gained knowledge to novel stimuli and (2) the abstract relational concept(s) or rule(s) applied to categorize these novel objects. The present results suggest that the level of processing and usage of both abstract concepts differed considerably between bamboo sharks and Malawi cichlids. Bamboo sharks seemed to combine both concepts (although with hierarchical rather than equal prominence), pointing to advanced cognitive capabilities. Conversely, Malawi cichlids had difficulties discriminating between symbols and failed to apply the acquired training knowledge to new sets of geometric and, in particular, gray-level transfer stimuli.
Funding: Supported by the National Natural Science Foundation of China (No. 62132017).
Abstract: The past decade has witnessed rapid progress in AI research since the breakthrough in deep learning. AI technology has been applied in almost every field; therefore, both technical and non-technical end users must understand these technologies to exploit them. However, existing materials are designed for experts, whereas non-technical users need appealing materials that deliver complex ideas in easy-to-follow steps. One notable tool that fits this profile is scrollytelling, an approach to storytelling that gives readers a natural and rich experience at their own pace, along with in-depth interactive explanations of complex concepts. Hence, this work proposes a novel visualization design for creating scrollytelling that can effectively explain an AI concept to non-technical users. As a demonstration of our design, we created a scrollytelling piece explaining the Siamese Neural Network for the visual similarity matching problem. Our approach helps create visualizations that are valuable in short-timeline situations such as a sales pitch. The results show that the visualization based on our novel design helps improve non-technical users' perception and acquisition of machine learning concept knowledge compared with traditional materials such as online articles.
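For readers unfamiliar with the concept the scrollytelling explains, a Siamese network can be sketched as one embedding function applied to both inputs, followed by a distance comparison. This minimal sketch uses illustrative assumptions throughout: the toy linear-plus-ReLU embedding, the Euclidean distance, the threshold of 1.0, and the contrastive loss (one common training objective for Siamese networks) are not details of the authors' visualization or model, and `ToySiameseNetwork` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = List[float]


class ToySiameseNetwork:
    """Hypothetical Siamese network: both inputs share ONE embedding function."""

    def __init__(self, weights: List[Vector]) -> None:
        self.weights = weights  # embed_dim x input_dim, shared by both branches

    def embed(self, x: Sequence[float]) -> Vector:
        # Linear layer + ReLU, standing in for a CNN backbone.
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.weights]

    def distance(self, a: Sequence[float], b: Sequence[float]) -> float:
        """Euclidean distance between the two embeddings."""
        ea, eb = self.embed(a), self.embed(b)
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))

    def is_similar(self, a: Sequence[float], b: Sequence[float],
                   threshold: float = 1.0) -> bool:
        return self.distance(a, b) < threshold


def contrastive_loss(distance: float, same: bool, margin: float = 1.0) -> float:
    """Pull matching pairs together; push non-matching pairs beyond `margin`."""
    if same:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


net = ToySiameseNetwork([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
img1, img2, img3 = [1.0, 2.0, 3.0], [1.0, 2.0, 9.0], [5.0, 0.0, 0.0]
print(net.is_similar(img1, img2), net.is_similar(img1, img3))  # True False
```

Sharing the weights is the design choice that matters: because both images pass through the same `embed`, their distance is comparable, and training with a loss such as `contrastive_loss` shapes that single embedding so similar images land close together.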
Funding: This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1A2C2089062).
Abstract: This article introduces the Human–Computer Interaction Laboratory (HCIL), established at Seoul National University, Korea, in 2009. We first summarize the history of its foundation, achievements, and collaborations over the last 10 years. Then, we delineate our current research directions related to information visualization. Finally, we present the facilities and equipment that adequately support our research.