Visual odometry is critical in visual simultaneous localization and mapping (SLAM) for robot navigation. However, the pose estimation performance of most current visual odometry algorithms degrades in scenes with unevenly distributed features, because dense features occupy excessive weight. Herein, a new human-visual-attention mechanism for point-and-line stereo visual odometry, called point-line-weight-mechanism visual odometry (PLWM-VO), is proposed to describe scene features in a global and balanced manner. A weight-adaptive model based on region partition and region growth is generated for the human visual attention mechanism, where sufficient attention is assigned to position-distinctive objects (sparse features in the environment). Furthermore, the sum of absolute differences (SAD) algorithm is used to improve the accuracy of initialization for line features. Compared with the state-of-the-art method (ORB-VO), PLWM-VO shows a 36.79% reduction in absolute trajectory error on the KITTI and EuRoC datasets. Although the time consumption of PLWM-VO is higher than that of ORB-VO, online test results indicate that PLWM-VO satisfies real-time demands. The proposed algorithm not only significantly improves the environmental adaptability of visual odometry, but also quantitatively demonstrates the superiority of the human visual attention mechanism.
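The abstract names the sum-of-absolute-differences (SAD) algorithm but gives no details. A generic SAD patch matcher along a stereo scan line, of the kind such an initialization step could build on, might look like this (the window size, search range, and function names are illustrative, not taken from PLWM-VO):

```python
import numpy as np

def sad(patch_a, patch_b):
    """Sum of absolute differences between two equal-sized patches."""
    return int(np.abs(patch_a.astype(np.int32) - patch_b.astype(np.int32)).sum())

def best_match_along_row(left, right, y, x, size=3, max_disp=8):
    """Scan a row of the right image for the patch that minimises SAD
    against the reference patch in the left image (stereo-style search).
    Returns the best-matching column and its SAD cost."""
    h = size // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1]
    best_x, best_cost = x, np.inf
    for d in range(max_disp + 1):
        cx = x - d                       # candidate column at disparity d
        if cx - h < 0:
            break
        cand = right[y - h:y + h + 1, cx - h:cx + h + 1]
        cost = sad(ref, cand)
        if cost < best_cost:
            best_cost, best_x = cost, cx
    return best_x, best_cost
```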
Visual attention is a mechanism that enables the visual system to detect potentially important objects in complex environments. Most computational visual attention models are designed with inspiration from mammalian visual systems. However, electrophysiological and behavioral evidence indicates that avian species are animals with high visual capability that can process complex information accurately in real time. Therefore, the visual system of avian species, especially the nuclei related to the visual attention mechanism, is investigated in this paper. A hierarchical visual attention model is then proposed for saliency detection. In the first hierarchy, optic tectum neuron responses are computed and self-information is used to compute primary saliency maps. In the second hierarchy, the "winner-take-all" network in the tecto-isthmal projection is simulated and final saliency maps are estimated with regularized random walks ranking. Comparison results verify that the proposed model, which can define the focus of attention accurately, outperforms several state-of-the-art models. This study provides insights into the relationship between the visual attention mechanism and the avian visual pathways. The computational visual attention model may reveal the underlying neural mechanism of the nuclei for biological visual attention.
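The first hierarchy computes saliency from self-information. As a minimal, hedged sketch of that idea (the histogram-based probability estimate and grayscale input are assumptions, not the paper's neuron-response model), per-pixel self-information saliency can be computed as:

```python
import numpy as np

def self_information_saliency(img, bins=16):
    """Per-pixel saliency as the self-information -log p(v) of each
    pixel's quantised intensity, with p estimated from the image
    histogram.  Rare intensities receive high saliency.  A simplified
    stand-in for a primary saliency map, normalised to [0, 1]."""
    q = np.clip((img.astype(np.float64) / 256.0 * bins).astype(int), 0, bins - 1)
    hist = np.bincount(q.ravel(), minlength=bins).astype(np.float64)
    p = hist / hist.sum()                 # empirical intensity distribution
    sal = -np.log(p[q] + 1e-12)           # self-information per pixel
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```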
Inspired by human behaviors, a robot object tracking model is proposed on the basis of the visual attention mechanism, which fits the theory of topological perception. The model integrates image-driven, bottom-up attention and object-driven, top-down attention, whereas previous attention models have mostly focused on either bottom-up or top-down attention alone. Through the bottom-up component, the whole scene is segmented into the ground region and the salient regions. Guided by a top-down strategy achieved with a topological graph, the object regions are separated from the salient regions; the remaining salient regions are treated as barrier regions. To evaluate the model, a mobile robot platform was developed, on which experiments were conducted. The experimental results indicate that processing an image with a resolution of 752 × 480 pixels takes less than 200 ms and that the object regions are unabridged. A comparison of the proposed model with the existing model demonstrates advantages in robot object tracking in terms of speed and efficiency.
It is of great significance to rapidly detect targets in large-field remote sensing images with limited computational resources. Drawing on findings about visual attention in perceptual psychology, this paper proposes a hierarchical attention-based model for target detection. Specifically, at the pre-attention stage, a fast computational approach is applied to build a saliency map before salient regions are extracted. The focus of attention (FOA) can then be quickly obtained to indicate the salient objects. Next, at the attention stage, under FOA guidance, high-level visual features of the region of interest are extracted in parallel. Finally, at the post-attention stage, by integrating these parallel and independent visual attributes, a decision-template-based classifier fusion strategy is proposed to discriminate task-related targets from the other extracted salient objects. Experiments on ship detection validate the effectiveness and feasibility of the proposed model.
Background: Eye tracking technology is receiving increased attention in the field of virtual reality. Specifically, future gaze prediction is crucial for pre-computation in many applications such as gaze-contingent rendering, advertisement placement, and content-based design. To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality. Methods: In this paper, the concept of temporal continuity of visual attention is presented. An autocorrelation-function method is then proposed to evaluate this temporal continuity, which is analyzed in both free-viewing and task-oriented conditions. Results: In free-viewing conditions, analysis of a free-viewing gaze dataset indicates that temporal continuity holds only within a short time interval. A task-oriented game-scene condition was created to collect users' gaze data; analysis of these data finds that temporal continuity performs similarly to the free-viewing conditions. Temporal continuity can be applied to future gaze prediction: when it is strong, users' current gaze positions can be used directly to predict their future gaze positions. Conclusions: The prediction performance of the current gaze is further evaluated in both free-viewing and task-oriented conditions, showing that the current gaze can be efficiently applied to short-term future gaze prediction. Long-term gaze prediction remains to be explored.
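The autocorrelation-function method can be illustrated on a single gaze-coordinate trace. This is a generic estimator, not necessarily the paper's exact formulation:

```python
import numpy as np

def gaze_autocorrelation(x, max_lag):
    """Normalised autocorrelation of a 1-D gaze-coordinate trace.
    Values near 1 at lag k mean the gaze at time t still predicts the
    gaze at t+k, i.e., visual attention is temporally continuous over
    that interval."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = [np.dot(x[:len(x) - k], x[k:]) / denom if k else 1.0
           for k in range(max_lag + 1)]
    return np.array(acf)
```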
Video summarization is applied to reduce redundancy and develop a concise representation of key frames in a video. More recently, video summaries have been produced through visual attention modeling: the frames that stand out visually are extracted as key frames based on human attention modeling theories. Such schemes have proven effective for video summaries; nevertheless, their high computational cost restricts their usability in everyday situations. In this context, we propose a key frame extraction (KFE) method built on an efficient and accurate visual attention model. The computational effort is minimized by using dynamic visual saliency based on the temporal gradient instead of traditional optical flow techniques. In addition, an efficient technique using the discrete cosine transform is utilized for static visual saliency. The dynamic and static visual attention metrics are merged by means of a non-linear weighted fusion technique. The results are compared with several existing state-of-the-art techniques in terms of accuracy. The experimental results indicate the efficiency and high quality of the extracted key frames.
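One well-known DCT-based static saliency method is the "image signature" (keep only the signs of the DCT coefficients and invert); the abstract does not say which DCT technique is used, so the following is a plausible stand-in rather than the authors' implementation:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] *= 1 / np.sqrt(2)
    return c * np.sqrt(2.0 / n)

def image_signature_saliency(img):
    """Static saliency via the image signature: keep only the sign of
    the 2-D DCT coefficients, transform back, and square.  Sparse
    foreground structure survives; smooth background is suppressed.
    Output is normalised to [0, 1]."""
    img = img.astype(np.float64)
    ch, cw = dct_matrix(img.shape[0]), dct_matrix(img.shape[1])
    coeffs = ch @ img @ cw.T                 # 2-D DCT-II
    recon = ch.T @ np.sign(coeffs) @ cw      # inverse DCT of the sign
    sal = recon ** 2
    return sal / (sal.max() + 1e-12)
```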
During most of the interaction process, the operator focuses on the tracked 3D hand gesture model only at "interaction points" in the collision-detection scene, such as "grasp" and "release" events and objects in the scene, without attending to the tracked 3D hand gesture model throughout the whole procedure. Thus, this paper first studies a visual attention distribution model of the operator during "grasp", "translation", "release", and other basic operations, and proposes a 3D hand gesture tracking algorithm based on this distribution model. Using this algorithm, in periods with a low degree of visual attention, a pre-stored 3D hand gesture animation can be used to directly visualize the 3D hand gesture model in the interactive scene; in periods with a high degree of visual attention, an existing frame-by-frame tracking approach can be adopted to obtain the 3D gesture model. The results demonstrate that the proposed method achieves real-time tracking of 3D hand gestures with an effective improvement in the efficiency, fluency, and availability of 3D hand gesture interaction.
U-Net achieves good performance on small-scale datasets through skip connections that merge features of the low-level and high-level layers, and it has been widely utilized in biomedical image segmentation as well as, more recently, in microstructure image segmentation of materials. Three representative visual attention modules, namely squeeze-and-excitation networks, the convolutional block attention module, and an extended calibration algorithm, were introduced into the traditional U-Net architecture to further improve prediction accuracy. Compared with the original U-Net architecture, the evaluation indices of the improved architectures are significantly better for microstructure segmentation of steels with ferrite/martensite and pearlite/ferrite composite microstructures, and with the complex martensite/austenite-island/bainite microstructure, which demonstrates the advantage of visual attention mechanisms in microstructure segmentation. The reasons for the accuracy improvement are discussed based on feature-map analysis.
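Of the three modules, squeeze-and-excitation is the simplest to sketch. A NumPy version of one SE block follows (the weight shapes and reduction ratio are illustrative; real implementations use a deep-learning framework):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(feat, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map:
    global-average-pool per channel ("squeeze"), two small dense
    layers (ReLU then sigmoid, "excitation"), and channel-wise
    rescaling.  w1 has shape (C/r, C) and w2 has shape (C, C/r)."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)   # (C,) channel statistics
    hidden = np.maximum(w1 @ squeeze, 0.0)       # ReLU bottleneck
    scale = sigmoid(w2 @ hidden)                 # (C,) gates in (0, 1)
    return feat * scale[:, None, None]
```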
In transportation architecture, wayfinding quality is a crucial factor in determining transfer efficiency and level of service. When developing architectural design concepts, designers often use their own visual attention to imagine where passengers will look. A saliency model is a software program that can predict human visual attention. This research examined whether a saliency model or designer visual attention better predicts passenger visual attention during wayfinding inside transportation architecture. Using a remote eye-tracking system, the eye movements of 29 participants watching 100 still images depicting different indoor scenes of transportation architecture were recorded and transformed into saliency maps to illustrate participants' visual attention. Participants were categorized as either "designers" or "laypeople" based on their architectural design expertise. Similarities were compared among the "designers'" visual attention, saliency-model predictions, and "laypeople's" visual attention. The results showed that while the "designers'" aggregate visual attention was the best predictor of that of "laypeople", followed by saliency models, a single designer's visual attention was not a good predictor. This divergence highlights the limitation of individual designers in predicting passenger wayfinding behavior and implies that integrating a saliency model into practice can benefit wayfinding design.
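Comparing saliency maps requires a similarity metric. The abstract does not specify which one was used; the Pearson correlation coefficient (CC) below is a standard choice in saliency evaluation and serves only as an illustration:

```python
import numpy as np

def saliency_cc(map_a, map_b):
    """Pearson correlation coefficient between two attention/saliency
    maps: standardise both maps and average their product.  Returns a
    value in [-1, 1]; 1 means one map perfectly predicts the other."""
    a = map_a.ravel().astype(np.float64)
    b = map_b.ravel().astype(np.float64)
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))
```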
Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience. Explorations toward developing a visual attention mechanism using eye-tracking data have so far been limited to 2D cases, and researchers have yet to approach this topic in a 3D virtual environment and from a spatiotemporal perspective. We present the first 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum, known as EDVAM. In addition, a deep learning model is devised and tested with the EDVAM to predict a user's subsequent visual attention from previous eye movements. This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.
Dome displays are expected to serve as effective visualization environments for modeling and simulation owing to their frameless design and highly immersive sensation. However, since users in a dome display can look at the projected image in any direction freely, it is difficult to share information among viewers. To address this problem, this research examined the effect of visual attention guidance produced by camera work in a dome environment. As a visualization system, DomePlayer, which can express camera-work effects based on a camera-work description language, was developed. From evaluation experiments using this system, constraint conditions for camera work in the dome environment were derived, and the effect of visual attention guidance by camera work was evaluated.
Nowadays, there is a great need to investigate the effects of fatigue on physical as well as mental performance. The issues generally associated with extreme fatigue are that one can easily lose focus while performing any particular activity, whether physical or mental, and that it decreases one's motivation to complete the task at hand efficiently and successfully. Along the same lines, myriad research studies have posited the negative effects of fatigue on mental performance, and most techniques to induce fatigue require long and repetitive visual search tasks. In this study, a visual search task was devised and customized using performance measures such as d' (d-prime) and the speed-accuracy trade-off (SATF), as well as ROC analysis for classifier performance. The visual search task consisted of distractors (L) and a target (T), whereby human participants had to press the appropriate keyboard button as fast as possible to indicate whether or not they noticed a target upon presentation of a visual stimulus. It was administered under laboratory conditions, and participants' reaction times and accuracy were monitored. The test image Size35Int255 was found to be the best image in terms of sensitivity and AUC (area under the curve). Ongoing research can therefore use these findings to create visual stimuli whose target and distractor images follow the size and intensity characteristics found in this research.
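The d' measure named above has a standard closed form from hit and false-alarm rates. The inverse-efficiency score below is offered only as one common way to summarize a speed-accuracy trade-off, not necessarily the SATF measure used in this study:

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Sensitivity index d' = z(hits) - z(false alarms), where z is the
    inverse standard-normal CDF.  Rates of exactly 0 or 1 should be
    corrected (e.g., a 1/(2N) adjustment) before calling this."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def inverse_efficiency(mean_rt_ms, accuracy):
    """Inverse efficiency: mean correct reaction time divided by
    accuracy.  Lower is better; penalises fast-but-sloppy responding."""
    return mean_rt_ms / accuracy
```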
Background: Age-related macular degeneration (AMD) is one of the main causes of vision loss in older adults, generating, in most cases, a central scotoma that reduces central visual acuity (Noble & Chaudhary, 2010). People affected by AMD have to rely on peripheral visual information and would highly benefit from efficiently allocating their attention to the periphery. Indeed, attention can improve peripheral spatial resolution (Carrasco, Ling & Read, 2004) and can be allocated to a certain expanse of space outside of the central visual span, known as the attentional span. The attentional span has been shown to be decreased in people with AMD, with less attention allocated to the periphery and more to the central visual field (Cheong et al., 2008); however, it remains unknown whether aging is also a contributing factor. Methods: Fourteen healthy younger adults (mean age = 21.8 years, SD = 1.5) and 8 older adults (mean age = 69.6 years, SD = 7.3) performed a pop-out and a serial version of a visual search task in the presence of gaze-contingent invisible and visible artificial central scotomata of different sizes (no scotoma, 3° diameter, 5°, and 7°). Participants were asked to indicate as quickly as possible whether a target was present among a varying number of distractors (16, 32, or 64 objects). We wished to determine whether the size of the scotoma, occluding different degrees of central vision, affected visual search differently for younger vs. older participants. Results: Both younger and older participants showed higher reaction times (RTs) to find the target in the serial version (M = 2,074 ms for younger adults, M = 3,853 ms for older adults) compared to the pop-out version (M = 866 ms, M = 1,475 ms, P < 0.001) and with more distractors (32 compared to 16, and 64 compared to 32, P < 0.01). Older adults showed longer RTs than younger adults for both versions of the task (P < 0.01). We found a significant effect of scotoma size for older adults (3° scotoma M = 3,276 ms; 7° scotoma M = 3,877 ms, P < 0.05); however, accuracy was higher with no scotoma (96% vs. 92%, P < 0.05) in the pop-out search task. This suggests that older participants privileged a fast decision at the expense of performance in those cases. For the younger adults, RTs in the serial search task were numerically higher in the presence of a scotoma (M = 2,074 ms) than in the control condition (M = 1,665 ms), but the difference was not significant (P > 0.05). Conclusions: These results suggest that older adults take longer to perform visual search than younger adults and tend to use peripheral vision less; larger central scotomas disrupted their performance but not that of younger participants, who performed equally well with different central scotoma sizes. These findings suggest that aging is a contributing factor in the decrease of the peripheral attentional span.
Background: Research suggests that the analysis of facial expressions by a healthy brain takes place approximately 170 ms after the presentation of a facial expression, in the superior temporal sulcus and the fusiform gyrus, mostly in the right hemisphere. Some researchers argue that a fast pathway through the amygdala allows automatic and early emotional processing around 90 ms after stimulation. This processing would occur subconsciously, even before the stimulus is perceived, and can be approximated by presenting stimuli quickly in the periphery of the fovea. The present study aimed to identify the neural correlates of a peripheral and simultaneous presentation of emotional expressions through a frequency-tagging paradigm. Methods: The presentation of emotional facial expressions at a specific frequency induces in the visual cortex a stable and precise response at the presentation frequency [i.e., a steady-state visual evoked potential (ssVEP)] that can be used as a frequency tag to follow the cortical processing of the stimulus. Here, the use of different stimulation frequencies allowed us to label the different facial expressions presented simultaneously and to obtain a reliable cortical response associated with (I) each of the emotions and (II) the different repeated presentation times (1/0.170 s ≈ 5.8 Hz, 1/0.090 s ≈ 10.8 Hz). To identify the regions involved in emotional discrimination, we subtracted the brain activity induced by the rapid presentation of six emotional expressions from the activity induced by the presentation of the same emotion (reduced by neural adaptation). The results were compared across the hemisphere in which attention was directed, emotion, and stimulation frequency. Results: The signal-to-noise ratio of the cerebral oscillations reflecting processing of the expression of fear was stronger in regions specific to emotional processing when the faces were presented in the participants' peripheral vision, unbeknownst to them. In addition, peripheral emotional processing of fear at 10.8 Hz was associated with greater activation within the gamma 1 and 2 frequency bands in the expected regions (frontotemporal and T6), as well as desynchronization in the alpha frequency bands for the temporal regions. This modulation of spectral power is independent of the attentional demand. Conclusions: These results suggest that fearful emotional stimuli presented in peripheral vision and outside the attentional focus elicit an increase in brain activity, especially in the temporal lobe. The localization of this activity, as well as the optimal stimulation frequency found for this facial expression, suggests that it is processed by the fast pathway of the magnocellular layers.
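A frequency-tagged response is usually quantified as the spectral amplitude at the tag frequency relative to neighboring bins. The following generic ssVEP signal-to-noise estimator illustrates the idea (the neighborhood width and parameter names are illustrative, not taken from this study):

```python
import numpy as np

def tag_snr(signal, fs, f_tag, n_neighbors=10):
    """Signal-to-noise ratio of a frequency-tagged response: amplitude
    at the tag frequency divided by the mean amplitude of neighbouring
    frequency bins (excluding the tag bin itself)."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    k = int(np.argmin(np.abs(freqs - f_tag)))        # tag bin
    lo, hi = max(k - n_neighbors, 1), min(k + n_neighbors + 1, len(spec))
    neigh = np.concatenate([spec[lo:k], spec[k + 1:hi]])
    return spec[k] / (neigh.mean() + 1e-12)
```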
Infrared target detection models are required more than ever to be deployed on embedded platforms, which demands models with lower memory consumption and better real-time performance at acceptable accuracy. To address these challenges, we propose a modified You Only Look Once (YOLO) algorithm, PF-YOLOv4-Tiny. The algorithm incorporates spatial pyramid pooling (SPP) and squeeze-and-excitation (SE) visual attention modules to enhance target localization capability. PANet-based feature pyramid networks (P-FPN) are proposed to transfer semantic and location information simultaneously to improve detection accuracy. To lighten the network, the standard convolutions outside the backbone network are replaced with depthwise separable convolutions. In post-processing, the soft non-maximum suppression (soft-NMS) algorithm is employed to reduce the missed and false detections caused by occlusion between targets. The accuracy of our model reaches 61.75%, while the total number of parameters is only 9.3 M and the computation is 11 GFLOPs. At the same time, the inference speed reaches 87 FPS on an NVIDIA GeForce GTX 1650 Ti, which meets the requirements of infrared target detection algorithms for embedded deployment.
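The soft-NMS step replaces hard suppression with score decay. Below is a minimal Gaussian soft-NMS in the spirit of the post-processing described above (the sigma and threshold values are conventional defaults, not the paper's settings):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: instead of deleting boxes that overlap the
    current best box, decay their scores by exp(-iou^2 / sigma), so
    heavily occluded true targets are kept with reduced confidence.
    Returns (box, score) pairs in selection order."""
    boxes = [list(b) for b in boxes]
    scores = list(scores)
    keep = []
    while boxes:
        m = int(np.argmax(scores))
        best_box, best_score = boxes.pop(m), scores.pop(m)
        if best_score < score_thresh:
            break
        keep.append((best_box, best_score))
        scores = [s * np.exp(-iou(best_box, b) ** 2 / sigma)
                  for b, s in zip(boxes, scores)]
    return keep
```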
In this paper, we summarize 3D perception-oriented algorithms for perceptually driven 3D video coding. Several perceptual effects have been exploited for 2D video viewing; however, this is not yet the case for 3D video viewing. 3D video requires depth perception, which implies binocular effects such as conflicts, fusion, and rivalry. A better understanding of these effects is necessary for 3D perceptual compression, which provides users with a more comfortable visual experience for video delivered over a channel with limited bandwidth. We present the state of the art in 3D visual attention models, 3D just-noticeable-difference models, and 3D texture-synthesis models that address 3D human vision issues in 3D video coding and transmission.
Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, considering a better mutual-restraint scheme linking motion and appearance that allows exploitation of cross-modal features in the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well to leading methods for both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
Salient object detection remains one of the most important and active research topics in computer vision, with wide-ranging applications in object recognition, scene understanding, image retrieval, context-aware image editing, image compression, etc. Most existing methods determine salient objects directly by exploring various salient-object features. Here, we propose a novel graph-based ranking method to detect and segment the most salient object in a scene according to its relationship to image-border (background) regions, i.e., the background feature. We use regions/superpixels as graph nodes, which are fully connected so that both long-range and short-range relations can be modeled. The relationship of each region to the image border is evaluated in two stages: (i) ranking with hard background queries, and (ii) ranking with soft foreground queries. We show experimentally that this two-stage ranking-based salient object detection method is complementary to traditional methods, and that integrated results outperform both. Our method exploits intrinsic image structure to achieve high-quality salient object determination using a quadratic optimization framework with a closed-form solution that can be easily computed. Extensive evaluation and comparison on three challenging saliency datasets demonstrate that our method consistently outperforms 10 state-of-the-art models by a large margin.
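The quadratic optimization with a closed-form solution mentioned above is characteristic of graph-based manifold ranking, where the ranking vector solves (D - alpha*W) f = y. A minimal sketch, with alpha, W, and y as generic inputs rather than the paper's exact construction:

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.99):
    """Closed-form graph ranking f* = (D - alpha*W)^{-1} y:
    W is the symmetric affinity matrix over region nodes, D its degree
    matrix, and y the query indicator vector (e.g., 1 for border/
    background query nodes).  Higher f* means more similar to the
    queries; alpha trades off smoothness against fitting y."""
    D = np.diag(W.sum(axis=1))
    return np.linalg.solve(D - alpha * W, y)
```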
Purpose – The purpose of this paper is to propose a new algorithm, chaotic pigeon-inspired optimization (CPIO), which can effectively improve the computing efficiency of the basic Itti model for saliency-based detection. The CPIO algorithm and relevant applications are aimed at target detection for air surveillance. Design/methodology/approach – To compare performance improvements on the Itti model, three bio-inspired algorithms, particle swarm optimization (PSO), brain storm optimization (BSO), and CPIO, are applied to optimize the weight coefficients of each feature map in the saliency computation. Findings – According to the experimental results on the optimized Itti model, CPIO outperforms PSO in computing efficiency and is superior to BSO in searching ability; CPIO therefore provides the best overall properties among the three algorithms. Practical implications – The proposed algorithm can be widely applied for fast, accurate, multi-target detection in aerial images. Originality/value – The CPIO algorithm is originally proposed here and is promising for solving complicated optimization problems.
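Of the three optimizers compared, PSO is the baseline and is easy to sketch. The following minimal PSO tunes a weight vector for an arbitrary cost function (all constants are conventional defaults, not the paper's settings; CPIO's chaotic map and pigeon operators are not reproduced here):

```python
import numpy as np

def pso_minimise(cost, dim, n_particles=20, iters=60, seed=0,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimiser over a dim-dimensional weight
    vector (e.g., feature-map weights in a saliency model).  Each
    particle is pulled toward its personal best and the global best;
    returns (best_position, best_cost)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([cost(p) for p in x])
    g = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        costs = np.array([cost(p) for p in x])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], costs[improved]
        g = pbest[pbest_cost.argmin()].copy()
    return g, float(pbest_cost.min())
```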
Funding: Supported by the Tianjin Municipal Natural Science Foundation of China (Grant No. 19JCJQJC61600) and the Hebei Provincial Natural Science Foundation of China (Grant Nos. F2020202051, F2020202053).
Funding: Supported by the Natural Science Foundation of China (61425008, 61333004, 61273054).
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2006CB300407) and the National Natural Science Foundation of China (No. 50775017).
Abstract: Inspired by human behaviors, a robot object tracking model is proposed on the basis of the visual attention mechanism, consistent with the theory of topological perception. The model integrates the image-driven, bottom-up attention and the object-driven, top-down attention, whereas previous attention models have mostly focused on either bottom-up or top-down attention alone. Through the bottom-up component, the whole scene is segmented into the ground region and the salient regions. Guided by a top-down strategy achieved with a topological graph, the object regions are separated from the salient regions; the salient regions other than the object regions are the barrier regions. To evaluate the model, a mobile robot platform is developed, on which experiments are implemented. The experimental results indicate that processing an image with a resolution of 752 × 480 pixels takes less than 200 ms and that the extracted object regions are complete. A comparison of the proposed model with the existing model demonstrates that the proposed model has advantages in robot object tracking in terms of speed and efficiency.
Funding: Supported by the National Natural Science Foundation of China (40871157).
Abstract: It is of great significance to rapidly detect targets in large-field remote sensing images with limited computation resources. Employing relevant achievements on visual attention in perceptual psychology, this paper proposes a hierarchical attention-based model for target detection. Specifically, at the pre-attention stage, before obtaining salient regions, a fast computational approach is applied to build a saliency map. After that, the focus of attention (FOA) can be quickly obtained to indicate the salient objects. Then, at the attention stage, under FOA guidance, the high-level visual features of the region of interest are extracted in parallel. Finally, at the post-attention stage, by integrating these parallel and independent visual attributes, a decision-template-based classifier fusion strategy is proposed to discriminate the task-related targets from the other extracted salient objects. For comparison, experiments on ship detection are conducted to validate the effectiveness and feasibility of the proposed model.
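Decision-template fusion, in its standard form, averages each class's classifier output profiles over the training set and labels a new sample by its nearest template. A minimal sketch under that assumption (Euclidean distance is assumed; the paper does not state its exact similarity measure):

```python
import numpy as np

def fit_templates(profiles, labels, n_classes):
    """profiles: (n_samples, n_classifiers, n_classes) soft outputs.
    The decision template of a class is the mean decision profile of
    its training samples."""
    return np.stack([profiles[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def dt_predict(profile, templates):
    # Assign the class whose template is closest to the sample's profile.
    d = ((templates - profile) ** 2).sum(axis=(1, 2))
    return int(np.argmin(d))
```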
Funding: Supported by the National Key R&D Program of China (2017YFB0203000) and the National Natural Science Foundation of China (61632003, 61661146002, 61631001).
Abstract: Background: Eye tracking technology is receiving increased attention in the field of virtual reality. Specifically, future gaze prediction is crucial in pre-computation for many applications, such as gaze-contingent rendering, advertisement placement, and content-based design. To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality. Methods: In this paper, the concept of temporal continuity of visual attention is presented. Subsequently, an autocorrelation function method is proposed to evaluate the temporal continuity. Thereafter, the temporal continuity is analyzed in both free-viewing and task-oriented conditions. Results: In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that the temporal continuity performs well only within a short time interval. A task-oriented game scene condition was created to collect users' gaze data, and an analysis of the collected data finds that the temporal continuity performs similarly to that of the free-viewing conditions. Temporal continuity can be applied to future gaze prediction: if it is good, users' current gaze positions can be directly used to predict their gaze positions in the future. Conclusions: The prediction performance of the current gaze is further evaluated in both free-viewing and task-oriented conditions, showing that the current gaze can be efficiently applied to short-term future gaze prediction. Long-term gaze prediction remains to be explored.
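An autocorrelation-function evaluation of temporal continuity can be sketched as a plain normalized autocorrelation over one gaze coordinate (the paper's exact formulation may differ): values near 1 at lag k mean attention tends to stay put over k frames.

```python
import numpy as np

def gaze_autocorr(x, max_lag):
    """Normalized autocorrelation of a 1-D gaze-coordinate trace for
    lags 0..max_lag; r[0] is 1 by construction."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(max_lag + 1)])
```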
Funding: This work was supported in part by the Qatar National Library, Doha, Qatar, and in part by Qatar University Internal Grant IRCC-2021-010.
Abstract: Video summarization is applied to reduce redundancy and develop a concise representation of key frames in a video. More recently, video summaries have been produced through visual attention modeling: the frames that stand out visually are extracted as key frames based on human attention modeling theories. Schemes for modeling visual attention have proven effective for video summaries; nevertheless, the high computing cost of such techniques restricts their usability in everyday situations. In this context, we propose a key frame extraction (KFE) method built on an efficient and accurate visual attention model. The computational effort is minimized by utilizing dynamic visual saliency based on the temporal gradient instead of traditional optical flow techniques. In addition, an efficient technique using the discrete cosine transform is utilized for static visual salience. The dynamic and static visual attention measures are merged by means of a non-linear weighted fusion technique. Results of the system are compared with existing state-of-the-art techniques in terms of accuracy. The experimental results of our proposed model indicate its efficiency and high standard in terms of the extracted key frames.
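The temporal-gradient dynamic map and a non-linear weighted fusion can be sketched as follows. The absolute frame difference stands in for optical flow; the fusion exponent `gamma` is an assumed parameter for illustration, not a value from the paper.

```python
import numpy as np

def dynamic_saliency(prev, curr):
    """Temporal gradient: absolute frame difference, a cheap
    substitute for optical-flow-based motion saliency."""
    return np.abs(curr.astype(float) - prev.astype(float))

def fuse(static, dynamic, gamma=2.0):
    """Non-linear weighted fusion: the map with more overall activity
    receives a larger (normalized) weight."""
    ws, wd = static.mean() ** gamma, dynamic.mean() ** gamma
    total = ws + wd + 1e-12
    return (ws / total) * static + (wd / total) * dynamic
```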
Funding: Supported by the National Natural Science Foundation of China (61472163), the National Key Research & Development Plan of China (2016YFB1001403), and the Science and Technology Project of Shandong Province (2015GGX101025).
Abstract: During most of the interaction process, the operator often focuses on the tracked 3D hand gesture model only at the "interaction points" of the collision detection scene, such as "grasp" and "release", and on objects in the scene, without paying attention to the tracked 3D hand gesture model throughout the whole procedure. Thus, in this paper, a visual attention distribution model of the operator during "grasp", "translation", "release", and other basic operations is first studied, and a 3D hand gesture tracking algorithm based on this distribution model is proposed. Using the algorithm, in periods with a low degree of visual attention, a pre-stored 3D hand gesture animation can be used to directly visualize the 3D hand gesture model in the interactive scene; in periods with a high degree of visual attention, an existing "frame-by-frame tracking" approach is adopted to obtain the 3D gesture model. The results demonstrate that the proposed method can achieve real-time tracking of 3D hand gestures with an effective improvement in the efficiency, fluency, and availability of 3D hand gesture interaction.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52071238 and U20A20279), the National Key Research and Development Program of China (2022YFB3706701), and the 111 Project (No. D18018).
Abstract: U-Net has achieved good performance on small-scale datasets through skip connections that merge the features of the low-level and high-level layers, and it has been widely utilized in biomedical image segmentation as well as, more recently, in microstructure image segmentation of materials. Three representative visual attention mechanism modules, namely squeeze-and-excitation networks, the convolutional block attention module, and an extended calibration algorithm, were introduced into the traditional U-Net architecture to further improve the prediction accuracy. It is found that, compared with the original U-Net architecture, the evaluation indices of the improved U-Net architecture are significantly better for the microstructure segmentation of steels with ferrite/martensite and pearlite/ferrite composite microstructures and with the complex martensite/austenite island/bainite microstructure, which demonstrates the advantage of utilizing the visual attention mechanism in microstructure segmentation. The reasons for the accuracy improvement were discussed based on feature map analysis.
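Of the three modules, the squeeze-and-excitation block is the simplest to illustrate: global-average-pool each channel, pass the pooled vector through two small fully connected layers (ReLU then sigmoid), and rescale the channels by the resulting gate. A NumPy sketch of the forward pass (the weights `w1`, `w2` are hypothetical stand-ins; real implementations learn them inside a deep learning framework):

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map."""
    z = x.mean(axis=(1, 2))                  # squeeze: per-channel GAP, (C,)
    s = np.maximum(z @ w1, 0.0)              # excitation FC1 + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))      # excitation FC2 + sigmoid gate
    return x * s[:, None, None]              # channel-wise recalibration
```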
基金supported by The Fun dame ntal Research Funds for the Cen tral Univ ersities Grant No.2019JBM317.
Abstract: In transportation architecture, wayfinding quality is a crucial factor in determining transfer efficiency and level of service. When developing architectural design concepts, designers often employ their own visual attention to imagine where passengers will look. A saliency model is a software program that can predict human visual attention. This research examined whether a saliency model or designer visual attention is a good predictor of passenger visual attention during wayfinding inside transportation architecture. Using a remote eye-tracking system, the eye movements of 29 participants watching 100 still images depicting different indoor scenes of transportation architecture were recorded and transformed into saliency maps to illustrate the participants' visual attention. Participants were categorized as either "designers" or "laypeople" based on their architectural design expertise. Similarities were compared among the "designers'" visual attention, saliency model predictions, and "laypeople's" visual attention. The results showed that while the "designers'" collective visual attention was the best predictor of that of "laypeople", followed by the saliency models, a single designer's visual attention was not a good predictor. The divergence in visual attention highlights the limitation of designers in predicting passenger wayfinding behavior and implies that integrating a saliency model in practice can benefit wayfinding design.
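Similarity between attention or saliency maps of this kind is commonly scored with the Pearson correlation coefficient (CC), a standard saliency evaluation metric; the paper's exact similarity measure is not stated, but a CC comparison would look like:

```python
import numpy as np

def cc_similarity(a, b):
    """Pearson correlation between two saliency/fixation maps:
    1 = identical structure, 0 = unrelated, -1 = inverted."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())
```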
Funding: Project supported by the National Natural Science Foundation of China (No. 61802341), the National Science and Technology Innovation 2030 Major Project of the Ministry of Science and Technology of China (No. 2018AAA0100703), the Research Innovation Plan of the Ministry of Education of China, and the Provincial Key Research and Development Plan of Zhejiang Province, China (No. 2019C03137).
Abstract: Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience. Explorations toward the development of a visual attention mechanism using eye-tracking data have so far been limited to 2D cases, and researchers have yet to approach this topic in a 3D virtual environment and from a spatiotemporal perspective. We present the first 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum, known as the EDVAM. In addition, a deep learning model is devised and tested with the EDVAM to predict a user's subsequent visual attention from previous eye movements. This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.
Funding: Supported by Grant-in-Aid for Scientific Research, Grant No. 25540171.
Abstract: A dome display is expected to serve as an effective visualization environment for modeling and simulation owing to its frameless form and highly immersive sensation. However, since users in a dome display can freely view the projected image in any direction, it is difficult to share information among viewers. In this research, to solve this problem, the effect of visual attention guidance in the dome environment through camera work was examined. As a visualization system, DomePlayer, which can express the effects of camera work based on a camera work description language, was developed. From the results of evaluation experiments using this system, the constraint conditions of camera work in the dome environment were derived and the effect of visual attention guidance by camera work was evaluated.
Abstract: Nowadays, there is a great need to investigate the effects of fatigue on physical as well as mental performance. The issues generally associated with extreme fatigue are that one can easily lose focus while performing any particular activity, whether physical or mental, and that fatigue decreases one's motivation to complete the task at hand efficiently and successfully. In the same line of thought, myriad research studies have posited the negative effects of fatigue on mental performance, and most techniques to induce fatigue normally require long and repetitive visual search tasks. In this study, a visual search task was devised and customized using performance measures such as d' (d-prime) and the speed-accuracy trade-off (SATF), as well as ROC analysis for classifier performance. The visual search task consisted of distractors (L) and a target (T), whereby human participants had to press the appropriate keyboard button as fast as possible to indicate whether or not they noticed a target upon presentation of a visual stimulus. It was administered to human participants under laboratory conditions, and the reaction times as well as the accuracy of the participants were monitored. It was found that the test image Size35Int255 was the best image to use in terms of sensitivity and AUC (area under the curve). Ongoing research can therefore use these findings to create visual stimuli in which the target and distractor images follow the size and intensity characteristics found in this research.
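The sensitivity index d' is standard signal detection theory: the z-score of the hit rate minus the z-score of the false-alarm rate. A sketch using only Python's standard library (the log-linear correction applied here is a common convention to avoid infinite z-scores, not necessarily the paper's choice):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear
    correction so rates of exactly 0 or 1 stay finite."""
    hr = (hits + 0.5) / (hits + misses + 1.0)
    far = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hr) - z(far)
```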
Abstract: Background: Age-related macular degeneration (AMD) is one of the main causes of vision loss in older adults, generating, in most cases, a central scotoma that reduces central visual acuity (Noble & Chaudhary, 2010). People affected by AMD have to rely on peripheral visual information and would highly benefit from efficiently allocating their attention to the periphery. Indeed, attention can improve peripheral spatial resolution (Carrasco, Ling & Read, 2004) and can be allocated to a certain expanse of space outside of the central visual span, known as the attentional span. The attentional span has been shown to be decreased in people with AMD, with less attention allocated to the periphery and more to the central visual field (Cheong et al., 2008); however, it remains unknown whether aging is also a contributing factor. Methods: Fourteen healthy younger (mean age = 21.8 years, SD = 1.5) and 8 older adults (mean age = 69.6 years, SD = 7.3) performed a pop-out and a serial version of a visual search task in the presence of different-sized gaze-contingent invisible and visible artificial central scotomata (no scotoma, 3° diameter, 5°, and 7°). Participants were asked to indicate as quickly as possible whether a target was present or not among distractors whose number varied (16, 32, or 64 objects). We wished to determine whether the size of the scotoma, occluding different degrees of central vision, affected visual search differently for younger vs. older participants. Results: Both the younger and older participants showed higher reaction times (RTs) to find the target in the serial version (M = 2,074 ms for younger adults, M = 3,853 ms for older adults) compared to the pop-out version (M = 866 ms, M = 1,475 ms, P < 0.001) and with more distractors (32 distractors compared to 16, and 64 compared to 32, P < 0.01). Older adults showed longer RTs than younger adults in both versions of the task (P < 0.01). We found a significant effect of scotoma size on older adults (3° scotoma M = 3,276 ms; 7° scotoma M = 3,877 ms, P < 0.05); however, accuracy was higher with no scotoma (96% vs. 92%, P < 0.05) in the pop-out search task. This suggests that older participants privileged a fast decision at the expense of performance in those cases. For the younger adults, RTs were higher in the serial search task in the presence of a scotoma (M = 2,074 ms) compared to the control condition (M = 1,665 ms, P > 0.05). Conclusions: These results suggest that older adults take longer to perform visual search compared to younger adults and tend to use peripheral vision less than younger adults; larger central scotomas disrupted their performance but not that of younger participants, who performed equally well with different central scotoma sizes. These findings suggest that aging is a contributing factor in the decrease of the peripheral attentional span.
Abstract: Background: Research suggests that the analysis of facial expressions by a healthy brain takes place approximately 170 ms after the presentation of a facial expression, in the superior temporal sulcus and the fusiform gyrus, mostly in the right hemisphere. Some researchers argue that a fast pathway through the amygdala allows automatic and early emotional treatment around 90 ms after stimulation. This treatment would occur subconsciously, even before the stimulus is perceived, and can be approximated by presenting the stimuli quickly in the periphery of the fovea. The present study aimed to identify the neural correlates of a peripheral and simultaneous presentation of emotional expressions through a frequency-tagging paradigm. Methods: The presentation of emotional facial expressions at a specific frequency induces in the visual cortex a stable and precise response at the presentation frequency [i.e., a steady-state visual evoked potential (ssVEP)] that can be used as a frequency tag to follow the cortical treatment of the stimulus. Here, the use of different specific stimulation frequencies allowed us to label the different facial expressions presented simultaneously and to obtain a reliable cortical response associated with (I) each of the emotions and (II) the different repeated presentation times (1/0.170 s ≈ 5.8 Hz, 1/0.090 s ≈ 10.8 Hz). To identify the regions involved in emotional discrimination, we subtracted the brain activity induced by the rapid presentation of six emotional expressions from the activity induced by the presentation of the same emotion (reduced by neural adaptation). The results were compared across the hemisphere in which attention was directed, the emotion, and the frequency of stimulation. Results: The signal-to-noise ratio of the cerebral oscillations related to the treatment of the expression of fear was stronger in the regions specific to emotional treatment when the stimuli were presented in the subjects' peripheral vision, unbeknownst to them. In addition, the peripheral emotional treatment of fear at 10.8 Hz was associated with greater activation within the Gamma 1 and 2 frequency bands in the expected regions (frontotemporal and T6), as well as desynchronization in the Alpha frequency bands for the temporal regions. This modulation of the spectral power is independent of the attentional demand. Conclusions: These results suggest that emotional stimulation with fear presented in the peripheral vision and outside the attentional focus elicits an increase in brain activity, especially in the temporal lobe. The localization of this activity, as well as the optimal stimulation frequency found for this facial expression, suggests that it is treated by the fast pathway of the magnocellular layers.
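The signal-to-noise ratio of a frequency-tagged response is typically computed as the spectral amplitude at the tag frequency divided by the mean amplitude of neighboring frequency bins. A sketch under that common definition (the abstract does not give the exact SNR formula used):

```python
import numpy as np

def tag_snr(signal, fs, f_tag, n_neighbors=10):
    """SNR of a frequency tag: amplitude at the bin nearest f_tag
    divided by the mean amplitude of n_neighbors bins on each side."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - f_tag)))
    lo, hi = max(k - n_neighbors, 1), k + n_neighbors + 1
    neighbors = np.r_[spec[lo:k], spec[k + 1:hi]]   # exclude the tag bin
    return spec[k] / (neighbors.mean() + 1e-12)
```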
Funding: Supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 19JKB520031).
Abstract: Infrared target detection models are more required than ever to be deployed on embedded platforms, which demands models with less memory consumption and better real-time performance while maintaining accuracy. To address the above challenges, we propose a modified You Only Look Once (YOLO) algorithm, PF-YOLOv4-Tiny. The algorithm incorporates spatial pyramid pooling (SPP) and squeeze-and-excitation (SE) visual attention modules to enhance the target localization capability. PANet-based feature pyramid networks (P-FPN) are proposed to transfer semantic information and location information simultaneously to improve detection accuracy. To lighten the network, the standard convolutions other than those in the backbone network are replaced with depthwise separable convolutions. In post-processing, the soft non-maximum suppression (soft-NMS) algorithm is employed to mitigate the missed and false detections caused by occlusion between targets. The accuracy of our model reaches 61.75%, while the total number of parameters is only 9.3 M and the computation is 11 GFLOPs. At the same time, the inference speed reaches 87 FPS on an NVIDIA GeForce GTX 1650 Ti, which meets the requirements of infrared target detection algorithms for embedded deployment.
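Soft-NMS replaces hard suppression with score decay: boxes overlapping the current best are not deleted but have their scores reduced as a function of IoU. A sketch of the Gaussian variant (`sigma` and the score threshold are conventional defaults, not values from the paper):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, thresh=0.001):
    """Gaussian soft-NMS: decay overlapping scores by
    exp(-IoU^2 / sigma) instead of discarding the boxes."""
    scores = scores.astype(float).copy()
    keep, idxs = [], list(range(len(boxes)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        if scores[best] < thresh:
            break
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            scores[i] *= np.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return keep, scores
```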
Abstract: In this paper, we summarize 3D perception-oriented algorithms for perceptually driven 3D video coding. Several perceptual effects have been exploited for 2D video viewing; however, this is not yet the case for 3D video viewing. 3D video requires depth perception, which implies binocular effects such as conflicts, fusion, and rivalry. A better understanding of these effects is necessary for 3D perceptual compression, which provides users with a more comfortable visual experience for video that is delivered over a channel with limited bandwidth. We present the state of the art in 3D visual attention models, 3D just-noticeable-difference models, and 3D texture-synthesis models that address 3D human vision issues in 3D video coding and transmission.
Funding: This work was supported by the National Natural Science Foundation of China (62176169, 61703077, and 62102207).
Abstract: Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, by considering a better mutual restraint scheme linking motion and appearance that allows exploitation of cross-modal features in the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well to leading methods in both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
Funding: Funded by the National Natural Science Foundation of China under projects No. 61231014 and No. 61572264, and supported by the Defense Advanced Research Projects Agency (No. HR001110-C-0034), the National Science Foundation (No. BCS-0827764), and the Army Research Office (No. W911NF-08-1-0360).
Abstract: Salient object detection remains one of the most important and active research topics in computer vision, with wide-ranging applications to object recognition, scene understanding, image retrieval, context-aware image editing, image compression, etc. Most existing methods directly determine salient objects by exploring various salient-object features. Here, we propose a novel graph-based ranking method to detect and segment the most salient object in a scene according to its relationship to the image border (background) regions, i.e., the background feature. Firstly, we use regions/superpixels as graph nodes, which are fully connected to enable both long-range and short-range relations to be modeled. The relationship of each region to the image border (background) is evaluated in two stages: (i) ranking with hard background queries, and (ii) ranking with soft foreground queries. We experimentally show how this two-stage ranking-based salient object detection method is complementary to traditional methods, and that the integrated results outperform both. Our method exploits the intrinsic image structure to achieve high-quality salient object determination using a quadratic optimization framework, with a closed-form solution that can be easily computed. Extensive evaluation and comparison using three challenging saliency datasets demonstrate that our method consistently outperforms 10 state-of-the-art models by a large margin.
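Graph-based ranking of this kind typically minimizes a quadratic smoothness-plus-fitting energy whose closed-form solution is f* = (D - alpha*W)^(-1) y, where W is the node affinity matrix, D its degree matrix, and y the query indicator vector. A sketch (alpha = 0.99 is a common choice in manifold-ranking saliency, not necessarily this paper's value):

```python
import numpy as np

def ranking_scores(W, y, alpha=0.99):
    """Closed-form graph ranking: solve (D - alpha*W) f = y, where D
    is the degree matrix of the affinity W and y marks query nodes."""
    D = np.diag(W.sum(axis=1))
    return np.linalg.solve(D - alpha * W, y)
```

With background-region queries in y, high scores mark background; a second pass with soft foreground queries refines the map, as the two-stage scheme above describes.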
Abstract: Purpose: The purpose of this paper is to propose a new algorithm, chaotic pigeon-inspired optimization (CPIO), which can effectively improve the computing efficiency of the basic Itti's model for saliency-based detection. The CPIO algorithm and relevant applications are aimed at air surveillance for target detection. Design/methodology/approach: To compare the performance improvements on Itti's model, three bio-inspired algorithms, including particle swarm optimization (PSO), brain storm optimization (BSO), and CPIO, are applied to optimize the weight coefficients of each feature map in the saliency computation. Findings: According to the experimental results on the optimized Itti's model, CPIO outperforms PSO in terms of computing efficiency and is superior to BSO in terms of searching ability; therefore, CPIO provides the best overall properties among the three algorithms. Practical implications: The algorithm proposed in this paper can be extensively applied for fast, accurate, and multi-target detection in aerial images. Originality/value: The CPIO algorithm is originally proposed and is very promising for solving complicated optimization problems.
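The map-and-compass operator at the heart of pigeon-inspired optimization decays each pigeon's velocity over time while pulling it toward the global best position. A minimal sketch of that operator alone, without the chaotic initialization CPIO adds or the landmark operator of full PIO (population size, decay factor R, and bounds are illustrative assumptions):

```python
import numpy as np

def pio_map_compass(f, dim, n=20, iters=50, R=0.2, seed=0):
    """Minimize f over [-5, 5]^dim with the PIO map-and-compass
    operator: v <- v * exp(-R*t) + rand * (best - x)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n, dim))
    v = np.zeros((n, dim))
    best = min(x, key=f).copy()
    for t in range(1, iters + 1):
        v = v * np.exp(-R * t) + rng.random((n, dim)) * (best - x)
        x = x + v
        cand = min(x, key=f)
        if f(cand) < f(best):           # keep the best position seen
            best = cand.copy()
    return best
```

In the paper's setting, the decision variables would be the feature map weight coefficients of Itti's model and f would score detection quality, rather than the toy objective below.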