Visual attention is a mechanism that enables the visual system to detect potentially important objects in complex environments. Most computational visual attention models are designed with inspiration from mammalian visual systems. However, electrophysiological and behavioral evidence indicates that avian species are animals with high visual capability that can process complex information accurately in real time. Therefore, the visual system of avian species, especially the nuclei related to the visual attention mechanism, is investigated in this paper. A hierarchical visual attention model is then proposed for saliency detection. In the first hierarchy, optic tectum neuron responses are computed and self-information is used to compute primary saliency maps. In the second hierarchy, the "winner-take-all" network in the tecto-isthmal projection is simulated and final saliency maps are estimated with regularized random walks ranking. Comparison results verify that the proposed model, which can define the focus of attention accurately, outperforms several state-of-the-art models. This study provides insights into the relationship between the visual attention mechanism and the avian visual pathways. The computational visual attention model may reveal the underlying neural mechanism of these nuclei in biological visual attention.
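The self-information step in the first hierarchy can be illustrated with a minimal sketch: rare feature responses (low probability under the map's own histogram) get high saliency. This is only an illustration of the self-information idea under a simple per-pixel histogram model, not the paper's actual tectal-response pipeline; the function name and bin count are ours.

```python
import numpy as np

def self_information_saliency(feature_map, bins=32):
    """Map each feature response to its self-information, -log p(f).

    Rare responses (low empirical probability) receive high saliency;
    common responses receive low saliency.
    """
    flat = feature_map.ravel()
    hist, edges = np.histogram(flat, bins=bins)
    p = hist / hist.sum()                       # empirical probability per bin
    idx = np.clip(np.digitize(flat, edges[1:-1]), 0, bins - 1)
    saliency = -np.log(p[idx] + 1e-12)          # self-information per pixel
    return saliency.reshape(feature_map.shape)

# A flat background with one bright outlier: the outlier is most salient.
fm = np.zeros((8, 8)); fm[4, 4] = 1.0
sal = self_information_saliency(fm)
peak = tuple(map(int, np.unravel_index(sal.argmax(), sal.shape)))
print(peak)  # -> (4, 4)
```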
Inspired by human behaviors, a robot object tracking model is proposed on the basis of the visual attention mechanism and is consistent with the theory of topological perception. The model integrates image-driven, bottom-up attention and object-driven, top-down attention, whereas previous attention models have mostly focused on either bottom-up or top-down attention alone. In the bottom-up component, the whole scene is segmented into the ground region and the salient regions. Guided by a top-down strategy implemented as a topological graph, the object regions are separated from the salient regions; the salient regions other than the object regions are the barrier regions. To evaluate the model, a mobile robot platform was developed, on which several experiments were implemented. The experimental results indicate that processing an image with a resolution of 752 × 480 pixels takes less than 200 ms and that the object regions are unabridged. A comparison of the proposed model with the existing model demonstrates that the proposed model has advantages in robot object tracking in terms of speed and efficiency.
An improved landmark-selection method using a single camera is presented, with better selection capability than the previous method. To improve performance, two techniques were applied to landmark selection in an unfamiliar indoor environment. First, a modified visual attention method was proposed to automatically select candidate regions as potentially useful landmarks; candidate landmark regions were selected from differences in ambient color and intensity in the image. Then, the more useful landmarks were chosen by combining the candidate regions through clustering. As generally implemented, automatic landmark selection in vision-based simultaneous localization and mapping (SLAM) yields many useless landmarks, because image features are distinguishable from the surrounding environment yet detected repeatedly. These useless landmarks create a serious problem for the SLAM system because they complicate data association. To address this, a method was proposed in which the robot first collects landmarks through automatic detection while traversing the entire area where it performs SLAM, and then retains only those landmarks that exhibit high rarity, identified through clustering, which enhances system performance. Experimental results show that this automatic landmark selection yields high-rarity landmarks: the average SLAM error decreases by 52% compared with conventional methods, and the accuracy of data association increases.
Visual odometry is critical in visual simultaneous localization and mapping for robot navigation. However, the pose estimation performance of most current visual odometry algorithms degrades in scenes with unevenly distributed features, because dense features occupy excessive weight. Herein, a new human visual attention mechanism for point-and-line stereo visual odometry, called point-line-weight-mechanism visual odometry (PLWM-VO), is proposed to describe scene features in a global and balanced manner. A weight-adaptive model based on region partition and region growth is generated for the human visual attention mechanism, where sufficient attention is assigned to position-distinctive objects (sparse features in the environment). Furthermore, the sum of absolute differences algorithm is used to improve the accuracy of initialization for line features. Compared with the state-of-the-art method (ORB-VO), PLWM-VO shows a 36.79% reduction in the absolute trajectory error on the KITTI and EuRoC datasets. Although the time consumption of PLWM-VO is higher than that of ORB-VO, online test results indicate that PLWM-VO satisfies the real-time demand. The proposed algorithm not only significantly improves the environmental adaptability of visual odometry, but also quantitatively demonstrates the superiority of the human visual attention mechanism.
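The sum of absolute differences (SAD) used above for line-feature initialization is a standard block-matching cost. A minimal sketch of SAD matching along a search strip follows; the function name and shapes are ours, and this is a generic illustration, not the paper's stereo implementation:

```python
import numpy as np

def sad_best_match(patch, strip):
    """Slide `patch` along a search `strip` and return the horizontal
    offset with the minimum sum of absolute differences (SAD)."""
    h, w = patch.shape
    best_off, best_cost = 0, float("inf")
    for off in range(strip.shape[1] - w + 1):
        cost = np.abs(strip[:, off:off + w] - patch).sum()
        if cost < best_cost:
            best_off, best_cost = off, cost
    return best_off, best_cost

# Embed the patch at offset 5 in a random strip; SAD recovers the offset.
rng = np.random.default_rng(0)
strip = rng.uniform(0, 255, size=(7, 40))
patch = strip[:, 5:12].copy()
off, cost = sad_best_match(patch, strip)
print(off, cost)  # -> 5 0.0
```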
It is of great significance to rapidly detect targets in large-field remote sensing images with limited computation resources. Drawing on findings about visual attention from perceptual psychology, this paper proposes a hierarchical attention-based model for target detection. Specifically, at the pre-attention stage, a fast computational approach is applied to build a saliency map before salient regions are extracted. The focus of attention (FOA) can then be quickly obtained to indicate the salient objects. Next, at the attention stage, under FOA guidance, high-level visual features of the region of interest are extracted in parallel. Finally, at the post-attention stage, by integrating these parallel and independent visual attributes, a decision-template-based classifier fusion strategy is proposed to discriminate task-related targets from the other extracted salient objects. For comparison, experiments on ship detection validate the effectiveness and feasibility of the proposed model.
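Selecting the focus of attention from a saliency map is typically done with a winner-take-all pick plus inhibition of return so that successive foci land on different objects. The sketch below illustrates that generic mechanism under our own naming and parameters; the paper's actual FOA computation may differ:

```python
import numpy as np

def focus_of_attention(saliency, n_foci=3, inhibit_radius=2):
    """Winner-take-all with inhibition-of-return: repeatedly pick the
    saliency maximum, then suppress a window around it."""
    s = saliency.astype(float).copy()
    foci = []
    for _ in range(n_foci):
        r, c = np.unravel_index(s.argmax(), s.shape)
        foci.append((int(r), int(c)))
        r0, c0 = max(0, r - inhibit_radius), max(0, c - inhibit_radius)
        s[r0:r + inhibit_radius + 1, c0:c + inhibit_radius + 1] = -np.inf
    return foci

sal = np.zeros((10, 10))
sal[2, 3], sal[7, 8] = 5.0, 4.0
print(focus_of_attention(sal, n_foci=2))  # -> [(2, 3), (7, 8)]
```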
Background Eye tracking technology is receiving increased attention in the field of virtual reality. Specifically, future gaze prediction is crucial for pre-computation in many applications such as gaze-contingent rendering, advertisement placement, and content-based design. To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality. Methods In this paper, the concept of temporal continuity of visual attention is presented. Subsequently, an autocorrelation function method is proposed to evaluate the temporal continuity. Thereafter, the temporal continuity is analyzed in both free-viewing and task-oriented conditions. Results In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that the temporal continuity holds well only within a short time interval. A task-oriented game-scene condition was created to collect users' gaze data; an analysis of the collected gaze data finds that the temporal continuity behaves similarly to the free-viewing conditions. Temporal continuity can be applied to future gaze prediction: when it is good, users' current gaze positions can be used directly to predict their gaze positions in the near future. Conclusions The predictive performance of the current gaze is further evaluated in both free-viewing and task-oriented conditions, showing that the current gaze can be applied efficiently to short-term future gaze prediction. Long-term gaze prediction remains to be explored.
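The autocorrelation function method described above can be sketched for a 1-D gaze-coordinate signal as follows. This is a generic normalized autocorrelation, with our own function name and a synthetic gaze track, offered only to illustrate how temporal continuity decays with lag:

```python
import numpy as np

def gaze_autocorrelation(x, max_lag):
    """Normalized autocorrelation of a 1-D gaze-coordinate signal,
    a proxy for the temporal continuity of visual attention."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / denom
                     for k in range(max_lag + 1)])

# A slowly drifting gaze track stays highly correlated over short lags.
t = np.linspace(0, 1, 200)
gaze_x = np.sin(2 * np.pi * 0.5 * t)
ac = gaze_autocorrelation(gaze_x, max_lag=50)
print(float(ac[0]))        # -> 1.0 (by construction)
print(bool(ac[1] > 0.9))   # short lags remain highly correlated -> True
```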
A method to detect traffic dangers based on a visual attention model with sparse sampling is proposed. The hemispherical sparse sampling model is used to decrease the amount of calculation, which increases the detection speed. A Bayesian probability model and a Gaussian kernel function are applied to calculate the saliency of traffic videos. A multiscale saliency method is used, with the final saliency taken as the average over all scales, which substantially increases the detection rate. Detection results on several typical traffic dangers show that the proposed method achieves higher detection rates and speed, meeting the requirement of real-time detection of traffic dangers.
Video summarization is applied to reduce redundancy and develop a concise representation of key frames in the video; more recently, video summaries have been produced through visual attention modeling. In these schemes, the frames that stand out visually are extracted as key frames based on human attention modeling theories. Such visual attention schemes have proven effective for video summaries. Nevertheless, the high computational cost of these techniques restricts their usability in everyday situations. In this context, we propose a key frame extraction (KFE) method built on an efficient and accurate visual attention model. The computational effort is minimized by computing dynamic visual saliency from the temporal gradient instead of traditional optical flow techniques. In addition, an efficient technique using a discrete cosine transformation is utilized for the static visual salience. The dynamic and static visual attention measures are merged by means of a non-linear weighted fusion technique. The results of the system are compared with several existing state-of-the-art techniques for accuracy. The experimental results indicate that our proposed model is efficient and extracts key frames to a high standard.
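The temporal-gradient idea above (motion saliency from frame differences rather than optical flow) and the non-linear fusion step can be sketched minimally. Both functions and the fusion rule below are our own simplified stand-ins, not the paper's exact formulation:

```python
import numpy as np

def dynamic_saliency(prev_frame, cur_frame):
    """Dynamic saliency from the temporal gradient: the absolute
    frame-to-frame difference, normalized to [0, 1]."""
    d = np.abs(cur_frame.astype(float) - prev_frame.astype(float))
    return d / d.max() if d.max() > 0 else d

def fuse(static_sal, dyn_sal, alpha=2.0):
    """Non-linear weighted fusion: emphasize the dynamic map where
    motion is strong (a simple stand-in for the paper's fusion rule)."""
    w = dyn_sal ** alpha
    return w * dyn_sal + (1 - w) * static_sal

prev = np.zeros((6, 6)); cur = np.zeros((6, 6)); cur[3, 3] = 200
dyn = dynamic_saliency(prev, cur)
fused = fuse(np.full((6, 6), 0.2), dyn)
peak = tuple(map(int, np.unravel_index(fused.argmax(), fused.shape)))
print(peak)  # -> (3, 3), the moving pixel dominates the fused map
```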
In most of the interaction process, the operator focuses on the tracked 3D hand gesture model only at the "interaction points" of the collision detection scene, such as "grasp" and "release", and on the objects in the scene, without attending to the tracked 3D hand gesture model throughout the entire procedure. Thus, in this paper, a visual attention distribution model of the operator during "grasp", "translation", "release" and other basic operations is first studied, and a 3D hand gesture tracking algorithm based on this distribution model is proposed. With this algorithm, during periods of low visual attention, a pre-stored 3D hand gesture animation can be used to directly visualize the 3D hand gesture model in the interactive scene; during periods of high visual attention, an existing frame-by-frame tracking approach can be adopted to obtain the 3D gesture model. The results demonstrate that the proposed method achieves real-time tracking of 3D hand gestures and effectively improves the efficiency, fluency, and availability of 3D hand gesture interaction.
Selective visual attention determines what pedestrians notice and ignore in urban environments. If consistency exists between different individuals' visual attention, designers can adapt their designs to the underlying mechanisms to better meet user needs. However, the mechanism of pedestrians' visual attention remains poorly understood, and it is challenging to forecast which positions will attract pedestrians more in an urban environment. To address this gap, we employed 360° video and immersive virtual reality to simulate walking scenarios and recorded eye movements in 138 participants. Our findings reveal a remarkable consistency in fixation distribution across individuals, exceeding both chance and orientation bias. One driver of this consistency emerges as a strategy of information maximization, with participants tending to fixate areas of higher local entropy. Additionally, we built the first eye-movement dataset for panoramic videos of diverse urban walking scenes, and developed a supervised deep learning model to forecast pedestrians' visual attention. The predictive model aids designers in better understanding how pedestrians will visually interact with the urban environment during the design phase.
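The "higher local entropy" measure mentioned above is typically the Shannon entropy of the grey-level histogram in a small window around each pixel. A minimal sketch, with our own window size and bin count (borders skipped for brevity):

```python
import numpy as np

def local_entropy(img, r=2, bins=16):
    """Shannon entropy of the grey-level histogram in a (2r+1)^2 window,
    computed per pixel for an image with values in [0, 1]."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    q = np.clip((img * bins).astype(int), 0, bins - 1)  # quantize to bins
    for i in range(r, h - r):
        for j in range(r, w - r):
            patch = q[i - r:i + r + 1, j - r:j + r + 1].ravel()
            p = np.bincount(patch, minlength=bins) / patch.size
            p = p[p > 0]
            out[i, j] = -np.sum(p * np.log2(p))
    return out

# A textured (random) region carries higher local entropy than a flat one.
rng = np.random.default_rng(1)
img = np.zeros((12, 12)); img[:, 6:] = rng.uniform(size=(12, 6))
ent = local_entropy(img)
print(bool(ent[6, 9] > ent[6, 2]))  # -> True
```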
U-Net achieves good performance on small-scale datasets through skip connections that merge the features of low-level and high-level layers, and it has been widely utilized in biomedical image segmentation as well as, more recently, microstructure image segmentation of materials. Three representative visual attention modules, namely squeeze-and-excitation networks, the convolutional block attention module, and the extended calibration algorithm, were introduced into the traditional U-Net architecture to further improve prediction accuracy. Compared with the original U-Net architecture, the evaluation indices of the improved architectures are significantly better for segmenting steels with ferrite/martensite and pearlite/ferrite composite microstructures and the complex martensite/austenite-island/bainite microstructure, which demonstrates the advantage of visual attention mechanisms in microstructure segmentation. The reasons for the accuracy improvement are discussed based on feature map analysis.
Image captioning has gained increasing attention in recent years. Visual characteristics found in input images play a crucial role in generating high-quality captions. Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image, improving the effectiveness of identifying relevant image regions at each step of caption generation. However, giving image captioning models the capability to select the most relevant visual features from the input image and attend to them can significantly improve the utilization of these features and, consequently, captioning network performance. In light of this, we present an image captioning framework that efficiently exploits the extracted representations of the image. Our framework comprises three key components: the Visual Feature Detector module (VFD), the Visual Feature Visual Attention module (VFVA), and the language model. The VFD module detects a subset of the most pertinent features from the local visual features, creating an updated visual features matrix. Subsequently, the VFVA directs its attention to the visual features matrix generated by the VFD, resulting in an updated context vector employed by the language model to generate an informative description. Integrating the VFD and VFVA modules introduces an additional layer of processing for the visual features, thereby enhancing the image captioning model's performance. Using the MS-COCO dataset, our experiments show that the proposed framework competes well with state-of-the-art methods, effectively leveraging visual representations to improve performance. The implementation code can be found at https://github.com/althobhani/VFDICM (accessed on 30 July 2024).
In transportation architecture, wayfinding quality is a crucial factor determining transfer efficiency and level of service. When developing architectural design concepts, designers often rely on their own visual attention to imagine where passengers will look. A saliency model is a software program that can predict human visual attention. This research examined whether a saliency model or designer visual attention is a better predictor of passenger visual attention during wayfinding inside transportation architecture. Using a remote eye-tracking system, the eye movements of 29 participants watching 100 still images depicting different indoor scenes of transportation architecture were recorded and transformed into saliency maps to illustrate participants' visual attention. Participants were categorized as either "designers" or "laypeople" based on their architectural design expertise. Similarities were compared among the "designers'" visual attention, saliency model predictions, and "laypeople's" visual attention. The results showed that while the "designers'" collective visual attention was the best predictor of that of "laypeople", followed by saliency models, a single designer's visual attention was not a good predictor. This divergence highlights the limitations of individual designers in predicting passenger wayfinding behavior and implies that integrating a saliency model into practice can benefit wayfinding design.
Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience. Explorations toward development of a visual attention mechanism using eye-tracking data have so far been limited to 2D cases, and researchers are yet to approach this topic in a 3D virtual environment and from a spatiotemporal perspective. We present the first 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum, known as the EDVAM. In addition, a deep learning model is devised and tested with the EDVAM to predict a user's subsequent visual attention from previous eye movements. This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.
Objective video quality assessment plays a very important role in multimedia signal processing. Several extensions of the structural similarity (SSIM) index cannot predict the quality of a video sequence effectively. In this paper we propose a structural similarity quality metric for videos based on a spatial-temporal visual attention model. This model acquires the motion-attended region and the distortion-attended region by computing motion features and distortion contrast. It mimics the shifting of visual attention between the two attended regions and takes bursts of error into account by introducing non-linear weighting functions that give a much higher weighting factor to extremely damaged frames. The metric built on this model renders a final objective quality rating of the whole video sequence and is validated using the 50 Hz video sequences of the Video Quality Experts Group Phase I test database.
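For reference, the underlying SSIM index compares two frames via their luminance, contrast and structure statistics. The sketch below is a single-window (global) simplification of the standard windowed SSIM formula, not the paper's attention-weighted metric; the function name and constants follow the common SSIM defaults:

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Global SSIM over two equally sized greyscale frames
    (a single-window simplification of the windowed SSIM index)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(2)
frame = rng.uniform(0, 255, size=(32, 32))
print(round(float(ssim_global(frame, frame)), 6))   # -> 1.0
print(bool(ssim_global(frame, frame + 25) < 1.0))   # distortion lowers SSIM
```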
Dome displays are expected to serve as effective visualization environments for modeling and simulation owing to their frameless design and highly immersive sensation. However, since users in a dome display can freely view the projected image in any direction, it is difficult to share information among viewers. To solve this problem, this research examined the effect of visual attention guidance by camera work in the dome environment. As a visualization system, DomePlayer, which can express camera-work effects based on a camera-work description language, was developed. From evaluation experiments using this system, the constraint conditions of camera work in the dome environment were derived, and the effect of visual attention guidance by camera work was evaluated.
Nowadays, there is a great need to investigate the effects of fatigue on physical as well as mental performance. The issues generally associated with extreme fatigue are that one can easily lose focus while performing any particular activity, whether physical or mental, and that it decreases one's motivation to complete the task at hand efficiently and successfully. In the same vein, myriad research studies have posited negative effects of fatigue on mental performance, and most techniques to induce fatigue normally require long and repetitive visual search tasks. In this study, a visual search task was devised and customized using performance measures such as d' (d-prime) and the Speed Accuracy Trade-Off (SATF), as well as ROC analysis for classifier performance. The visual search task consisted of distractors (L) and a target (T), whereby human participants had to press the appropriate keyboard button as quickly as possible to indicate whether or not they noticed a target upon presentation of a visual stimulus. It was administered to human participants under laboratory conditions, and the reaction times and accuracy of the participants were monitored. It was found that the test image Size35Int255 was the best image to use in terms of sensitivity and AUC (Area Under Curve). Ongoing research can therefore use these findings to create visual stimuli in which the target and distractor images follow the size and intensity characteristics found in this research.
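The sensitivity measure d' used above is computed from the hit rate and false-alarm rate as the difference of their normal-distribution z-scores. A minimal stdlib sketch (the 1/(2N) clamping rule is one common convention, supplied here as an assumption):

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate, n=None):
    """Sensitivity index d' = Z(hit rate) - Z(false-alarm rate).

    Rates of exactly 0 or 1 are clamped with the common 1/(2N)
    correction when a trial count `n` is given.
    """
    z = NormalDist().inv_cdf
    if n:
        lo, hi = 1 / (2 * n), 1 - 1 / (2 * n)
        hit_rate = min(max(hit_rate, lo), hi)
        fa_rate = min(max(fa_rate, lo), hi)
    return z(hit_rate) - z(fa_rate)

# Symmetric example: 84% hits, 16% false alarms -> d' close to 2.
print(round(d_prime(0.84, 0.16), 3))  # -> 1.989
```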
Background: Age-related macular degeneration (AMD) is one of the main causes of vision loss in older adults, generating, in most cases, a central scotoma that reduces central visual acuity (Noble & Chaudhary, 2010). People affected by AMD have to rely on peripheral visual information and would benefit greatly from efficiently allocating their attention to the periphery. Indeed, attention can improve peripheral spatial resolution (Carrasco, Ling & Read, 2004) and can be allocated to a certain expanse of space outside the central visual span, known as the attentional span. The attentional span has been shown to be decreased in people with AMD, with less attention allocated to the periphery and more to the central visual field (Cheong et al., 2008); however, it remains unknown whether aging is also a contributing factor. Methods: Fourteen healthy younger adults (mean age = 21.8 years, SD = 1.5) and 8 older adults (mean age = 69.6 years, SD = 7.3) performed a pop-out and a serial version of a visual search task in the presence of gaze-contingent invisible and visible artificial central scotomata of different sizes (no scotoma, 3° diameter, 5° and 7°). Participants were asked to indicate as quickly as possible whether a target was present or not among distractors whose number varied (16, 32 or 64 objects). We wished to determine whether the size of the scotoma, occluding different degrees of central vision, affected visual search differently for younger vs. older participants. Results: Both younger and older participants showed higher reaction times (RTs) to find the target in the serial version (M = 2,074 ms for younger adults, M = 3,853 ms for older adults) compared to the pop-out version (M = 866 ms, M = 1,475 ms, P < 0.001) and with more distractors (32 distractors compared to 16, and 64 compared to 32, P < 0.01). Older adults showed longer RTs than younger adults for both versions of the task (P < 0.01). We found a significant effect of scotoma size on older adults (3° scotoma M = 3,276 ms; 7° scotoma M = 3,877 ms, P < 0.05); however, accuracy was higher with no scotoma (96% vs. 92%, P < 0.05) in the pop-out search task. This suggests that older participants privileged a fast decision at the expense of performance in those cases. For the younger adults, RTs were higher in the serial search task in the presence of a scotoma (M = 2,074 ms) compared to the control condition (M = 1,665 ms, P > 0.05). Conclusions: These results suggest that older adults take longer to perform visual search than younger adults and tend to use peripheral vision less; larger central scotomas disrupted their performance but not that of younger participants, who performed equally well with different central scotoma sizes. These findings suggest that aging is a contributing factor in the decrease of the peripheral attentional span.
Background: Research suggests that the analysis of facial expressions by a healthy brain takes place approximately 170 ms after the presentation of a facial expression, in the superior temporal sulcus and the fusiform gyrus, mostly in the right hemisphere. Some researchers argue that a fast pathway through the amygdala allows automatic and early emotional processing around 90 ms after stimulation. This processing would occur subconsciously, even before the stimulus is perceived, and can be approximated by presenting the stimuli quickly in the periphery of the fovea. The present study aimed to identify the neural correlates of a peripheral and simultaneous presentation of emotional expressions through a frequency-tagging paradigm. Methods: The presentation of emotional facial expressions at a specific frequency induces in the visual cortex a stable and precise response at the presentation frequency [i.e., a steady-state visual evoked potential (ssVEP)] that can be used as a frequency tag to follow the cortical processing of the stimulus. Here, the use of different stimulation frequencies allowed us to label the different facial expressions presented simultaneously and to obtain a reliable cortical response associated with (I) each of the emotions and (II) the different repeated presentation times (1/170 ms ≈ 5.8 Hz, 1/90 ms ≈ 10.8 Hz). To identify the regions involved in emotional discrimination, we subtracted the brain activity induced by the rapid presentation of six emotional expressions from the activity induced by the presentation of the same emotion (reduced by neural adaptation). The results were compared across the hemisphere in which attention was directed, the emotion, and the stimulation frequency. Results: The signal-to-noise ratio of the cerebral oscillations related to the processing of fearful expressions was stronger in regions specific to emotional processing when the faces were presented in the subjects' peripheral vision, unbeknownst to them. In addition, the peripheral processing of fear at 10.8 Hz was associated with greater activation within the Gamma 1 and 2 frequency bands in the expected regions (frontotemporal and T6), as well as desynchronization in the Alpha frequency bands for the temporal regions. This modulation of spectral power is independent of attentional demand. Conclusions: These results suggest that fearful stimuli presented in peripheral vision and outside the attentional focus elicit an increase in brain activity, especially in the temporal lobe. The localization of this activity, as well as the optimal stimulation frequency found for this facial expression, suggests that it is processed by the fast pathway of the magnocellular layers.
A new method for automatic salient object segmentation is presented. Salient object segmentation is an important research area in object recognition, image retrieval, image editing, scene reconstruction, and 2D/3D conversion. In this work, salient object segmentation is performed using a saliency map and color segmentation. Edge, color and intensity features are extracted from a mean shift segmentation (MSS) image, and a saliency map is created from these features. A first average-saliency-per-segment image is calculated using the color information from the MSS image and the generated saliency map. A second average-saliency-per-segment image is then calculated by applying the same procedure to the image after thresholding, labeling, and hole-filling. Thresholding, labeling and hole-filling are finally applied to the mean of the two generated images to obtain the final salient object segmentation. The effectiveness of the proposed method is demonstrated by precision, recall and F-measure values of 80%, 89% and 80%, respectively, between the generated salient object segmentation and the ground truth image.
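The precision, recall and F-measure reported above are computed between binary masks. A minimal sketch of that evaluation follows; the beta^2 = 0.3 weighting is a common convention in salient-object benchmarks and is our assumption here, as are the function name and toy masks:

```python
import numpy as np

def segmentation_scores(pred, gt, beta2=0.3):
    """Precision, recall and F-measure between a binary predicted mask
    and a binary ground-truth mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-12)
    return precision, recall, f

gt = np.zeros((10, 10), dtype=int); gt[2:8, 2:8] = 1       # 36-pixel object
pred = np.zeros((10, 10), dtype=int); pred[3:8, 2:8] = 1   # misses one row
p, r, f = segmentation_scores(pred, gt)
print(float(p), round(float(r), 3))  # -> 1.0 0.833
```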
Funding: supported by the Natural Science Foundation of China (61425008, 61333004, 61273054)
Abstract: Visual attention is a mechanism that enables the visual system to detect potentially important objects in complex environments. Most computational visual attention models are designed with inspiration from mammalian visual systems. However, electrophysiological and behavioral evidence indicates that avian species are animals with high visual capability that can process complex information accurately in real time. Therefore, the visual system of avian species, especially the nuclei related to the visual attention mechanism, is investigated in this paper. Afterwards, a hierarchical visual attention model is proposed for saliency detection. In the first hierarchy, the optic tectum neuron responses are computed and self-information is used to compute primary saliency maps. In the second hierarchy, the "winner-take-all" network in the tecto-isthmal projection is simulated and final saliency maps are estimated with regularized random walks ranking. Comparison results verify that the proposed model, which can define the focus of attention accurately, outperforms several state-of-the-art models. This study provides insights into the relationship between the visual attention mechanism and the avian visual pathways. The computational visual attention model may reveal the underlying neural mechanism of the nuclei for biological visual attention.
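As an illustration of the self-information idea used for the primary saliency maps above, here is a minimal sketch: a pixel is salient when its intensity is rare under the image's empirical distribution. This is only a schematic stand-in; the paper's model operates on simulated optic tectum neuron responses, and the function name, bin count, and normalization are our assumptions.

```python
import numpy as np

def self_information_saliency(image, bins=64):
    """Primary saliency as self-information: rare intensity values score high.

    `image` is assumed grayscale with values in [0, 1]; these choices are
    illustrative, not taken from the paper.
    """
    hist, edges = np.histogram(image, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()                         # empirical probability per bin
    idx = np.clip(np.digitize(image, edges[1:-1]), 0, bins - 1)
    eps = 1e-12
    saliency = -np.log(p[idx] + eps)              # -log p(x): rarer -> more salient
    return saliency / saliency.max()              # normalize to [0, 1]
```

A lone bright pixel in a dark image receives the maximum saliency score, matching the intuition that statistically rare features attract attention.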
Funding: supported by the National Basic Research Program of China (973 Program) (No. 2006CB300407) and the National Natural Science Foundation of China (No. 50775017)
Abstract: Inspired by human behaviors, a robot object tracking model is proposed on the basis of the visual attention mechanism, consistent with the theory of topological perception. The model integrates image-driven, bottom-up attention and object-driven, top-down attention, whereas previous attention models have mostly focused on either bottom-up or top-down attention alone. By the bottom-up component, the whole scene is segmented into the ground region and the salient regions. Guided by a top-down strategy realized as a topological graph, the object regions are separated from the salient regions; the remaining salient regions are the barrier regions. To evaluate the model, a mobile robot platform is developed, on which several experiments are implemented. The experimental results indicate that processing an image with a resolution of 752 × 480 pixels takes less than 200 ms and that the object regions are unabridged. A comparison of the proposed model with the existing model demonstrates that the proposed model has advantages in robot object tracking in terms of speed and efficiency.
Abstract: An improved landmark selection method using a single camera is presented, with better selection capability than a previous method. To improve performance, two methods were applied to landmark selection in an unfamiliar indoor environment. First, a modified visual attention method was proposed to automatically select candidate regions as more useful landmarks. In visual attention, candidate landmark regions were selected using the differing characteristics of ambient color and intensity in the image. Then, the more useful landmarks were selected by combining the candidate regions through clustering. As generally implemented, automatic landmark selection by vision-based simultaneous localization and mapping (SLAM) yields many useless landmarks, because the image features are distinguishable from the surrounding environment but are detected repeatedly. These useless landmarks create a serious problem for the SLAM system because they complicate data association. To address this, a method was proposed in which the robot initially collects landmarks through automatic detection while traversing the entire area where it performs SLAM, and then keeps only those landmarks that exhibit high rarity through clustering, which enhances system performance. Experimental results show that this method of automatic landmark selection yields high-rarity landmarks. The average error of SLAM decreases by 52% compared with conventional methods, and the accuracy of data association increases.
Funding: supported by the Tianjin Municipal Natural Science Foundation of China (Grant No. 19JCJQJC61600) and the Hebei Provincial Natural Science Foundation of China (Grant Nos. F2020202051 and F2020202053).
Abstract: Visual odometry is critical in visual simultaneous localization and mapping for robot navigation. However, the pose estimation performance of most current visual odometry algorithms degrades in scenes with unevenly distributed features, because dense features occupy excessive weight. Herein, a new human visual attention mechanism for point-and-line stereo visual odometry, called point-line-weight-mechanism visual odometry (PLWM-VO), is proposed to describe scene features in a global and balanced manner. A weight-adaptive model based on region partition and region growth is generated for the human visual attention mechanism, where sufficient attention is assigned to position-distinctive objects (sparse features in the environment). Furthermore, the sum of absolute differences algorithm is used to improve the accuracy of initialization for line features. Compared with the state-of-the-art method (ORB-VO), PLWM-VO shows a 36.79% reduction in the absolute trajectory error on the KITTI and EuRoC datasets. Although the time consumption of PLWM-VO is higher than that of ORB-VO, online test results indicate that PLWM-VO satisfies the real-time demand. The proposed algorithm not only significantly promotes the environmental adaptability of visual odometry, but also quantitatively demonstrates the superiority of the human visual attention mechanism.
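The sum of absolute differences (SAD) matching mentioned above for line-feature initialization can be sketched as a simple 1-D block search along a stereo strip. The function below is a generic SAD matcher, not PLWM-VO's actual implementation; the interface is assumed for illustration.

```python
import numpy as np

def sad_match(left_patch, right_strip):
    """Find the horizontal offset in `right_strip` that best matches
    `left_patch` under the sum of absolute differences.

    `left_patch` is (h, w); `right_strip` is (h, W) with W >= w, e.g. a
    band of the right stereo image along the epipolar line.
    """
    h, w = left_patch.shape
    best_offset, best_cost = 0, float("inf")
    for off in range(right_strip.shape[1] - w + 1):
        window = right_strip[:, off:off + w].astype(float)
        cost = np.abs(left_patch.astype(float) - window).sum()  # SAD cost
        if cost < best_cost:
            best_cost, best_offset = cost, off
    return best_offset
```

SAD is attractive here precisely because it is cheap: no multiplications, so a dense scan over candidate disparities stays within a real-time budget.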
Funding: supported by the National Natural Science Foundation of China (40871157)
Abstract: It is of great significance to rapidly detect targets in large-field remote sensing images with limited computation resources. Employing findings on visual attention from perceptual psychology, this paper proposes a hierarchical attention based model for target detection. Specifically, at the pre-attention stage, before salient regions are obtained, a fast computational approach is applied to build a saliency map. After that, the focus of attention (FOA) can be quickly obtained to indicate the salient objects. Then, at the attention stage, under FOA guidance, the high-level visual features of the region of interest are extracted in parallel. Finally, at the post-attention stage, by integrating these parallel and independent visual attributes, a decision-template based classifier fusion strategy is proposed to discriminate the task-related targets from the other extracted salient objects. For comparison, experiments on ship detection are conducted to validate the effectiveness and feasibility of the proposed model.
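Extracting successive foci of attention (FOA) from a saliency map is conventionally done with a winner-take-all selection plus inhibition of return, so the next focus moves to the next-most-salient object. The sketch below shows that generic mechanism; the paper's own saliency computation and suppression radius are not reproduced here.

```python
import numpy as np

def focus_of_attention(saliency, n_foci=3, inhibit_radius=2):
    """Sequentially pick foci of attention from a saliency map,
    suppressing a square neighborhood around each winner
    (inhibition of return) so later foci land on new objects."""
    s = np.array(saliency, dtype=float)
    foci = []
    for _ in range(n_foci):
        y, x = np.unravel_index(np.argmax(s), s.shape)   # winner-take-all
        foci.append((int(y), int(x)))
        y0, y1 = max(0, y - inhibit_radius), y + inhibit_radius + 1
        x0, x1 = max(0, x - inhibit_radius), x + inhibit_radius + 1
        s[y0:y1, x0:x1] = -np.inf                        # suppress attended region
    return foci
```

Each returned coordinate then seeds the attention-stage feature extraction for one candidate region.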
Funding: the National Key R&D Program of China (2017YFB0203000) and the National Natural Science Foundation of China (61632003, 61661146002, 61631001).
Abstract: Background: Eye tracking technology is receiving increased attention in the field of virtual reality. Specifically, future gaze prediction is crucial for pre-computation in many applications such as gaze-contingent rendering, advertisement placement, and content-based design. To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality. Methods: In this paper, the concept of temporal continuity of visual attention is presented. Subsequently, an autocorrelation function method is proposed to evaluate the temporal continuity. Thereafter, the temporal continuity is analyzed in both free-viewing and task-oriented conditions. Results: In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that temporal continuity holds well only within a short time interval. A task-oriented game scene was created and an experiment conducted to collect users' gaze data. An analysis of the collected gaze data finds that temporal continuity behaves similarly to the free-viewing conditions. Temporal continuity can be applied to future gaze prediction: when it is good, users' current gaze positions can be directly utilized to predict their gaze positions in the near future. Conclusions: The predictive performance of the current gaze is further evaluated in both free-viewing and task-oriented conditions, and the current gaze is found to be efficiently applicable to short-term future gaze prediction. Long-term gaze prediction remains to be explored.
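The autocorrelation measure of temporal continuity described above can be sketched for a single gaze coordinate series: high autocorrelation at a lag means the current gaze is a good predictor that far into the future. This is a textbook normalized autocorrelation, shown here as an assumption about the method, not the paper's exact formulation (which analyzes 2-D gaze on real datasets).

```python
import numpy as np

def gaze_autocorrelation(gaze, lag):
    """Normalized autocorrelation of a 1-D gaze coordinate series at a
    given sample lag; values near 1 indicate strong temporal continuity."""
    g = np.asarray(gaze, dtype=float)
    g = g - g.mean()                  # remove the mean before correlating
    if lag == 0:
        return 1.0
    return float((g[:-lag] * g[lag:]).sum() / (g * g).sum())
```

A smooth scanpath keeps high autocorrelation at short lags and loses it at long lags, which is exactly the "good only within a short time interval" finding reported above.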
基金Project(50808025)supported by the National Natural Science Foundation of ChinaProject(20090162110057)supported by the Doctoral Fund of Ministry of Education of China
文摘A method to detect traffic dangers based on visual attention model of sparse sampling was proposed. The hemispherical sparse sampling model was used to decrease the amount of calculation which increases the detection speed. Bayesian probability model and Gaussian kernel function were applied to calculate the saliency of traffic videos. The method of multiscale saliency was used and the final saliency was the average of all scales, which increased the detection rates extraordinarily. The detection results of several typical traffic dangers show that the proposed method has higher detection rates and speed, which meets the requirement of real-time detection of traffic dangers.
基金This work was supported in part by Qatar National Library,Doha,Qatar,and in part by the Qatar University Internal under Grant IRCC-2021-010。
文摘Video summarization is applied to reduce redundancy and developa concise representation of key frames in the video, more recently, video summaries have been used through visual attention modeling. In these schemes,the frames that stand out visually are extracted as key frames based on humanattention modeling theories. The schemes for modeling visual attention haveproven to be effective for video summaries. Nevertheless, the high cost ofcomputing in such techniques restricts their usability in everyday situations.In this context, we propose a method based on KFE (key frame extraction)technique, which is recommended based on an efficient and accurate visualattention model. The calculation effort is minimized by utilizing dynamicvisual highlighting based on the temporal gradient instead of the traditionaloptical flow techniques. In addition, an efficient technique using a discretecosine transformation is utilized for the static visual salience. The dynamic andstatic visual attention metrics are merged by means of a non-linear weightedfusion technique. Results of the system are compared with some existing stateof-the-art techniques for the betterment of accuracy. The experimental resultsof our proposed model indicate the efficiency and high standard in terms ofthe key frames extraction as output.
基金Supported by the National Natural Science Foundation of China(61472163)the National Key Research&Development Plan of China(2016YFB1001403)the Science and Technology Project of Shandong Province(2015GGX101025)
文摘In the majority of the interaction process, the operator often focuses on the tracked 3D hand gesture model at the "interaction points" in the collision detectionscene, such as "grasp" and "release" and objects in the scene, without paying attention to the tracked 3D hand gesture model in the total procedure. Thus in this paper, a visual attention distribution model of operator in the "grasp", "translation", "release" and other basic operation procedures is first studied and a 3D hand gesture tracking algorithm based on this distribution model is proposed. Utilizing the algorithm, in the period with a low degree of visual attention, a pre-stored 3D hand gesture animation can be used to directly visualise a 3D hand gesture model in the interactive scene; in the time period with a high degree of visual attention, an existing "frame-by-frame tracking" approach can be adopted to obtain a 3D gesture model. The results demonstrate that the proposed method can achieve real-time tracking of 3D hand gestures with an effective improvement on the efficiency, fluency, and availability of 3D hand gesture interaction.
基金supported by the China National Key R&D Program(No.2022YFC3801500)the National Natural Science Foundation of China(No.52278023)and Cyrus Tang Foundation.
文摘Selective visual attention determines what pedestrians notice and ignore in urban environment.If consistency exists between different individuals’visual attention,designers can modify design by underlining mechanisms to better meet user needs.However,the mechanism of pedestrians’visual attention remains poorly understood,and it is challenging to forecast which position will attract pedestrians more in urban environment.To address this gap,we employed 360°video and immersive virtual reality to simulate walking scenarios and record eye movement in 138 participants.Our findings reveal a remarkable consistency in fixation distribution across individuals,exceeding both chance and orientation bias.One driver of this consistency emerges as a strategy of information maximization,with participants tending to fixate areas of higher local entropy.Additionally,we built the first eye movement dataset for panorama videos of diverse urban walking scenes,and developed a predictive model to forecast pedestrians’visual attention by supervised deep learning.The predictive model aids designers in better understanding how pedestrians will visually interact with the urban environment during the design phase.
基金support from National Natural Science Foundation of China(Nos.52071238 and U20A20279)National Key Research and Development Program of China(2022YFB3706701)the 111 Project(No.D18018)。
文摘U-Net has achieved good performance with the small-scale datasets through skip connections to merge the features of the low-level layers and high-level layers and has been widely utilized in biomedical image segmentation as well as recent microstructure image segregation of the materials.Three representative visual attention mechanism modules,named as squeeze-and-excitation networks,convolutional block attention module,and extended calibration algorithm,were intro-duced into the traditional U-Net architecture to further improve the prediction accuracy.It is found that compared with the original U-Net architecture,the evaluation index of the improved U-Net architecture has been significantly improved for the microstructure segmentation of the steels with the ferrite/martensite composite microstructure and pearlite/ferrite composite microstructure and the complex martensite/austenite island/bainite microstructure,which demonstrates the advantages of the utilization of the visual attention mechanism in the microstructure segregation.The reasons for the accuracy improvement were discussed based on the feature maps analysis.
基金supported by the National Natural Science Foundation of China(Nos.U22A2034,62177047)High Caliber Foreign Experts Introduction Plan funded by MOST,and Central South University Research Programme of Advanced Interdisciplinary Studies(No.2023QYJC020).
文摘Image captioning has gained increasing attention in recent years.Visual characteristics found in input images play a crucial role in generating high-quality captions.Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image,improving the effectiveness of identifying relevant image regions at each step of caption generation.However,providing image captioning models with the capability of selecting the most relevant visual features from the input image and attending to them can significantly improve the utilization of these features.Consequently,this leads to enhanced captioning network performance.In light of this,we present an image captioning framework that efficiently exploits the extracted representations of the image.Our framework comprises three key components:the Visual Feature Detector module(VFD),the Visual Feature Visual Attention module(VFVA),and the language model.The VFD module is responsible for detecting a subset of the most pertinent features from the local visual features,creating an updated visual features matrix.Subsequently,the VFVA directs its attention to the visual features matrix generated by the VFD,resulting in an updated context vector employed by the language model to generate an informative description.Integrating the VFD and VFVA modules introduces an additional layer of processing for the visual features,thereby contributing to enhancing the image captioning model’s performance.Using the MS-COCO dataset,our experiments show that the proposed framework competes well with state-of-the-art methods,effectively leveraging visual representations to improve performance.The implementation code can be found here:https://github.com/althobhani/VFDICM(accessed on 30 July 2024).
基金supported by The Fun dame ntal Research Funds for the Cen tral Univ ersities Grant No.2019JBM317.
文摘In transportation architecture,wayfinding quality is a crucial factor for determining transfer efficiency and level of service.When developing architectural design concepts,designers often employ their visual attention to imagine where passengers will look.A saliency model is a software program that can predict human visual attention.This research examined whether a saliency model or designer visual attention is a good predictor of passenger visual attention during wayfinding in side transportation architecture.Using a remote eye-tracking system,the eye-movements of 29 participants watching 100 still images depicting different indoor seenes of transportation architecture were recorded and transformed into saliency maps to illustrate participants'visual attention.Participants were categorized as either"designers"or"laypeople"based on their architectural design expertise.Similarities were compared among the"designers'"visual attention,saliency model predictions,and"laypeople's"visual attention.The results showed that while the"designers'"visual attention was the best predictor of that of"laypeople",followed by saliency models,a single desig ner's visual attend on was not a good predictor.The divergence in visual attention highlights the limitation of designers in predicting passenger wayfinding behavior and implies that integrating a saliency model in practice can be beneficial for wayfinding design.
基金Project supported by the National Natural Science Foundation of China(No.61802341)the National Science and Technology Innovation 2030 Major Project of the Ministry of Science and Technology of China(No.2018AAA0100703)the Research Innovation Plan of the Ministry of Education of China,and the Provincial Key Research and Development Plan of Zhejiang Province,China(No.2019C03137)。
文摘Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience.Explorations toward development of a visual attention mechanism using eye-tracking data have so far been limited to 2D cases,and researchers are yet to approach this topic in a 3D virtual environment and from a spatiotemporal perspective.We present the first 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum,known as the EDVAM.In addition,a deep learning model is devised and tested with the EDVAM to predict a user’s subsequent visual attention from previous eye movements.This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.
文摘Objective video quality assessment plays a very important role in multimedia signal processing. Several extensions of the structural similarity (SSIM) index could not predict the quality of the video sequence effectively. In this paper we propose a structural similarity quality metric for videos based on a spatial-temporal visual attention model. This model acquires the motion attended region and the distortion attended region by computing the motion features and the distortion contrast. It mimics the visual attention shifting between the two attended regions and takes the burst of error into account by introducing the non-linear weighting fimctions to give a much higher weighting factor to the extremely damaged frames. The proposed metric based on the model renders the final object quality rating of the whole video sequence and is validated using the 50 Hz video sequences of Video Quality Experts Group Phase I test database.
Funding: Grant-in-Aid for Scientific Research, Grant No. 25540171.
Abstract: Dome displays are expected to serve as effective visualization environments for modeling and simulation owing to their frameless design and highly immersive sensation. However, since users in a dome display can freely view the projected image in arbitrary directions, it is difficult to share information among viewers. In this research, to solve this problem, the effect of visual attention guidance in the dome environment through camera work was examined. As a visualization system, DomePlayer, which can express camera-work effects based on a camera-work description language, was developed. From the results of evaluation experiments using this system, the constraint conditions of camera work in the dome environment were derived and the effect of visual attention guidance by camera work was evaluated.
Abstract: Nowadays, there is a great need to investigate the effects of fatigue on physical as well as mental performance. The issues generally associated with extreme fatigue are that one can easily lose focus while performing any particular activity, whether physical or mental, and that fatigue decreases one's motivation to complete the task at hand efficiently and successfully. In the same line of thought, many research studies have reported the negative effects of fatigue on mental performance, and most techniques to induce fatigue normally require long and repetitive visual search tasks. In this study, a visual search task was devised and customized using performance measures such as d' (d-prime) and the speed-accuracy trade-off (SATF), as well as ROC analysis for classifier performance. The visual search task consisted of distractors (L) and a target (T), whereby human participants had to press the appropriate keyboard button as fast as possible to indicate whether or not they noticed a target upon presentation of a visual stimulus. It was administered to human participants under laboratory conditions, and the reaction times and accuracy of the participants were monitored. It was found that the test image Size35Int255 was the best image to use in terms of sensitivity and AUC (area under the curve). Ongoing research can therefore use these findings to create visual stimuli in which the target and distractor images follow the size and intensity characteristics found in this research.
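The d' (d-prime) sensitivity measure used above is the difference of z-transformed hit and false-alarm rates from signal detection theory. A minimal implementation, with a standard 1/(2N) correction for extreme rates (the correction choice is an assumption; the study does not specify one):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 are clamped to 1/(2N) and 1 - 1/(2N) so the
    inverse normal CDF stays finite.
    """
    def rate(k, n):
        return min(max(k / n, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))
    z = NormalDist().inv_cdf            # inverse standard normal CDF
    hr = rate(hits, hits + misses)
    far = rate(false_alarms, false_alarms + correct_rejections)
    return z(hr) - z(far)
```

Chance performance (equal hit and false-alarm rates) yields d' = 0, while a hit rate of 0.84 against a false-alarm rate of 0.16 yields d' of roughly 2, a conventionally "good" sensitivity.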
Abstract: Background: Age-related macular degeneration (AMD) is one of the main causes of vision loss in older adults, generating, in most cases, a central scotoma that reduces central visual acuity (Noble & Chaudhary, 2010). People affected by AMD have to rely on peripheral visual information and would highly benefit from efficiently allocating their attention to the periphery. Indeed, attention can improve peripheral spatial resolution (Carrasco, Ling & Read, 2004) and can be allocated to a certain expanse of space outside of the central visual span, known as the attentional span. The attentional span has been shown to be decreased in people with AMD, with less attention allocated to the periphery and more to the central visual field (Cheong et al., 2008); however, it remains unknown whether aging is also a contributing factor. Methods: Fourteen healthy younger adults (mean age = 21.8 years, SD = 1.5) and 8 older adults (mean age = 69.6 years, SD = 7.3) performed a pop-out and a serial version of a visual search task in the presence of different sized gaze-contingent invisible and visible artificial central scotomata (no scotoma, 3° diameter, 5°, and 7°). Participants were asked to indicate as quickly as possible whether a target was present or not among distractors whose number varied (16, 32 or 64 objects). We wished to determine whether the size of the scotoma, occluding different degrees of central vision, affected visual search differently for younger vs. older participants. Results: Both the younger and older participants showed higher reaction times (RTs) to find the target for the serial version (M = 2,074 ms for younger adults, M = 3,853 ms for older adults) compared to the pop-out version (M = 866 ms, M = 1,475 ms, P<0.001) and for more distractors (32 distractors compared to 16, and 64 compared to 32, P<0.01). Older adults showed longer RTs than younger adults for both versions of the task (P<0.01). We found a significant effect of scotoma size on older adults (3° scotoma M = 3,276 ms; 7° scotoma M = 3,877 ms, P<0.05); however, accurate performance was higher
with no scotoma (96% vs. 92%, P<0.05) in the pop-out search task. This suggests that older participants privileged a fast decision at the expense of performance in those cases. For the younger adults, RTs were higher in the serial search task in the presence of a scotoma (M = 2,074 ms) compared to the control condition (M = 1,665 ms, P>0.05). Conclusions: These results suggest that older adults take longer to perform visual search compared to younger adults and tend to use peripheral vision less than younger adults; larger central scotomas disrupted their performance but not that of younger participants, who performed equally well with different central scotoma sizes. These findings suggest that aging is a contributing factor in the decrease of the peripheral attentional span.
Abstract: Background: Research suggests that the analysis of facial expressions by a healthy brain takes place approximately 170 ms after the presentation of a facial expression, in the superior temporal sulcus and the fusiform gyrus, mostly in the right hemisphere. Some researchers argue that a fast pathway through the amygdala allows automatic and early emotional processing around 90 ms after stimulation. This processing would occur subconsciously, even before the stimulus is perceived, and can be approximated by presenting stimuli quickly in the periphery of the fovea. The present study aimed to identify the neural correlates of a peripheral and simultaneous presentation of emotional expressions through a frequency-tagging paradigm. Methods: The presentation of emotional facial expressions at a specific frequency induces in the visual cortex a stable and precise response at the presentation frequency [i.e., a steady-state visual evoked potential (ssVEP)] that can be used as a frequency tag to follow the cortical processing of the stimulus. Here, the use of different specific stimulation frequencies allowed us to label the different facial expressions presented simultaneously and to obtain a reliable cortical response associated with (I) each of the emotions and (II) the different repeated presentation times (1/0.170 s ≈ 5.8 Hz, 1/0.090 s ≈ 10.8 Hz). To identify the regions involved in emotional discrimination, we subtracted the brain activity induced by the rapid presentation of six emotional expressions from the activity induced by the presentation of the same emotion (reduced by neural adaptation). The results were compared across the hemisphere in which attention was directed, emotion, and stimulation frequency. Results: The signal-to-noise ratio of the cerebral oscillations corresponding to the processing of the expression of fear was stronger in the regions specific to emotional processing when the faces were presented in the subjects' peripheral
vision, unbeknownst to them. In addition, the peripheral emotional processing of fear at 10.8 Hz was associated with greater activation within the Gamma 1 and 2 frequency bands in the expected regions (frontotemporal and T6), as well as desynchronization in the Alpha frequency bands for the temporal regions. This modulation of the spectral power is independent of the attentional demand. Conclusions: These results suggest that emotional stimuli of fear presented in peripheral vision and outside the attentional focus elicit an increase in brain activity, especially in the temporal lobe. The localization of this activity, as well as the optimal stimulation frequency found for this facial expression, suggests that it is processed by the fast pathway of the magnocellular layers.
Abstract: A new method for automatic salient object segmentation is presented. Salient object segmentation is an important research area in the fields of object recognition, image retrieval, image editing, scene reconstruction, and 2D/3D conversion. In this work, salient object segmentation is performed using a saliency map and color segmentation. Edge, color, and intensity features are extracted from the mean shift segmentation (MSS) image, and a saliency map is created from these features. First, an average-saliency-per-segment image is calculated using the color information from the MSS image and the generated saliency map. Then, a second average-saliency-per-segment image is calculated by applying the same procedure to the image after thresholding, labeling, and hole-filling. Thresholding, labeling, and hole-filling are then applied to the mean of the two generated images to obtain the final salient object segmentation. The effectiveness of the proposed method is demonstrated by precision, recall, and F-measure values of 80%, 89%, and 80% between the generated salient object segmentation and the ground truth image.
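The average-saliency-per-segment step above reduces to replacing each segment's pixels with the mean saliency inside that segment. A minimal sketch, assuming the segmentation is already given as an integer label map (the mean shift segmentation itself is not reproduced):

```python
import numpy as np

def average_saliency_per_segment(saliency, labels):
    """Replace each segment of `labels` (integer label map, same shape as
    `saliency`) with the mean saliency over that segment, yielding a
    piecewise-constant per-segment saliency image."""
    out = np.zeros_like(saliency, dtype=float)
    for seg in np.unique(labels):
        mask = labels == seg
        out[mask] = saliency[mask].mean()   # one constant value per segment
    return out
```

Thresholding this piecewise-constant image then selects whole segments rather than scattered pixels, which is what makes the final object mask clean.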