Recently, deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding. However, these approaches have some limitations: for example, the cover image may lack self-adaptability, or the method may suffer from information leakage or weak concealment. To address these issues, this study proposes a universal and adaptable image-hiding method. First, a domain attention mechanism is designed by combining Atrous convolution, which makes better use of the relationship between the secret image domain and the cover image domain. Second, to improve perceived human similarity, perceptual loss is incorporated into the training process. The experimental results are promising, with the proposed method achieving an average pixel discrepancy (APD) of 1.83 and a peak signal-to-noise ratio (PSNR) of 40.72 dB between the cover and stego images, indicative of its high-quality output. Furthermore, the structural similarity index measure (SSIM) reaches 0.985, while the learned perceptual image patch similarity (LPIPS) registers at a remarkable 0.0001. Moreover, self-testing and cross-experiments demonstrate the model's adaptability and generalization in unknown hidden spaces, making it suitable for diverse computer vision tasks.
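As a minimal illustration of the two distortion metrics quoted in the abstract above, the sketch below computes APD (taken here to be the mean absolute pixel difference) and PSNR between a cover image and a stego image; the random images and function names are illustrative stand-ins, not the paper's code.

```python
import numpy as np

def apd(cover: np.ndarray, stego: np.ndarray) -> float:
    """Average pixel discrepancy: mean absolute difference between two images."""
    return float(np.mean(np.abs(cover.astype(np.float64) - stego.astype(np.float64))))

def psnr(cover: np.ndarray, stego: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Random 8-bit images standing in for a cover/stego pair.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
stego = np.clip(cover.astype(int) + rng.integers(-2, 3, cover.shape), 0, 255).astype(np.uint8)
print(f"APD  = {apd(cover, stego):.2f}")
print(f"PSNR = {psnr(cover, stego):.2f} dB")
```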
Visual odometry is critical in visual simultaneous localization and mapping for robot navigation. However, the pose estimation performance of most current visual odometry algorithms degrades in scenes with unevenly distributed features, because dense features occupy excessive weight. Herein, a new human-visual-attention mechanism for point-and-line stereo visual odometry, called point-line-weight-mechanism visual odometry (PLWM-VO), is proposed to describe scene features in a global and balanced manner. A weight-adaptive model based on region partition and region growth is generated for the human visual attention mechanism, where sufficient attention is assigned to position-distinctive objects (sparse features in the environment). Furthermore, the sum of absolute differences (SAD) algorithm is used to improve the accuracy of initialization for line features. Compared with the state-of-the-art method (ORB-VO), PLWM-VO shows a 36.79% reduction in the absolute trajectory error on the KITTI and EuRoC datasets. Although the time consumption of PLWM-VO is higher than that of ORB-VO, online test results indicate that PLWM-VO satisfies the real-time demand. The proposed algorithm not only significantly improves the environmental adaptability of visual odometry, but also quantitatively demonstrates the superiority of the human visual attention mechanism.
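The abstract mentions using the sum of absolute differences (SAD) to initialize line features; the sketch below shows the generic SAD idea on a rectified stereo pair, matching a single patch along a scanline. It illustrates SAD matching in general, not the paper's line-feature procedure, and all names are assumptions.

```python
import numpy as np

def sad(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Sum of absolute differences between two equally sized image patches."""
    return float(np.sum(np.abs(patch_a.astype(np.float64) - patch_b.astype(np.float64))))

def match_along_row(left: np.ndarray, right: np.ndarray, row: int, col: int,
                    half: int = 3, max_disp: int = 32) -> int:
    """Find the disparity minimising SAD for a patch on a rectified stereo pair."""
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp):
        c = col - d
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cost = sad(ref, cand)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

rng = np.random.default_rng(0)
left = rng.random((100, 100))
right = np.roll(left, -5, axis=1)                       # toy pair with constant 5-pixel disparity
print(match_along_row(left, right, row=50, col=60))     # -> 5
```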
Visual question answering (VQA) has attracted increasing attention in computer vision and natural language processing. Scholars are committed to studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and an attention mechanism is a sensible way to solve this problem. However, using a single attention mechanism may leave some features insufficiently attended. This paper improves on existing attention methods and proposes a hybrid attention mechanism that combines a spatial attention mechanism and a channel attention mechanism. Because the attention mechanism can cause loss of the original features, a small portion of the image features is added back as compensation. For the attention over text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
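A rough sketch of the kind of hybrid channel-plus-spatial attention with feature compensation described above, written as a CBAM-style PyTorch module. The architecture, layer sizes, and the compensation weight `comp` are assumptions for illustration only, not the paper's network.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Channel attention followed by spatial attention, plus a residual
    "compensation" term that mixes a fraction of the original features back in."""
    def __init__(self, channels: int, reduction: int = 8, comp: float = 0.1):
        super().__init__()
        self.comp = comp
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attended = x * self.channel_mlp(x)                       # channel attention
        pooled = torch.cat([attended.mean(1, keepdim=True),
                            attended.amax(1, keepdim=True)], dim=1)
        attended = attended * self.spatial_conv(pooled)          # spatial attention
        return attended + self.comp * x                          # feature compensation

feats = torch.randn(2, 64, 14, 14)        # stand-in for extracted image features
print(HybridAttention(64)(feats).shape)
```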
In the robotic welding process with thick steel plates, laser vision sensors are widely used to profile the weld seam for automatic seam tracking. The weld seam profile extraction (WSPE) result is a crucial step for identifying the feature points of the extracted profile to guide the welding torch in real time. The visual information processing system may collapse when interference data points in the image survive the phase of feature point identification, which results in low tracking accuracy and poor welding quality. This paper presents a visual attention feature-based method to extract the weld seam profile (WSP) from the strong arc background using clustering results. First, a binary image is obtained through the preprocessing stage. Second, all data points with a gray value of 255 are clustered with the nearest-neighborhood clustering algorithm. Third, a strategy is developed to discern the one cluster belonging to the WSP from the appointed candidate clusters in each loop, and a scheme is proposed to extract the entire WSP using visual continuity. Compared with previous methods, the proposed method can extract more useful details of the WSP and has better stability in terms of removing interference data. Extensive WSPE tests on butt joints and T-joints show the anti-interference ability of the proposed method, which contributes to smoothing the welding process and shows its practical value in robotic automated welding with thick steel plates.
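A toy sketch of the clustering step described above: pixels with gray value 255 in the binary image are grouped with a greedy nearest-neighbour rule. The radius threshold and the test image are illustrative assumptions; the paper's cluster-selection strategy and visual-continuity scheme are not reproduced.

```python
import numpy as np

def nearest_neighbour_clusters(points: np.ndarray, radius: float = 3.0) -> list:
    """Greedy nearest-neighbour clustering: a point joins an existing cluster if it
    lies within `radius` of any of that cluster's points, otherwise it starts a new one."""
    clusters = []
    for p in points:
        placed = False
        for c in clusters:
            if np.min(np.linalg.norm(np.asarray(c) - p, axis=1)) <= radius:
                c.append(p)
                placed = True
                break
        if not placed:
            clusters.append([p])
    return clusters

# Toy "binary image": gray value 255 marks candidate weld-seam-profile pixels.
img = np.zeros((40, 40), dtype=np.uint8)
img[10, 5:20] = 255          # a horizontal stripe standing in for the laser line
img[30:33, 30:33] = 255      # an isolated blob standing in for arc interference
pts = np.argwhere(img == 255).astype(float)
print([len(c) for c in nearest_neighbour_clusters(pts)])   # two clusters: line and blob
```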
Visual attention is a mechanism that enables the visual system to detect potentially important objects in complex environments. Most computational visual attention models are designed with inspiration from mammalian visual systems. However, electrophysiological and behavioral evidence indicates that avian species are animals with high visual capability that can process complex information accurately in real time. Therefore, the visual system of avian species, especially the nuclei related to the visual attention mechanism, is investigated in this paper. A hierarchical visual attention model is then proposed for saliency detection. In the first hierarchy, the optic tectum neuron responses are computed and self-information is used to compute primary saliency maps. In the second hierarchy, the "winner-take-all" network in the tecto-isthmal projection is simulated and the final saliency maps are estimated with regularized random walks ranking. Comparison results verify that the proposed model, which can define the focus of attention accurately, outperforms several state-of-the-art models. This study provides insights into the relationship between the visual attention mechanism and the avian visual pathways. The computational visual attention model may reveal the underlying neural mechanism of the nuclei for biological visual attention.
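Self-information is used above to build the primary saliency maps; the sketch below shows the generic idea on intensity histograms, assigning each pixel -log p of its intensity bin so that rare intensities score high. It is a simplified stand-in, not the optic-tectum response model.

```python
import numpy as np

def self_information_saliency(gray: np.ndarray, bins: int = 64) -> np.ndarray:
    """Assign each pixel the self-information -log p of its intensity bin,
    so rare intensities (potentially salient) receive high values."""
    hist, edges = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    idx = np.clip(np.digitize(gray, edges[1:-1]), 0, bins - 1)
    sal = -np.log(p[idx] + 1e-12)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)

rng = np.random.default_rng(1)
gray = rng.integers(100, 140, (64, 64)).astype(float)   # mostly mid-gray background
gray[28:36, 28:36] = 250                                 # a rare bright patch
print(self_information_saliency(gray)[32, 32])           # high saliency at the patch
```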
Inspired by human behaviors, a robot object tracking model is proposed on the basis of a visual attention mechanism, which fits the theory of topological perception. The model integrates image-driven, bottom-up attention and object-driven, top-down attention, whereas previous attention models have mostly focused on either bottom-up or top-down attention alone. Through the bottom-up component, the whole scene is segmented into the ground region and the salient regions. Guided by the top-down strategy, which is realized by a topological graph, the object regions are separated from the salient regions; the salient regions other than the object regions are treated as barrier regions. To evaluate the model, a mobile robot platform is developed, on which several experiments are implemented. The experimental results indicate that processing an image with a resolution of 752 × 480 pixels takes less than 200 ms and that the object regions are unabridged. A comparison of the proposed model with an existing model demonstrates that the proposed model has advantages in robot object tracking in terms of speed and efficiency.
It is of great significance to rapidly detect targets in large-field remote sensing images with limited computational resources. Drawing on results on visual attention from perceptual psychology, this paper proposes a hierarchical attention-based model for target detection. Specifically, at the pre-attention stage, a fast computational approach is applied to build a saliency map before the salient regions are extracted. After that, the focus of attention (FOA) can be quickly obtained to indicate the salient objects. Then, at the attention stage, under FOA guidance, the high-level visual features of the region of interest are extracted in parallel. Finally, at the post-attention stage, by integrating these parallel and independent visual attributes, a decision-template-based classifier fusion strategy is proposed to discriminate the task-related targets from the other extracted salient objects. For comparison, experiments on ship detection are conducted to validate the effectiveness and feasibility of the proposed model.
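The post-attention stage uses decision-template-based classifier fusion; below is a minimal sketch of the standard decision-template scheme, where each class template is the mean decision profile of its training samples and a test sample is assigned to the nearest template. The data shapes and toy example are assumptions.

```python
import numpy as np

def fit_decision_templates(profiles: np.ndarray, labels: np.ndarray, n_classes: int) -> np.ndarray:
    """profiles: (n_samples, n_classifiers, n_classes) soft outputs of each classifier.
    The decision template of a class is the mean profile of its training samples."""
    return np.stack([profiles[labels == c].mean(axis=0) for c in range(n_classes)])

def dt_predict(templates: np.ndarray, profile: np.ndarray) -> int:
    """Assign the class whose template is nearest (squared Euclidean) to the profile."""
    d = ((templates - profile) ** 2).sum(axis=(1, 2))
    return int(np.argmin(d))

rng = np.random.default_rng(4)
n, k, c = 120, 3, 2                                        # samples, classifiers, classes
labels = rng.integers(0, c, n)
profiles = np.eye(c)[labels][:, None, :] * 0.7 + rng.random((n, k, c)) * 0.3
templates = fit_decision_templates(profiles, labels, c)
print(dt_predict(templates, profiles[0]), labels[0])       # usually agree
```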
In many medical image segmentation applications, identifying and extracting the region of interest (ROI) accurately is an important step. The usual approach to extracting an ROI is to apply image segmentation methods. In this paper, we focus on extracting ROIs by segmentation based on visually attended locations. The Chan-Vese active contour model is used for image segmentation, and attended locations are determined by the SaliencyToolbox. The implementation of the toolbox is an extension of the saliency-map-based model of bottom-up attention, adding a process that infers the extent of a proto-object at the attended location from the maps used to compute the saliency map. When the set of regions of interest is selected, these regions need to be represented with the highest quality, while the remaining parts of the processed image can be represented with a lower quality. The method has been successfully tested on medical images, and ROIs are extracted.
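A brief sketch of running a Chan-Vese segmentation, here via scikit-image's `chan_vese` with default parameters on a sample image (assuming scikit-image is installed). In the pipeline above, the contour would instead be applied around the locations returned by the SaliencyToolbox, which is not reproduced here.

```python
from skimage import data
from skimage.segmentation import chan_vese

# Downsample the sample image so the default-iteration demo stays fast.
image = data.camera()[::4, ::4].astype(float) / 255.0
mask = chan_vese(image)            # boolean segmentation mask with default parameters
print(mask.shape, mask.dtype, mask.mean())
```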
An improved method with better selection capability using a single camera is presented, in comparison with a previous method. To improve performance, two methods were applied to landmark selection in an unfamiliar indoor environment. First, a modified visual attention method was proposed to automatically select a candidate region as a more useful landmark. In visual attention, candidate landmark regions were selected according to different characteristics of ambient color and intensity in the image. Then, the more useful landmarks were selected by combining the candidate regions using clustering. As generally implemented, automatic landmark selection in vision-based simultaneous localization and mapping (SLAM) produces many useless landmarks, because the image features are distinguished from the surrounding environment but detected repeatedly. These useless landmarks create a serious problem for the SLAM system because they complicate data association. To address this, a method was proposed in which the robot initially collects landmarks through automatic detection while traversing the entire area where it performs SLAM, and then selects only those landmarks that exhibit high rarity through clustering, which enhances system performance. Experimental results show that this method of automatic landmark selection yields high-rarity landmarks; the average SLAM error decreases by 52% compared with conventional methods, and the accuracy of data association increases.
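A small sketch of the rarity-based selection idea: landmark descriptors are clustered, and only landmarks falling in small clusters (rare appearance) are kept. The use of k-means, the cluster count, and the size threshold are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def high_rarity_landmarks(descriptors: np.ndarray, n_clusters: int = 4,
                          max_cluster_size: int = 5) -> np.ndarray:
    """Cluster landmark descriptors and keep only those falling in small clusters,
    i.e. landmarks whose appearance is rare in the mapped area."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(descriptors)
    sizes = np.bincount(labels, minlength=n_clusters)
    return np.flatnonzero(sizes[labels] <= max_cluster_size)

rng = np.random.default_rng(2)
common = rng.normal(0.0, 0.1, (60, 16))       # many near-identical (repetitive) landmarks
rare = rng.normal(3.0, 0.1, (3, 16))          # a few distinctive landmarks
print(high_rarity_landmarks(np.vstack([common, rare])))   # indices of the rare ones survive
```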
Accurate contour extraction of skin tumors is an important task for the subsequent feature generation of their borders and surfaces for early melanoma diagnosis. An integrated approach combining a visual attention model and the GVF-snake is proposed in this paper to provide a general framework for locating tumor boundaries in the presence of noise and boundaries with large concavities. For a given skin image, the visual attention model is applied to locate the regions of interest (ROIs) based on saliency maps. Then the GVF-snake algorithm is used to iteratively drive an initial contour, derived from the extracted ROIs, towards the real boundary of the skin tumor by minimizing an energy function. Experiments show that the proposed approach excels in two respects compared with other contour-deforming methods: 1) initial contours generated from saliency maps are located in the neighborhood of the real boundaries of skin tumors, which speeds up the convergence of contour deformation and achieves higher accuracy; 2) the method is not sensitive to noise on the skin or to the extracted initial contours.
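A minimal sketch of deriving an initial snake contour from a saliency map, as the approach above does before running the GVF-snake: threshold the map, take the salient blob's extent, and place a circle slightly outside it. The threshold and geometry are assumptions; the GVF energy minimization itself is not shown.

```python
import numpy as np

def initial_contour_from_saliency(saliency: np.ndarray, thresh: float = 0.7,
                                  n_points: int = 100, margin: float = 1.2) -> np.ndarray:
    """Place an initial circular contour just outside the salient region, so the
    snake starts near the true tumor boundary instead of the image border."""
    ys, xs = np.nonzero(saliency >= thresh * saliency.max())
    cy, cx = ys.mean(), xs.mean()
    radius = margin * max(ys.max() - ys.min(), xs.max() - xs.min()) / 2.0
    t = np.linspace(0, 2 * np.pi, n_points, endpoint=False)
    return np.stack([cy + radius * np.sin(t), cx + radius * np.cos(t)], axis=1)

sal = np.zeros((100, 100))
sal[40:60, 45:70] = 1.0                       # toy salient blob standing in for a lesion
contour = initial_contour_from_saliency(sal)
print(contour.shape, contour.min(axis=0), contour.max(axis=0))
```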
Saliency detection models, which are used to extract salient regions in visual scenes, are widely used in various multimedia processing applications and have attracted much attention in computer vision over the past decades. Since most images and videos on the Internet are stored in compressed domains, such as images in JPEG format and videos in MPEG-2, H.264, and MPEG-4 Visual formats, many saliency detection models have recently been proposed in the compressed domain. In this paper, we provide a review of our work on saliency detection models in the compressed domain. In addition, we introduce some commonly used fusion strategies for combining a spatial saliency map and a temporal saliency map to compute the final video saliency map.
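A short sketch of the common spatial/temporal saliency fusion strategies mentioned above (weighted sum, pointwise maximum, and product); the normalization and parameter names are illustrative.

```python
import numpy as np

def fuse_saliency(spatial: np.ndarray, temporal: np.ndarray,
                  strategy: str = "weighted", alpha: float = 0.5) -> np.ndarray:
    """Combine a spatial and a temporal saliency map with a few common fusion rules."""
    s = spatial / (spatial.max() + 1e-12)
    t = temporal / (temporal.max() + 1e-12)
    if strategy == "weighted":          # convex combination
        fused = alpha * s + (1 - alpha) * t
    elif strategy == "max":             # pointwise maximum
        fused = np.maximum(s, t)
    elif strategy == "product":         # multiplicative reinforcement
        fused = s * t
    else:
        raise ValueError(strategy)
    return fused / (fused.max() + 1e-12)

rng = np.random.default_rng(5)
spatial, temporal = rng.random((36, 64)), rng.random((36, 64))
print(fuse_saliency(spatial, temporal, "weighted").shape)
```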
During most of the interaction process, the operator focuses on the tracked 3D hand gesture model only at the "interaction points" in the collision detection scene, such as "grasp" and "release", and on the objects in the scene, without paying attention to the tracked 3D hand gesture model throughout the entire procedure. Therefore, this paper first studies the visual attention distribution of the operator in the "grasp", "translation", "release" and other basic operation procedures, and proposes a 3D hand gesture tracking algorithm based on this distribution model. With this algorithm, in periods with a low degree of visual attention, a pre-stored 3D hand gesture animation can be used to directly visualize the 3D hand gesture model in the interactive scene; in periods with a high degree of visual attention, an existing frame-by-frame tracking approach is adopted to obtain the 3D gesture model. The results demonstrate that the proposed method can achieve real-time tracking of 3D hand gestures with an effective improvement in the efficiency, fluency, and availability of 3D hand gesture interaction.
Background: Eye tracking technology is receiving increased attention in the field of virtual reality. Specifically, future gaze prediction is crucial for pre-computation in many applications, such as gaze-contingent rendering, advertisement placement, and content-based design. To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality. Methods: In this paper, the concept of temporal continuity of visual attention is presented. Subsequently, an autocorrelation function method is proposed to evaluate the temporal continuity. Thereafter, the temporal continuity is analyzed in both free-viewing and task-oriented conditions. Results: In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that the temporal continuity performs well only within a short time interval. A task-oriented game scene condition was created to collect users' gaze data, and an analysis of the collected data finds that the temporal continuity performs similarly to the free-viewing conditions. Temporal continuity can be applied to future gaze prediction: when it is good, users' current gaze positions can be directly used to predict their gaze positions in the near future. Conclusions: The prediction performance of the current gaze is further evaluated in both free-viewing and task-oriented conditions, and the current gaze is found to be efficiently applicable to short-term future gaze prediction. The task of long-term gaze prediction remains to be explored.
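A minimal sketch of the autocorrelation-function idea used above to quantify temporal continuity: the normalized autocorrelation of a 1-D gaze-coordinate trace, where a slow decay over lag means the current gaze predicts the near future well. The toy gaze trace is an assumption.

```python
import numpy as np

def gaze_autocorrelation(gaze: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized autocorrelation of a 1-D gaze-coordinate trace for lags 0..max_lag;
    a slow decay indicates that the current gaze predicts the near future well."""
    x = gaze - gaze.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-lag or None], x[lag:]) / denom for lag in range(max_lag + 1)])

rng = np.random.default_rng(3)
smooth = np.cumsum(rng.normal(0, 1, 500))     # a slowly drifting gaze trace (toy data)
acf = gaze_autocorrelation(smooth, max_lag=20)
print(np.round(acf[:5], 3))                   # stays close to 1 for small lags
```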
Video summarization is applied to reduce redundancy and develop a concise representation of key frames in a video; more recently, video summaries have been produced through visual attention modeling. In these schemes, the frames that stand out visually are extracted as key frames based on human attention modeling theories. Such visual attention schemes have proven effective for video summaries; nevertheless, the high computational cost of these techniques restricts their usability in everyday situations. In this context, we propose a key frame extraction (KFE) method built on an efficient and accurate visual attention model. The computational effort is minimized by utilizing dynamic visual saliency based on the temporal gradient instead of traditional optical flow techniques. In addition, an efficient technique using a discrete cosine transformation is utilized for the static visual salience. The dynamic and static visual attention measures are merged by means of a non-linear weighted fusion technique. The results of the system are compared with some existing state-of-the-art techniques in terms of accuracy. The experimental results indicate the efficiency and high quality of the key frames extracted by our proposed model.
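A rough sketch of the two cues described above: a static salience score from high-frequency DCT energy and a dynamic score from the temporal gradient between consecutive frames, merged by a simple non-linear weighted fusion. The exact fusion form and thresholds in the paper are not reproduced; everything here is illustrative.

```python
import numpy as np
from scipy.fft import dct

def static_salience(gray: np.ndarray) -> float:
    """Static cue: energy of high-frequency DCT coefficients (edge/texture content)."""
    coeffs = dct(dct(gray, axis=0, norm="ortho"), axis=1, norm="ortho")
    coeffs[:8, :8] = 0.0                       # drop the low-frequency block
    return float(np.abs(coeffs).mean())

def dynamic_salience(prev: np.ndarray, curr: np.ndarray) -> float:
    """Dynamic cue: mean temporal gradient, a cheap substitute for optical flow."""
    return float(np.abs(curr.astype(float) - prev.astype(float)).mean())

def frame_score(prev: np.ndarray, curr: np.ndarray, w: float = 0.5) -> float:
    """Non-linear weighted fusion of the two cues (illustrative form only)."""
    s, d = static_salience(curr), dynamic_salience(prev, curr)
    return float(np.sqrt(w * s ** 2 + (1 - w) * d ** 2))

rng = np.random.default_rng(6)
prev, curr = rng.random((72, 128)), rng.random((72, 128))
print(round(frame_score(prev, curr), 4))
```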
A method to detect traffic dangers based on a visual attention model with sparse sampling is proposed. A hemispherical sparse sampling model is used to decrease the amount of calculation, which increases the detection speed. A Bayesian probability model and a Gaussian kernel function are applied to calculate the saliency of traffic videos. A multiscale saliency scheme is used, with the final saliency taken as the average over all scales, which substantially increases the detection rate. Detection results on several typical traffic dangers show that the proposed method achieves higher detection rates and speed, meeting the requirements of real-time detection of traffic dangers.
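A toy sketch of the multiscale-averaging idea: a simple centre-surround saliency map is computed at several scales, resized back, and averaged. The centre-surround operator stands in for the paper's Bayesian/Gaussian-kernel saliency, which is not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def single_scale_saliency(gray: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Centre-surround saliency: deviation of each pixel from its blurred surround."""
    return np.abs(gray - gaussian_filter(gray, sigma))

def multiscale_saliency(gray: np.ndarray, scales=(1.0, 0.5, 0.25)) -> np.ndarray:
    """Compute saliency at several spatial scales and average them."""
    maps = []
    for s in scales:
        small = zoom(gray, s, order=1)
        sal = single_scale_saliency(small)
        maps.append(zoom(sal, np.array(gray.shape) / np.array(small.shape), order=1))
    # Resized maps can differ by a pixel; crop to a common size before averaging.
    h = min(m.shape[0] for m in maps)
    w = min(m.shape[1] for m in maps)
    return np.mean([m[:h, :w] for m in maps], axis=0)

rng = np.random.default_rng(7)
gray = rng.random((96, 96))
gray[40:56, 40:56] += 2.0                  # a conspicuous bright square
sal = multiscale_saliency(gray)
print(sal.shape, np.unravel_index(sal.argmax(), sal.shape))
```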
Nowadays, there is a great need to investigate the effects of fatigue on physical as well as mental performance. The issues generally associated with extreme fatigue are that one can easily lose focus while performing any particular activity, whether physical or mental, and that one's motivation to complete the task at hand efficiently and successfully decreases. In the same line of thought, myriad research studies have reported the negative effects of fatigue on mental performance, and most techniques to induce fatigue normally require long and repetitive visual search tasks. In this study, a visual search task was devised and customized using performance measures such as d' (d-prime) and the speed-accuracy trade-off (SATF), as well as ROC analysis for classifier performance. The visual search task consisted of distractors (L) and a target (T), whereby human participants had to press the appropriate keyboard button as fast as possible to indicate whether or not they noticed a target upon presentation of a visual stimulus. It was administered to human participants under laboratory conditions, and their reaction times and accuracy were monitored. It was found that the test image Size35Int255 was the best image to use in terms of sensitivity and AUC (area under the curve). Ongoing research can therefore use these findings to create visual stimuli in which the target and distractor images follow the size and intensity characteristics found in this research.
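A minimal sketch of computing d' from hit and false-alarm counts, the sensitivity measure used above, with a standard correction for extreme rates; the counts in the example are invented.

```python
from scipy.stats import norm

def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = Z(hit rate) - Z(false-alarm rate), with a log-linear
    correction so rates of exactly 0 or 1 do not produce infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return float(norm.ppf(hit_rate) - norm.ppf(fa_rate))

# Toy counts from a target-present / target-absent visual search block.
print(round(d_prime(hits=45, misses=5, false_alarms=8, correct_rejections=42), 2))
```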
Background: Age-related macular degeneration (AMD) is one of the main causes of vision loss in older adults, generating, in most cases, a central scotoma that reduces central visual acuity (Noble & Chaudhary, 2010). People affected by AMD have to rely on peripheral visual information and would benefit greatly from efficiently allocating their attention to the periphery. Indeed, attention can improve peripheral spatial resolution (Carrasco, Ling & Read, 2004) and can be allocated to a certain expanse of space outside of the central visual span, known as the attentional span. The attentional span has been shown to be decreased in people with AMD, with less attention allocated to the periphery and more to the central visual field (Cheong et al., 2008); however, it remains unknown whether aging is also a contributing factor. Methods: Fourteen healthy younger adults (mean age = 21.8 years, SD = 1.5) and 8 older adults (mean age = 69.6 years, SD = 7.3) performed a pop-out and a serial version of a visual search task in the presence of gaze-contingent invisible and visible artificial central scotomata of different sizes (no scotoma, 3° diameter, 5°, and 7°). Participants were asked to indicate as quickly as possible whether a target was present or not among distractors whose number varied (16, 32 or 64 objects). We wished to determine whether the size of the scotoma, occluding different degrees of central vision, affected visual search differently for younger vs. older participants. Results: Both the younger and older participants showed higher reaction times (RTs) to find the target in the serial version (M = 2,074 ms for younger adults, M = 3,853 ms for older adults) compared to the pop-out version (M = 866 ms, M = 1,475 ms, P < 0.001) and for more distractors (32 distractors compared to 16, and 64 compared to 32, P < 0.01). Older adults showed longer RTs than younger adults for both versions of the task (P < 0.01). We found a significant effect of scotoma size on older adults (3° scotoma M = 3,276 ms; 7° scotoma M = 3,877 ms, P < 0.05); however, accuracy was higher with no scotoma (96% vs. 92%, P < 0.05) in the pop-out search task. This suggests that older participants privileged a fast decision at the expense of accuracy in those cases. For the younger adults, RTs were higher in the serial search task in the presence of a scotoma (M = 2,074 ms) compared to the control condition (M = 1,665 ms, P > 0.05). Conclusions: These results suggest that older adults take longer to perform visual search compared to younger adults and tend to use peripheral vision less than younger adults; larger central scotomas disrupted their performance but not that of younger participants, who performed equally well with different central scotoma sizes. These findings suggest that aging is a contributing factor in the decrease of the peripheral attentional span.
Reliable detection of fundus lesions is important for automated screening of diabetic retinopathy. This paper presents a novel method to detect fundus lesions in retinal fundus images based on a visual attention model. The proposed method models the visual attention mechanism of ophthalmologists when observing fundus images: abnormal structures, such as dark and bright lesions in the image, usually attract most of the experts' attention, whereas normal structures, such as the optic disc and vessels, are usually selectively ignored. To measure the visual attention for abnormal and normal areas, the incremental coding length is computed in a local and a global manner, respectively. The final saliency map of fundus lesions is a fusion of the attention maps computed for the abnormal and normal areas. Experimental results on the public DiaRetDB1 dataset show that the proposed method achieved a sensitivity of 0.71 at a specificity of 0.82 and an AUC of 0.76 for fundus lesion detection, and an accuracy of 100% for normal area (optic disc) detection. The proposed method can assist ophthalmologists in the inspection of fundus lesions.
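A small sketch of one way to fuse the abnormal-area and normal-area attention maps, down-weighting pixels already explained by normal structures; the weighting rule is an assumption, and the incremental coding length computation itself is not shown.

```python
import numpy as np

def fuse_lesion_saliency(abnormal_map: np.ndarray, normal_map: np.ndarray,
                         suppress: float = 0.8) -> np.ndarray:
    """Fuse the attention map of abnormal areas with that of normal structures by
    down-weighting pixels that the normal-structure map (optic disc, vessels)
    already explains. The weighting scheme is illustrative only."""
    a = abnormal_map / (abnormal_map.max() + 1e-12)
    n = normal_map / (normal_map.max() + 1e-12)
    fused = a * (1.0 - suppress * n)
    return fused / (fused.max() + 1e-12)

rng = np.random.default_rng(8)
abnormal, normal = rng.random((64, 64)), rng.random((64, 64))
print(fuse_lesion_saliency(abnormal, normal).shape)
```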