Funding: Supported by a grant from the Basic Science Research Program through the National Research Foundation (NRF) (2021R1F1A1063634) funded by the Ministry of Science and ICT (MSIT), Republic of Korea; by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia; and by the Deanship of Scientific Research at Najran University under the Research Group Funding program, Grant Code (NU/RG/SERC/12/6).
Abstract: Advances in machine vision systems have revolutionized applications such as autonomous driving, robotic navigation, and augmented reality. Despite substantial progress, challenges persist, including dynamic backgrounds, occlusion, and limited labeled data. To address these challenges, we introduce a comprehensive methodology to enhance image classification and object detection accuracy. The proposed approach integrates multiple methods in a complementary way. The process commences with the application of Gaussian filters to mitigate the impact of noise. The filtered images are then segmented using Fuzzy C-Means segmentation, in parallel with saliency mapping techniques that find the most prominent regions. Binary Robust Independent Elementary Features (BRIEF) are then extracted from data derived from the saliency maps and segmented images. For precise object separation, the Oriented FAST and Rotated BRIEF (ORB) algorithm is employed. Genetic Algorithms (GAs) are used to optimize the Random Forest classifier parameters, leading to improved performance. Our method stands out for its comprehensive approach, addressing challenges such as changing backdrops, occlusion, and limited labeled data concurrently. A significant enhancement has been achieved by integrating GAs to precisely optimize parameters; this adjustment not only distinguishes our system but also improves its overall efficacy. The proposed methodology has demonstrated notable classification accuracies of 90.9% and 89.0% on the challenging Corel-1k and MSRC datasets, respectively. Furthermore, detection accuracies of 87.2% and 86.6% have been attained. Although our method performed well on both datasets, it may face difficulties on real-world data, especially where datasets have highly complex backgrounds. Despite this limitation, GA integration for parameter optimization is a notable strength that enhances the overall adaptability and performance of our system.
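As a rough illustration of the GA-tuned Random Forest step described above, the sketch below evolves (n_estimators, max_depth, min_samples_split) with a simple truncation-selection GA. The load_digits data stands in for the BRIEF/ORB feature vectors, and the gene bounds, population size, and operators are assumptions, not the authors' configuration.

```python
# Minimal GA-over-Random-Forest sketch; gene encoding and GA settings are illustrative.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)                 # stand-in for BRIEF/ORB features

# Each chromosome encodes (n_estimators, max_depth, min_samples_split).
BOUNDS = np.array([[20, 150], [3, 20], [2, 10]])

def random_chromosome():
    return np.array([rng.integers(lo, hi + 1) for lo, hi in BOUNDS])

def fitness(ch):
    clf = RandomForestClassifier(n_estimators=int(ch[0]), max_depth=int(ch[1]),
                                 min_samples_split=int(ch[2]), random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

pop = [random_chromosome() for _ in range(8)]
for gen in range(4):
    scores = np.array([fitness(ch) for ch in pop])
    parents = [pop[i] for i in np.argsort(scores)[::-1][:4]]   # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = rng.choice(len(parents), 2, replace=False)
        cut = rng.integers(1, len(BOUNDS))                      # single-point crossover
        child = np.concatenate([parents[a][:cut], parents[b][cut:]])
        if rng.random() < 0.3:                                  # mutation: redraw one gene
            g = rng.integers(len(BOUNDS))
            child[g] = rng.integers(BOUNDS[g, 0], BOUNDS[g, 1] + 1)
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
print("best (n_estimators, max_depth, min_samples_split):", best)
```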
Funding: National Natural Science Foundation of China (No. 61761027).
Abstract: Classical mathematical morphology operations use a structuring element of fixed size and shape to process the whole image. Due to the diversity of image content and the complexity of target structure, the shape of the processed image may be changed and part of its information may be lost. We therefore propose a method for constructing salience-adaptive morphological structuring elements based on the minimum spanning tree (MST). First, the gradient image of the input image is calculated, the edge image is obtained by non-maximum suppression (NMS) of the gradient image, and a chamfer distance transformation is then performed on the edge image to obtain a salience map (SM). Second, the radius of the structuring element is determined from the maximum and minimum values of the SM, and the minimum spanning tree is computed on the SM. Finally, the radius is used to construct a structuring element whose shape and size adapt to the local features of the input image. In addition, the basic morphological operators such as erosion, dilation, opening, and closing are redefined using the adaptive structuring elements and compared with the classical morphological operators. Simulation results show that the proposed method makes full use of the local features of the image and gives better results in image structure preservation and image filtering.
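A simplified sketch of the salience-map stage follows: Canny stands in for the explicit gradient plus non-maximum-suppression steps, a distance transform plays the role of the chamfer transformation, and radii are scaled linearly from the SM range (the MST construction is omitted). The sample image and radius bounds are assumptions.

```python
# Simplified salience map and adaptive radius sketch; MST step omitted.
import cv2
import numpy as np
from skimage.data import camera

img = camera()                                   # sample grayscale image

# Canny combines the gradient and non-maximum-suppression steps.
edges = cv2.Canny(img, 50, 150)

# Chamfer-style distance from each pixel to the nearest edge gives the salience map.
sm = cv2.distanceTransform(255 - edges, cv2.DIST_L2, 3)

# Scale the SM range to structuring-element radii (assumed bounds).
r_min, r_max = 1, 7
radius = r_min + (sm - sm.min()) / (sm.max() - sm.min() + 1e-6) * (r_max - r_min)
radius = np.round(radius).astype(int)            # per-pixel adaptive SE radius
```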
Abstract: A new method for automatic salient object segmentation is presented. Salient object segmentation is an important research area in object recognition, image retrieval, image editing, scene reconstruction, and 2D/3D conversion. In this work, salient object segmentation is performed using a saliency map and color segmentation. Edge, color, and intensity features are extracted from the mean shift segmentation (MSS) image, and a saliency map is created from these features. First, an average-saliency-per-segment image is calculated using the color information from the MSS image and the generated saliency map. Then, a second average-saliency-per-segment image is calculated by applying the same procedure to the image obtained after thresholding, labeling, and hole-filling. Thresholding, labeling, and hole-filling are applied to the mean of the two generated images to obtain the final salient object segmentation. The effectiveness of the proposed method is demonstrated by precision, recall, and F-measure values of 80%, 89%, and 80%, respectively, computed between the generated salient object segmentation and the ground truth image.
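The average-saliency-per-segment step can be sketched as below, assuming a precomputed saliency map and a label image; SLIC is used as a stand-in segmenter for mean shift, and the placeholder saliency map and the 0.5 threshold are assumptions.

```python
# Per-segment mean saliency sketch; SLIC stands in for mean shift segmentation.
import numpy as np
from scipy import ndimage
from skimage.data import astronaut
from skimage.segmentation import slic

img = astronaut()
saliency = img.mean(axis=2) / 255.0                  # placeholder saliency map
labels = slic(img, n_segments=200, start_label=1)    # stand-in for mean shift

# Mean saliency of each segment, broadcast back to the pixel grid.
seg_ids = np.arange(1, labels.max() + 1)
seg_mean = np.asarray(ndimage.mean(saliency, labels=labels, index=seg_ids))
per_segment = seg_mean[labels - 1]

mask = per_segment > 0.5                             # then threshold, label, fill holes
```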
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61070080, 973 Sub-Program Projects under Grant No. 2009CB320906, National Science and Technology Major Special Projects under Grant No. 2010ZX03004-003, the S&T Planning Project of Hubei Provincial Department of Education under Grant No. Q20112805, the H&S Planning Project of Hubei Provincial Department of Education under Grant No. 2011jyte142, and the Science Foundation of Hubei Province under Grant No. 2010CDB05103.
Abstract: In order to further improve the efficiency of video compression, we introduce perceptual characteristics of the Human Visual System (HVS) into video coding and propose a novel video coding rate control algorithm based on a human visual saliency model in H.264/AVC. Firstly, we modify Itti's saliency model. Secondly, the target bits of each frame are allocated through the correlation of the saliency region between the current and previous frames, and the complexity of each MB is modified through its saliency value and its Mean Absolute Difference (MAD) value. Lastly, the algorithm was implemented in JVT JM12.2. Simulation results show that, compared with the traditional rate control algorithm, the proposed one can reduce the coding bit rate and improve the subjective quality of the reconstructed video, especially in the visually salient region. It is well suited to wireless video transmission.
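A loose sketch of saliency-weighted macroblock bit allocation is shown below, assuming per-MB saliency and MAD arrays are available; the blending weight and the allocation formula are illustrative choices, not the exact JM12.2 modification.

```python
# Illustrative saliency/MAD-weighted bit allocation over the MBs of one frame.
import numpy as np

def allocate_mb_bits(frame_target_bits, saliency, mad, alpha=0.6):
    """Split a frame's bit budget over macroblocks by saliency-modulated complexity."""
    saliency = saliency / (saliency.sum() + 1e-9)
    mad = mad / (mad.sum() + 1e-9)
    weight = alpha * saliency + (1.0 - alpha) * mad   # blended per-MB complexity
    return frame_target_bits * weight / weight.sum()

rng = np.random.default_rng(1)
bits = allocate_mb_bits(150_000, rng.random(396), rng.random(396))  # 396 MBs in a CIF frame
print(round(bits.sum()))                              # equals the frame budget
```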
Funding: Supported by the National Science Foundation of China (NSFC) under Grants 61876217 and 62176175, the Innovative Team of Jiangsu Province under Grant XYDXX-086, and the Jiangsu Postgraduate Research and Innovation Plan (KYCX20_2762).
Abstract: Interpreting deep neural networks is of great importance for understanding and verifying deep models for natural language processing (NLP) tasks. However, most existing approaches focus only on improving the performance of models and ignore their interpretability. In this work, we propose a Randomly Wired Graph Neural Network (RWGNN) that uses a graph to model the structure of the neural network, which can address two major problems of Chinese named entity recognition (NER): word-boundary ambiguity and polysemy. In addition, we develop a pipeline to explain the RWGNN using saliency maps and adversarial attacks. Experimental results demonstrate that our approach can identify meaningful and reasonable interpretations for the hidden states of the RWGNN.
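The saliency-map part of such an interpretation pipeline can be sketched with gradients over token embeddings, as below; the toy embedding-plus-linear classifier stands in for the RWGNN, whose architecture is not given in the abstract.

```python
# Gradient-based token saliency for a toy classifier standing in for the RWGNN.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, n_cls, seq_len = 1000, 32, 4, 12
embed = nn.Embedding(vocab, dim)
head = nn.Linear(dim, n_cls)                        # toy classifier

tokens = torch.randint(0, vocab, (1, seq_len))
emb = embed(tokens).detach().requires_grad_(True)   # saliency w.r.t. embeddings
logits = head(emb.mean(dim=1))                      # mean-pool, then classify
logits[0, logits.argmax()].backward()

saliency = emb.grad.abs().sum(dim=-1).squeeze(0)    # one relevance score per token
print((saliency / saliency.max()).tolist())
```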
Abstract: Underwater imagery and transmission pose numerous challenges, such as low signal bandwidth, slow data transmission bit rates, noise, and underwater blue/green light haze. These factors distort the estimation of the Region of Interest and are prime hurdles in deploying efficient compression techniques. Because of the blue/green light in underwater imagery, shape-adaptive or block-wise compression techniques fail, as it becomes very difficult to estimate the compression levels/coefficients for a particular region. This method is proposed to efficiently deploy an Extreme Learning Machine (ELM) model-based shape-adaptive Discrete Cosine Transform (DCT) for underwater images. Underwater color image restoration based on veiling-light estimation is described, followed by saliency map estimation based on Gray Level Co-occurrence Matrix (GLCM) features. An ELM network is modeled that takes two parameters, the signal strength and the saliency value of the region to be compressed, and predicts the level of compression (DCT coefficients and compression steps). This method ensures fewer errors in the Region of Interest and a better trade-off between available signal strength and compression level.
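A minimal sketch of the ELM stage is given below, assuming the saliency value of a region is summarized by a GLCM property and paired with a normalized signal strength; the toy data, the contrast-based saliency score, and the hidden-layer size are assumptions.

```python
# GLCM-based region saliency feeding a minimal ELM; data and sizes are toy values.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)

def region_saliency(patch):
    """Toy saliency score from GLCM contrast; the paper's exact combination is not given."""
    g = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                     symmetric=True, normed=True)
    return graycoprops(g, "contrast")[0, 0]

# Toy training set: (signal strength, region saliency) -> compression level.
patches = rng.integers(0, 256, size=(64, 32, 32), dtype=np.uint8)
signal = rng.uniform(0.1, 1.0, size=64)                 # normalized link quality
sal = np.array([region_saliency(p) for p in patches])
X = np.column_stack([signal, sal / (sal.max() + 1e-9)])
y = rng.random(64)                                      # stand-in target level

# Extreme Learning Machine: fixed random hidden layer + least-squares readout.
W = rng.normal(size=(2, 50))
b = rng.normal(size=50)
beta = np.linalg.pinv(np.tanh(X @ W + b)) @ y

pred = np.tanh(X[:1] @ W + b) @ beta                    # predicted level for region 0
print(float(pred[0]))
```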
Abstract: Diabetic Retinopathy (DR) is an eye disease resulting from a diabetic condition that damages the retina, leading to blindness or loss of vision. Morphological and physiological retinal variations develop, involving slowdown of blood flow in the retina, elevation of leukocyte cohesion, basement membrane dystrophy, and decline of pericyte cells. As DR has no symptoms in its initial stage, early detection and automated diagnosis can prevent further visual damage. In this research, segmentation methods using a Deep Neural Network (DNN) are proposed to detect retinal defects such as exudates, hemorrhages, and microaneurysms from digital fundus images, and the conditions are then classified accurately into the DR grades mild, moderate, severe, no PDR, and PDR. Initially, saliency detection is applied to the color images to detect the most salient foreground objects against the background. Next, a structure tensor is applied to enhance the local patterns of edge elements and the intensity changes that occur on object edges. Finally, active contour approximation is performed using gradient descent to segment the lesions from the images. Afterwards, the output images of the proposed segmentation process are used to evaluate the ratio between the total contour area and the total true contour arc length, and the severity levels are identified from the computed ratio. Meanwhile, statistical parameters such as the mean and standard deviation of pixel intensities and the means of hue and saturation are estimated through K-means clustering and computed as features from the output images of the proposed segmentation process. Using these derived feature sets as input to the classifier, DR classification was performed. Finally, a VGG-19 deep neural network was trained and tested using the derived feature sets from the KAGGLE fundus image dataset containing 35,126 images in total. The VGG-19 was trained with features extracted from 20,000 images and tested with features extracted from 5,000 images, achieving a sensitivity of 82% and an accuracy of 96%. The proposed system was able to label and classify DR grades automatically.
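The grading ratio between total contour area and total contour arc length can be sketched as below, assuming a binary lesion mask from the segmentation stage; the synthetic mask is only for illustration, and the grade cut-offs are not reproduced.

```python
# Area-to-arc-length ratio on a synthetic lesion mask standing in for the segmentation output.
import cv2
import numpy as np

mask = np.zeros((256, 256), np.uint8)
cv2.circle(mask, (100, 100), 30, 255, -1)        # two toy "lesions"
cv2.circle(mask, (180, 160), 18, 255, -1)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
area = sum(cv2.contourArea(c) for c in contours)
arc = sum(cv2.arcLength(c, True) for c in contours)
print("area / arc-length ratio:", area / (arc + 1e-6))   # compared against grade thresholds
```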
Funding: Supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2019B01013700), the National Natural Science Foundation of China (Grant No. 61702347), the Natural Science Foundation of Hebei Province (Grant No. F2017210161), the Science and Technology Research Project of Higher Education in Hebei Province (Grant No. QN2017132), and the Graduate Innovation Program (Grant No. YC2022051).
Abstract: With the improvement of people's security awareness, a large amount of monitoring equipment has been put into use, resulting in explosive growth of surveillance video data. Key frame extraction is a paramount technology for improving video storage efficiency and enhancing the accuracy of video retrieval: it extracts key frame sets that can express the video content from massive videos. However, existing key frame extraction algorithms for surveillance video still have deficiencies, such as destroying the integrity of image information and failing to extract key frames accurately. To this end, this paper proposes a key frame extraction algorithm for surveillance video based on quaternion Fourier saliency detection. Firstly, the algorithm uses color and intensity features to perform a quaternion Fourier transform on the surveillance video sequence. Next, the phase spectrum of the quaternion-Fourier-transformed image is obtained, and the visual saliency map of the image is computed from this phase spectrum. Then, the visual saliency maps of two adjacent frames are used to characterize the change in the target's motion state. Finally, the frames that accurately express the motion state of the target are selected as key frames. The experimental results show that the proposed method can accurately capture changes in the local motion state of the target while maintaining the integrity of the image information.
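A simplified single-channel phase-spectrum saliency sketch is shown below; a true quaternion Fourier transform treats the color and intensity channels jointly and is not implemented here, and the smoothing kernel is an assumption.

```python
# Single-channel phase-spectrum saliency, a simplified stand-in for the quaternion formulation.
import cv2
import numpy as np
from skimage.data import camera

frame = camera().astype(np.float32)            # stand-in for one video frame channel

f = np.fft.fft2(frame)
phase_only = np.exp(1j * np.angle(f))          # keep the phase, drop the magnitude
recon = np.abs(np.fft.ifft2(phase_only)) ** 2
saliency = cv2.GaussianBlur(recon.astype(np.float32), (9, 9), 3)
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-9)

# Differences between the saliency maps of adjacent frames flag candidate key frames.
```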
Funding: National Natural Science Foundation of China (No. 61976091).
Abstract: Medical image registration is widely used in image-guided therapy and image-guided surgery to estimate the spatial correspondence between planning and treatment images. However, most intensity-based methods suffer from matching ambiguity and ignore the influence of weak-correspondence areas on the overall registration. In this study, we propose a novel general-purpose registration algorithm based on free-form deformation with the non-subsampled contourlet transform and a saliency map, which can reduce matching ambiguities and maintain the topological structure of weak-correspondence areas. An optimization method based on Markov random fields is used to optimize the registration process. Experiments on four public brain, cardiac, and lung datasets have demonstrated the general applicability and accuracy of our algorithm compared with two state-of-the-art methods.
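One way saliency can enter an intensity-based registration cost is sketched below as a saliency-weighted SSD data term; this is an illustrative stand-in, not the paper's NSCT/MRF formulation.

```python
# Saliency-weighted SSD data term; weighting scheme is an illustrative assumption.
import numpy as np

def weighted_ssd(fixed, moving, saliency):
    """SSD down-weighting weak-correspondence (low-saliency) areas, which then
    rely more on the deformation regularizer."""
    w = saliency / (saliency.max() + 1e-9)
    return float(np.sum(w * (fixed - moving) ** 2))

rng = np.random.default_rng(0)
fixed, moving = rng.random((64, 64)), rng.random((64, 64))
print(weighted_ssd(fixed, moving, saliency=np.abs(fixed - fixed.mean())))
```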
Funding: Supported by the National Natural Science Foundation of China (No. 61105022), the Research Fund for the Doctoral Program of Higher Education of China (No. 20110073120028), and the Jiangsu Provincial Natural Science Foundation (No. BK2012296).
Abstract: In order to enhance the contrast of the fused image and reduce the loss of fine details in the process of image fusion, a novel fusion algorithm for infrared and visible images is proposed. First of all, regions of interest (RoIs) are detected in the two original images using a saliency map. Then, the nonsubsampled contourlet transform (NSCT) is performed on both the infrared image and the visible image to obtain a low-frequency sub-band and a number of high-frequency sub-bands. Subsequently, the coefficients of all sub-bands are classified into four categories based on the result of RoI detection: the region of interest in the low-frequency sub-band (LSRoI), the region of interest in the high-frequency sub-bands (HSRoI), the region of non-interest in the low-frequency sub-band (LSNRoI), and the region of non-interest in the high-frequency sub-bands (HSNRoI). Fusion rules are customized for each kind of coefficient, and the fused image is obtained by applying the inverse NSCT to the fused coefficients. Experimental results show that the proposed fusion scheme achieves better results than the other fusion algorithms in both visual effect and quantitative metrics.
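An RoI-driven coefficient-fusion sketch is given below using a single-level Haar DWT as a stand-in for the NSCT (which has no common Python implementation); the fusion rules are simplified versions of the four-category idea, not the paper's exact rules.

```python
# RoI-aware sub-band fusion with a Haar DWT standing in for the NSCT.
import numpy as np
import pywt
from skimage.data import camera

def fuse(ir, vis, roi):
    """ir, vis: float images of equal even size; roi: boolean RoI mask."""
    cA_i, (cH_i, cV_i, cD_i) = pywt.dwt2(ir, "haar")
    cA_v, (cH_v, cV_v, cD_v) = pywt.dwt2(vis, "haar")
    roi_s = roi[::2, ::2]                      # mask at sub-band resolution

    # Low frequency: infrared coefficients inside the RoI, average elsewhere.
    cA = np.where(roi_s, cA_i, 0.5 * (cA_i + cA_v))
    # High frequency: max-absolute selection (an RoI-aware rule could refine this).
    highs = tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                  for a, b in ((cH_i, cH_v), (cV_i, cV_v), (cD_i, cD_v)))
    return pywt.idwt2((cA, highs), "haar")

ir = camera().astype(float)                    # stand-ins for the infrared/visible pair
vis = np.rot90(ir).copy()
fused = fuse(ir, vis, roi=ir > 128)            # toy RoI mask from intensity
```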
Funding: Supported by the National Natural Science Foundation of China (No. 61772440), the Aeronautical Science Foundation of China (No. 20165168007), and the Science and Technology of Electrooptic Control Laboratory.
Abstract: A collage is a composite artwork made by spatially laying out multiple pictures, collected from the Internet or from user photographs, on a canvas. Collages, usually made by skilled artists, involve a complex manual process, especially when searching for component pictures and adjusting their spatial layout to meet artistic requirements. In this paper, we present a visual-perception-driven method for automatically synthesizing visually pleasing collages. Unlike previous works, we focus on how to design a collage layout that not only provides easy access to the theme of the overall image but also conforms to human visual perception. To achieve this goal, we formulate the generation of collages as a mapping problem: given a canvas image, first compute a saliency map for it and a vector field for each of its sub-regions. Second, using a divide-and-conquer strategy, generate a series of patch sets from the canvas image, where the saliency map and the vector field determine each patch's size and direction, respectively. Third, construct a Gestalt-based energy function to choose the most visually pleasing and orderly patch set as the final layout. Finally, using a semantic-color metric, map the picture set to the patch set to generate the final collage. Extensive experiments and user studies show that this method can generate visually pleasing collages.
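Deriving a patch's size and direction from the saliency map and the vector field can be sketched as below; the size bounds and the linear scaling are assumptions, and the Gestalt-based energy selection is not shown.

```python
# Patch size from saliency, patch direction from the vector field; scaling is illustrative.
import numpy as np

def patch_params(saliency, field, cy, cx, s_min=32, s_max=128):
    """saliency: HxW in [0,1]; field: HxWx2 direction vectors; (cy, cx): patch centre."""
    size = int(s_min + saliency[cy, cx] * (s_max - s_min))   # more salient -> larger patch
    vy, vx = field[cy, cx]
    angle = float(np.degrees(np.arctan2(vy, vx)))            # patch orientation in degrees
    return size, angle

rng = np.random.default_rng(0)
sal = rng.random((240, 320))
vec = rng.normal(size=(240, 320, 2))
print(patch_params(sal, vec, 120, 160))
```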
Funding: Supported by the CADAL Project and the National Natural Science Foundation of China (Nos. 60973055 and 90820003).
Abstract: The increasing number of videos on the Internet and in digital libraries highlights the necessity and importance of interactive video services, such as automatically associating additional materials (e.g., advertising logos and relevant selling information) with the video content so as to enrich the viewing experience. Toward this end, this paper presents a novel approach for user-targeted video content association (VCA). In this approach, salient objects are extracted automatically from the video stream using complementary saliency maps. Based on these salient objects, the VCA system can push related logo images to users. Since the salient objects often correspond to important video content, the associated images can be considered content-related. Our VCA system also allows users to associate images with their preferred video content through simple interactions with a mouse and an infrared pen. Moreover, by learning each user's preferences through feedback collected on the pulled or pushed images, the VCA system can provide user-targeted services. Experimental results show that our approach can effectively and efficiently extract the salient objects. Moreover, subjective evaluations show that our system can provide content-related and user-targeted VCA services in a less intrusive way.
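Extracting salient objects from combined saliency maps can be sketched as below; the abstract does not say which complementary maps are used, so a contrast map and an edge map serve as toy stand-ins, as does the threshold.

```python
# Combine two toy "complementary" saliency maps and extract connected salient regions.
import cv2
import numpy as np
from skimage.data import astronaut

frame = astronaut()
gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY).astype(np.float32)

contrast = np.abs(gray - cv2.GaussianBlur(gray, (0, 0), 16))          # centre-surround contrast
edge_mag = np.sqrt(cv2.Sobel(gray, cv2.CV_32F, 1, 0) ** 2 +
                   cv2.Sobel(gray, cv2.CV_32F, 0, 1) ** 2)             # edge energy
combined = contrast / (contrast.max() + 1e-9) + edge_mag / (edge_mag.max() + 1e-9)

mask = (combined > 1.0).astype(np.uint8)               # assumed threshold on the combined map
n, labels = cv2.connectedComponents(mask)              # connected regions as salient objects
print("candidate salient objects:", n - 1)
```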