Funding: Supported by the National Natural Science Foundation of China (62103104) and the China Postdoctoral Science Foundation (2021M690615).
Abstract: In this paper, we study autonomous landing scene recognition with knowledge transfer for drones. Aerial remote sensing poses particular difficulties: some scenes are extremely similar, and the same scene looks different at different altitudes. To address this, we employ a deep convolutional neural network (CNN) based on knowledge transfer and fine-tuning. We then establish the LandingScenes-7 dataset, which is divided into seven classes. The classifier also faces a novelty detection problem, which we address by thresholding in the prediction stage to exclude other landing scenes. We employ a transfer learning method based on a ResNeXt-50 backbone with the adaptive moment estimation (ADAM) optimization algorithm, and compare it against a ResNet-50 backbone and the momentum stochastic gradient descent (SGD) optimizer. Experimental results show that ResNeXt-50 with the ADAM optimizer performs better. With a pre-trained model and fine-tuning, it achieves 97.8450% top-1 accuracy on the LandingScenes-7 dataset, paving the way for drones to autonomously learn landing scenes.
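The thresholding step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the threshold value, the rejection label, or the class names, so `threshold=0.9`, the label `"other"`, and the placeholder names `class_0` to `class_6` are assumptions made only for illustration.

```python
import math

# Placeholder names for the seven classes; the real LandingScenes-7 labels
# are not listed in the abstract, so these are hypothetical.
CLASS_NAMES = [f"class_{i}" for i in range(7)]


def softmax(logits):
    """Convert raw classifier scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits, class_names, threshold=0.9):
    """Return (label, confidence); reject low-confidence inputs as 'other'.

    The threshold of 0.9 is an assumed value, not one taken from the paper.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "other", probs[best]
    return class_names[best], probs[best]


if __name__ == "__main__":
    # A confident prediction is accepted; a flat one is rejected.
    print(predict_with_rejection([8.0, 0, 0, 0, 0, 0, 0], CLASS_NAMES))
    print(predict_with_rejection([1.0] * 7, CLASS_NAMES))
```

In practice the threshold would presumably be tuned on held-out data that contains scenes outside the seven known classes.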
Abstract: Scene recognition is a popular open problem in computer vision. Among the many methods proposed in recent years, convolutional neural network (CNN) based approaches achieve the best performance. In this paper, we propose an advanced feature fusion algorithm using multiple convolutional neural networks (Multi-CNN) for scene recognition. Unlike existing works, which usually use a single CNN, we fuse multiple different CNNs. First, we split the training images in two directions and apply them to three deep CNN models, then extract features from the last fully connected (FC) layer and the probabilistic layer of each model. Finally, the feature vectors are fused in groups with different fusion strategies and forwarded to a SoftMax classifier. The proposed algorithm is evaluated on three scene datasets. The experimental results demonstrate its effectiveness compared with other state-of-the-art approaches.
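The abstract does not say which fusion strategies are used. As a rough sketch, two common options are shown here: concatenating L2-normalized feature vectors from several models, and averaging their probability outputs. Both function names and the choice of strategies are assumptions, not the paper's method.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors pass through)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_concat(feature_vectors):
    """Early fusion: normalize each model's features, then concatenate them."""
    fused = []
    for vec in feature_vectors:
        fused.extend(l2_normalize(vec))
    return fused


def fuse_average(prob_vectors):
    """Late fusion: average the class probabilities predicted by each model."""
    count = len(prob_vectors)
    return [sum(column) / count for column in zip(*prob_vectors)]


if __name__ == "__main__":
    # Toy feature vectors standing in for the FC-layer outputs of two models.
    print(fuse_concat([[3.0, 4.0], [0.0, 2.0]]))
    # Toy probability outputs standing in for two models' probabilistic layers.
    print(fuse_average([[0.2, 0.8], [0.6, 0.4]]))
```

Normalizing before concatenation keeps one model's larger activations from dominating the fused vector, which is one reason this variant is often tried first.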
Funding: This research is partially supported by the Programme for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, and also partially supported by JSPS KAKENHI Grant No. 15K00159.
Abstract: Scene recognition is a fundamental task in computer vision that generally includes three vital stages: feature extraction, feature transformation, and classification. Early research focused mainly on feature extraction, but with the rise of convolutional neural networks (CNNs), more and more feature transformation methods based on CNN features have been proposed. In this work, a novel feature transformation algorithm called Graph Encoded Local Discriminative Region Representation (GEDRR) is proposed to find discriminative local representations for scene images and to explore the relationships between the discriminative regions. In addition, we propose a method that uses a multi-head attention module to enhance and fuse convolutional feature maps. Combining the two methods with the global representation, we propose a scene recognition framework called Global and Graph Encoded Local Discriminative Region Representation (G2ELDR2). Experimental results on three scene datasets demonstrate the effectiveness of our model, which outperforms many state-of-the-art methods.
Abstract: In the context of improved navigation for micro aerial vehicles, a new scene recognition visual descriptor, called the spatial color gist wavelet descriptor (SCGWD), is proposed. SCGWD was developed by combining the proposed Ohta color-GIST wavelet descriptors with census transform histogram (CENTRIST) spatial pyramid representation descriptors for categorizing indoor versus outdoor scenes. Binary and multiclass support vector machine (SVM) classifiers with linear and non-linear kernels were used to classify indoor versus outdoor scenes and indoor scenes, respectively. We also discuss, from an experimental perspective, the feature extraction methodology of several state-of-the-art visual descriptors and of four proposed visual descriptors (Ohta color-GIST descriptors, Ohta color-GIST wavelet descriptors, enhanced Ohta color histogram descriptors, and SCGWDs). The proposed descriptors and the state-of-the-art visual descriptors were evaluated on the Indian Institute of Technology Madras Scene Classification Image Database Two (IITM SCID2), an Indoor-Outdoor dataset, and the Massachusetts Institute of Technology indoor scene classification dataset (MIT-67). Experimental results showed that the indoor versus outdoor scene recognition algorithm employing SVM with SCGWDs produced the highest classification rates (CRs): 95.48% and 99.82% using the radial basis function (RBF) kernel, and 95.29% and 99.45% using the linear kernel, for the IITM SCID2 and Indoor-Outdoor datasets, respectively. The lowest CRs, 2.08% and 4.92%, were obtained when the RBF and linear kernels, respectively, were used with the MIT-67 dataset. In addition, the proposed SCGWDs obtained higher CRs, precision, recall, and area under the receiver operating characteristic curve values than state-of-the-art visual descriptors.
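For readers unfamiliar with CENTRIST, the census transform at its core can be sketched in a few lines. This is a generic illustration, not the paper's pipeline: each interior pixel is compared with its eight neighbors, a bit is set to 1 when the center is greater than or equal to the neighbor, and the resulting 8-bit codes are counted in a 256-bin histogram. The bit ordering (left to right, top to bottom) and the skipping of border pixels are assumptions of this sketch.

```python
def census_transform(image):
    """Compute 8-bit census codes for the interior pixels of a grayscale image.

    `image` is a list of equal-length rows of intensities. Border pixels are
    skipped, so the output is two rows and two columns smaller than the input.
    """
    height, width = len(image), len(image[0])
    codes = []
    for r in range(1, height - 1):
        row_codes = []
        for c in range(1, width - 1):
            center = image[r][c]
            code = 0
            # Visit the eight neighbors left to right, top to bottom.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    bit = 1 if center >= image[r + dr][c + dc] else 0
                    code = (code << 1) | bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def centrist_histogram(image):
    """Count how often each of the 256 census codes occurs in the image."""
    hist = [0] * 256
    for row_codes in census_transform(image):
        for code in row_codes:
            hist[code] += 1
    return hist


if __name__ == "__main__":
    # The center pixel is the maximum of its neighborhood, so all bits are set.
    print(census_transform([[1, 2, 3], [4, 9, 5], [6, 7, 8]]))
```

A spatial pyramid variant would compute such histograms over sub-regions of the image and concatenate them; that step is omitted here.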
Funding: Supported by the National Natural Science Foundation of China (62103104), the Natural Science Foundation of Jiangsu Province (BK20210215), and the China Postdoctoral Science Foundation (2021M690615).
Abstract: In this paper, we study scene image recognition with knowledge transfer for drone navigation. We divide navigation scenes into three macro-classes: outdoor special scenes (OSSs), transitional scenes (TSs) that cover the space from indoors to outdoors or from outdoors to indoors, and others. Recognizing the TSs is difficult. To this end, we employ a deep convolutional neural network (CNN) based on knowledge transfer, image augmentation techniques, and fine-tuning. The classifier also faces a novelty detection problem, which we solve in the prediction stage using global navigation satellite systems (GNSS). Experimental results show that our method, with a pre-trained model and fine-tuning, achieves 91.3196% top-1 accuracy on the Scenes21 dataset, paving the way for drones to learn to understand the scenes around them autonomously.
Funding: Supported by the National Natural Science Foundation of China (60803103), the Research Fund for the Doctoral Program of Higher Education of China (200800131026), and the Fundamental Research Funds for the Central Universities (2009RC0603, 2009RC0601).
Abstract: This paper proposes a simple and discriminative framework that uses a graphical model and 3D geometry to understand the diversity of urban scenes with varying viewpoints. Our algorithm constructs a conditional random field (CRF) network from over-segmented superpixels and learns the appearance model from different sets of features for the specific classes of interest. We also introduce a training algorithm that learns a model for the edge potential between superpixel areas based on their feature difference. The proposed algorithm gives competitive and visually pleasing results for urban scene segmentation. We show that inference with our trained network improves class labeling performance compared with using the appearance model alone.
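As a reading aid, a pairwise CRF over superpixels is usually written as an energy with a unary (appearance) term and an edge term. The abstract gives no formulas, so the notation below is a generic sketch rather than the paper's model: $\mathcal{V}$ is the set of superpixels, $\mathcal{E}$ the set of adjacent superpixel pairs, $y_i$ the label of superpixel $i$, and $\mathbf{f}_i$ its feature vector. The contrast-sensitive Potts form of the edge potential, with weights $w$ and $\beta$, is a commonly used choice and is assumed here only to show how a feature difference can enter the edge term.

```latex
E(\mathbf{y} \mid \mathbf{x})
  = \sum_{i \in \mathcal{V}} \psi_i(y_i; \mathbf{x})
  + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j; \mathbf{x}),
\qquad
\psi_{ij}(y_i, y_j; \mathbf{x})
  = w \, \exp\!\left(-\beta \,\lVert \mathbf{f}_i - \mathbf{f}_j \rVert^2\right)
    \, [\, y_i \neq y_j \,].
```

Under this form, neighboring superpixels with similar features pay a high cost for taking different labels, while dissimilar neighbors pay little. Dropping the second sum recovers labeling with the appearance model alone, which is the baseline the abstract compares against.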
Funding: Supported by the National Funds for Distinguished Young Scientists of China under Grant No. 60925010, the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20120005130002, the Cosponsored Project of Beijing Committee of Education, the Funds for Creative Research Groups of China under Grant No. 61121001, and the Program for Changjiang Scholars and Innovative Research Team in University of China under Grant No. IRT1049.
Abstract: Recognizing scene information in images or videos, such as locating objects and answering "Where am I?", has attracted much attention in the computer vision research field. Many existing scene recognition methods focus on static images and cannot achieve satisfactory results on videos, which contain more complex scene features than images. In this paper, we propose a robust movie scene recognition approach based on panoramic frames and representative feature patches. First, the movie is efficiently segmented into video shots and scenes. Second, we introduce a novel key-frame extraction method using panoramic frames, and apply a local feature extraction process to obtain the representative feature patches (RFPs) in each video shot. Third, a Latent Dirichlet Allocation (LDA) based recognition model is trained to recognize the scene within each individual video scene clip. The correlations between video clips are considered to enhance recognition performance. When the proposed approach is applied to recognize scenes in realistic movies, the experimental results show that it achieves satisfactory performance.
Abstract: An automatic approach is presented to track a wide screen in a multipurpose hall video scene. Once the screen is located, the system also generates the temporal rate of change using an edge-detection-based method. Our approach adopts a scene segmentation algorithm that exploits visual features (texture) and depth information to perform efficient screen localization. The cropped region corresponding to the wide screen undergoes salient visual cue extraction to retrieve the emphasized changes required for the rate-of-change computation. In addition to video document indexing and retrieval, this work can improve machine vision capability in behavior analysis and pattern recognition.
Abstract: Scene text recognition (STR) is the task of recognizing character sequences in natural scenes. Although STR methods have developed greatly, existing methods still cannot recognize text of arbitrary shape, such as the heavily curved or rotated text common in daily life. Irregular scene text has a complex layout in two-dimensional space, which challenges the methods used to recognize scene text in the past. Recently, some recognizers rectify irregular text into a regular text image with an approximately 1D layout, or convert the 2D image feature map into a one-dimensional feature sequence. Although these methods achieve good performance, their robustness and accuracy are limited by the loss of spatial information in the two-dimensional to one-dimensional transformation. In this paper, we propose a framework that directly converts irregular text with a two-dimensional layout into a character sequence, using a relationship attention module to capture the correlations within the feature map. Extensive experiments on multiple common benchmarks show that our method effectively recognizes both regular and irregular scene text and is more accurate than previous methods.