Scene recognition is a fundamental task in computer vision. It generally comprises three stages: feature extraction, feature transformation, and classification. Early research focused mainly on feature extraction, but with the rise of Convolutional Neural Networks (CNNs), a growing number of feature transformation methods have been built on CNN features. In this work, a novel feature transformation algorithm called Graph Encoded Local Discriminative Region Representation (GEDRR) is proposed to find discriminative local representations for scene images and to explore the relationships between the discriminative regions. In addition, we propose a method that uses a multi-head attention module to enhance and fuse convolutional feature maps. Combining these two methods with the global representation, we propose a scene recognition framework called Global and Graph Encoded Local Discriminative Region Representation (G2ELDR2). Experimental results on three scene datasets demonstrate the effectiveness of our model, which outperforms many state-of-the-art methods.
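To make the attention-based enhancement step more concrete, the following is a minimal sketch of multi-head self-attention applied to a flattened feature map. It is not the authors' module: the learned query, key, value, and output projections are omitted (treated as identity), and the function names and the toy 2x2 feature map are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def multi_head_self_attention(tokens, num_heads):
    """Split channels into heads, attend within each head, concatenate.

    Assumption: learned Q/K/V/output projections are left out (identity),
    so this shows only the head-splitting and attention arithmetic.
    """
    dim = len(tokens[0])
    assert dim % num_heads == 0, "channels must divide evenly into heads"
    head_dim = dim // num_heads
    fused = [[] for _ in tokens]
    for h in range(num_heads):
        part = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        head_out = attention(part, part, part)
        for row, out in zip(fused, head_out):
            row.extend(out)
    return fused


# Hypothetical 2x2 feature map with 4 channels, flattened to 4 positions.
feature_map = [
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
enhanced = multi_head_self_attention(feature_map, num_heads=2)
```

Each output position is a weighted average of all positions within each head, which is how attention lets every region of the feature map draw on the others.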
Scene recognition is a popular open problem in computer vision. Among the many methods proposed in recent years, Convolutional Neural Network (CNN) based approaches achieve the best performance. In this paper, we propose an advanced feature fusion algorithm for scene recognition that uses Multiple Convolutional Neural Networks (Multi-CNN). Unlike existing works, which usually rely on a single convolutional neural network, our approach fuses several different convolutional neural networks. First, we split the training images in two directions and feed them to three deep CNN models, then extract features from the last fully connected (FC) layer and the probabilistic layer of each model. Finally, the feature vectors are fused in groups using different fusion strategies and forwarded to a SoftMax classifier. The proposed algorithm is evaluated on three scene datasets. The experimental results demonstrate its effectiveness compared with other state-of-the-art approaches.
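The fusion step can be sketched as follows. The abstract does not name the fusion strategies, so concatenation of FC features and averaging of probability vectors are used here as two common choices. All feature values, classifier weights, and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_concat(feature_vectors):
    """Fusion by concatenation: join per-model features end to end."""
    return [x for vec in feature_vectors for x in vec]


def fuse_average(prob_vectors):
    """Fusion by averaging: element-wise mean of per-model probabilities."""
    count = len(prob_vectors)
    return [sum(column) / count for column in zip(*prob_vectors)]


def linear(features, weights, bias):
    """One linear layer: a weight row and a bias term per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical FC-layer features from three CNN models (2 values each).
fc_a, fc_b, fc_c = [0.2, 0.9], [0.4, 0.1], [0.8, 0.3]
fused = fuse_concat([fc_a, fc_b, fc_c])

# Hypothetical classifier weights for a 2-class problem.
weights = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
]
bias = [0.0, 0.0]
probs = softmax(linear(fused, weights, bias))
predicted = probs.index(max(probs))

# Alternative strategy: average the probabilistic-layer outputs directly.
averaged = fuse_average([[0.6, 0.4], [0.3, 0.7], [0.9, 0.1]])
```

Concatenation keeps every model's features but grows the classifier input, while averaging keeps the dimensionality fixed at the number of classes.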
Image retrieval has become increasingly important because of the explosive growth of images on the Internet. Traditional image retrieval methods offer limited performance because their visual features have poor expressive ability and high dimensionality. Hashing is widely used for Approximate Nearest Neighbor (ANN) search owing to its speed and timeliness. Meanwhile, Convolutional Neural Networks (CNNs) produce strongly discriminative features that are used for image classification. In this paper, we propose a CNN architecture based on an improved deep supervised hashing (IDSH) method, which generates compact binary codes directly. The main contributions of this paper are as follows: first, we add a Batch Normalization (BN) layer before each activation layer to prevent the gradient from vanishing and to speed up training; second, we use a Divide-and-Encode Module to map image features to approximate hash codes; finally, we adopt center loss to optimize training. Extensive experimental results on four large-scale datasets (MNIST, CIFAR-10, NUS-WIDE, and SVHN) demonstrate the effectiveness of the proposed method compared with other state-of-the-art hashing methods.
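As a rough illustration of how hash codes support retrieval, the sketch below maps a feature vector to an approximate code slice by slice, in the spirit of a divide-and-encode step, thresholds it into bits, and ranks database items by Hamming distance. It is not the IDSH network: the slice weights, the sigmoid squashing, the 0.5 threshold, and the toy database are all assumptions chosen for illustration.

```python
import math


def divide_and_encode(features, slice_weights):
    """Split features into equal slices and project each slice to one value.

    Assumption: each slice gets its own weight vector followed by a sigmoid,
    giving one approximate hash bit in (0, 1) per slice.
    """
    size = len(features) // len(slice_weights)
    approx = []
    for i, w in enumerate(slice_weights):
        chunk = features[i * size:(i + 1) * size]
        z = sum(a * b for a, b in zip(w, chunk))
        approx.append(1.0 / (1.0 + math.exp(-z)))
    return approx


def binarize(approx_code, threshold=0.5):
    """Turn an approximate code into a binary hash code."""
    return [1 if v > threshold else 0 for v in approx_code]


def hamming(code_a, code_b):
    """Number of bit positions where two codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))


def rank_by_hamming(query_code, database):
    """Return (name, distance) pairs sorted by ascending Hamming distance."""
    pairs = [(name, hamming(query_code, code)) for name, code in database.items()]
    return sorted(pairs, key=lambda pair: pair[1])


# Hypothetical 6-dimensional image feature mapped to a 3-bit code.
slice_weights = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
query_code = binarize(divide_and_encode([0.9, 0.1, 0.2, 0.8, 0.5, 0.4], slice_weights))

# Hypothetical database of precomputed 3-bit codes.
database = {"img_a": [1, 0, 1], "img_b": [1, 1, 1], "img_c": [0, 1, 0]}
ranked = rank_by_hamming(query_code, database)
```

Because the comparison reduces to counting differing bits, ranking a large database is far cheaper than comparing high-dimensional real-valued features.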
Funding: This research is partially supported by the Programme for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, and by JSPS KAKENHI Grant No. 15K00159.