Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet the spectrogram alone does not take into account a substantial amount of time-frequency information. In this study, we present an approach for exploring the benefits of deep scalogram representations extracted in segments from an audio stream. The approach first transforms the segmented acoustic scenes into bump and Morse scalograms, as well as spectrograms. Second, the spectrograms or scalograms are fed into pre-trained convolutional neural networks. Third, the features extracted from a subsequent fully connected layer are fed into (bidirectional) gated recurrent neural networks, which are followed by a single highway layer and a softmax layer. Finally, predictions from these three systems are fused by a margin sampling value strategy. We then evaluate the proposed approach on the acoustic scene classification data set of the 2017 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). On the evaluation set, bidirectional gated recurrent neural networks obtain an accuracy of 64.0% when fusing the spectrogram and the bump scalogram. This improves on the 61.0% baseline provided by the DCASE 2017 organisers. This result shows that extracted bump scalograms can improve classification accuracy when fused with a spectrogram-based system.
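The abstract does not spell out the margin sampling value (MSV) fusion rule. The sketch below is minimal and assumes one reading of it: for each clip, the fused prediction comes from whichever system has the largest margin between its top two class posteriors. The function names and the posterior values are hypothetical.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Difference between the highest and second-highest class probability."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def msv_fuse(system_probs: List[Sequence[float]]) -> int:
    """Return the class index predicted by the most confident system.

    Assumption: "most confident" means the largest top-two margin.
    """
    best = max(system_probs, key=margin)
    return max(range(len(best)), key=lambda c: best[c])


# Hypothetical posteriors from three systems (spectrogram, bump, Morse) for one clip.
spectrogram = [0.40, 0.35, 0.25]   # margin 0.05
bump = [0.10, 0.70, 0.20]          # margin 0.50
morse = [0.30, 0.45, 0.25]         # margin 0.15
print(msv_fuse([spectrogram, bump, morse]))  # -> 1
```

In this example the bump-scalogram system wins because its top class stands out most clearly, so its label (index 1) is used.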
Recent advances in the integration of camera sensors pave the way for new Unmanned Aerial Vehicle (UAV) applications. One example is analyzing geographical (spatial) variations in earth science to mitigate harmful environmental impacts and climate change. UAVs have attracted significant attention as a remote sensing platform. They capture high-resolution images of diverse scenes, such as land, forest fires, flooding threats, road collisions, and landslides, to enhance data analysis and decision making. Dynamic scene classification has attracted much attention in the examination of earth data captured by UAVs. This paper proposes a new multi-modal fusion based earth data classification (MMF-EDC) model. The MMF-EDC technique aims to identify the patterns that exist in the earth data and classify them into appropriate class labels. It fuses histogram of oriented gradients (HOG), local binary patterns (LBP), and residual network (ResNet) models. This fusion process integrates many feature vectors, and an entropy-based fusion process is carried out to enhance classification performance. In addition, the quantum artificial flora optimization (QAFO) algorithm is applied for hyperparameter optimization. The AFO algorithm is inspired by the reproduction and migration of flora. It helps decide the optimal parameters of the ResNet model, namely the learning rate, the number of hidden layers, and the number of neurons in each. Finally, a Variational Autoencoder (VAE) based classification model assigns appropriate class labels to the useful set of feature vectors. The proposed MMF-EDC model has been tested on the UCM and WHU-RS datasets. It exhibits promising classification results on these remote sensing images, with accuracies of 0.989 and 0.994 on the UCM and WHU-RS test sets, respectively.
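LBP is the simplest of the three feature extractors fused in MMF-EDC to show concretely. Below is a minimal sketch of the basic 3x3, 8-neighbour LBP code and its 256-bin histogram, assuming a grayscale image stored as a list of lists. The clockwise neighbour order and the `>=` comparison are one common convention, not necessarily the one used in the paper.

```python
from typing import List

# Clockwise neighbour offsets starting at the top-left pixel (one common convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """8-bit LBP code: bit i is set when neighbour i >= the centre pixel."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior (non-border) pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 6, 3],
]
print(lbp_code(patch, 1, 1))  # -> 42 (bits 1, 3 and 5 set)
```

In a full pipeline, the histogram (possibly computed per image block) would be the LBP feature vector concatenated with the HOG and ResNet features before fusion.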
With the rapid development of computer technology, millions of images are produced every day by different sources. How to efficiently process these images and accurately discern the scenes in them has become an important but difficult task. In this paper, we propose a novel supervised learning framework for scene classification based on a proposed adaptive binary coding. Specifically, we first extract high-level features of the images using available models trained on public datasets. Then, we design a binary encoding method, called one-hot encoding, to make the feature representation more efficient. Thanks to the proposed adaptive binary coding, our method requires no time to train or fine-tune the deep network and can effectively handle different applications. We ran experiments with three different classifiers on three public datasets: the UIUC Sports Event, MIT Indoor, and UC Merced datasets. The results show that our method outperforms state-of-the-art methods by large margins.
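The abstract does not define its adaptive binary coding. The following is a hypothetical sketch of one plausible reading. Each dimension of a pre-extracted deep feature vector is binarized against a threshold adapted to the training data (here, that dimension's mean). Queries are then classified by Hamming distance to the training codes. The thresholding rule, the 1-nearest-neighbour classifier, and all names are assumptions.

```python
from typing import List, Sequence


def fit_thresholds(features: List[Sequence[float]]) -> List[float]:
    """Per-dimension mean over the training features (the assumed 'adaptive' threshold)."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def encode(x: Sequence[float], thresholds: Sequence[float]) -> int:
    """Pack one bit per dimension into an integer: 1 if the value exceeds its threshold."""
    code = 0
    for i, (v, t) in enumerate(zip(x, thresholds)):
        if v > t:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def predict(x, thresholds, train_codes, train_labels):
    """1-nearest-neighbour in Hamming space (a stand-in for the paper's classifiers)."""
    q = encode(x, thresholds)
    best = min(range(len(train_codes)), key=lambda i: hamming(q, train_codes[i]))
    return train_labels[best]


# Hypothetical 3-D deep features for two scene classes.
train = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.1, 0.9, 0.2], [0.2, 0.8, 0.1]]
labels = ["beach", "beach", "forest", "forest"]
th = fit_thresholds(train)
codes = [encode(f, th) for f in train]
print(predict([0.7, 0.3, 0.7], th, codes, labels))  # -> beach
```

Binary codes make storage and comparison cheap (XOR plus bit count), which is one way such a scheme could make the representation "more efficient" without retraining the network.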
The process of human natural scene categorization consists of two correlated stages: visual perception and visual cognition of natural scenes. Inspired by this, we propose a biologically plausible approach for natural scene image classification, consisting of one visual perception model and two visual cognition models. The visual perception model, composed of two steps, extracts discriminative features from natural scene images. In the first step, we mimic the oriented and bandpass properties of the human primary visual cortex with a special complex wavelet transform. This transform decomposes a natural scene image into a series of 2D spatial structure signals. In the second step, a hybrid statistical feature extraction method generates gist features from those 2D spatial structure signals. Then we design a cognitive feedback model to realize adaptive optimization of the visual perception model. Finally, we build a multiple-semantics-based cognition model to imitate the human cognitive mode in rapid natural scene categorization. Experiments on natural scene datasets show that the proposed method achieves high efficiency and accuracy for natural scene classification.
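The abstract does not specify its "special complex wavelet transform". As an illustrative stand-in only, the sketch below uses the real part of a Gabor filter, a classic model of oriented, bandpass V1 simple cells. It summarizes the filter response with a gist-style descriptor: the mean absolute response in each cell of a coarse grid. The kernel parameters, grid size, and function names are assumptions.

```python
import math
from typing import List

Image = List[List[float]]


def gabor_kernel(size: int, theta: float, wavelength: float, sigma: float) -> Image:
    """Real part of a Gabor filter: Gaussian envelope times an oriented cosine."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_valid(img: Image, k: Image) -> Image:
    """'Valid' 2D correlation (no padding)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def grid_gist(resp: Image, grid: int) -> List[float]:
    """Mean absolute response in each cell of a grid x grid partition."""
    h, w = len(resp), len(resp[0])
    feats = []
    for gr in range(grid):
        for gc in range(grid):
            rows = range(gr * h // grid, (gr + 1) * h // grid)
            cols = range(gc * w // grid, (gc + 1) * w // grid)
            vals = [abs(resp[r][c]) for r in rows for c in cols]
            feats.append(sum(vals) / len(vals))
    return feats


# Vertical stripes (period 4 along x) excite theta=0 far more than theta=pi/2.
stripes = [[1.0 if (c // 2) % 2 == 0 else -1.0 for c in range(16)] for _ in range(16)]
k0 = gabor_kernel(5, 0.0, 4.0, 1.5)
k90 = gabor_kernel(5, math.pi / 2, 4.0, 1.5)
g0 = grid_gist(filter_valid(stripes, k0), 2)
g90 = grid_gist(filter_valid(stripes, k90), 2)
print(sum(g0) > sum(g90))  # -> True
```

A full gist descriptor would concatenate such grid statistics over several orientations and scales.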
Deep learning, benefiting from large-scale datasets, significantly improves the accuracy of remote sensing image scene classification. However, annotating remote sensing images is time-consuming and difficult even for experts. Deep neural networks trained on a few labeled samples usually generalize poorly to new, unseen images. In this paper, we propose a semi-supervised approach for remote sensing image scene classification based on prototype-based consistency, exploiting massive numbers of unlabeled images. To this end, we first propose a feature enhancement module that extracts discriminative features by focusing the model on foreground areas. Then, a prototype-based classifier is introduced into the framework to acquire consistent feature representations. We conduct a series of experiments on NWPU-RESISC45 and the Aerial Image Dataset (AID). Our method improves on the State-Of-The-Art (SOTA) accuracy on NWPU-RESISC45 from 92.03% to 93.08% and on AID from 94.25% to 95.24%.
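A prototype-based classifier is commonly built by averaging the embedded features of each class's labelled samples, then scoring a query by its similarity to each prototype. The sketch below is minimal and assumes that construction. It adds a consistency check between two views of one unlabelled image. The cosine similarity, the temperature, and all names are illustrative choices, not the paper's confirmed design.

```python
import math
from typing import Dict, List, Sequence


def mean_vec(vectors: List[Sequence[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def prototypes(feats: List[Sequence[float]], labels: List[str]) -> Dict[str, List[float]]:
    """One prototype per class: the mean feature of its labelled samples."""
    by_class: Dict[str, List[Sequence[float]]] = {}
    for f, y in zip(feats, labels):
        by_class.setdefault(y, []).append(f)
    return {y: mean_vec(fs) for y, fs in by_class.items()}


def proto_probs(x: Sequence[float], protos: Dict[str, List[float]],
                tau: float = 0.1) -> Dict[str, float]:
    """Softmax over cosine similarities to each prototype (temperature tau)."""
    logits = {y: cosine(x, p) / tau for y, p in protos.items()}
    m = max(logits.values())                      # subtract max for numerical stability
    exps = {y: math.exp(v - m) for y, v in logits.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


labelled = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels = ["airport", "airport", "river", "river"]
protos = prototypes(labelled, labels)

# Two hypothetical views (e.g. weak and strong augmentation) of one unlabelled image.
weak, strong = [0.8, 0.2], [0.7, 0.3]
p_weak, p_strong = proto_probs(weak, protos), proto_probs(strong, protos)
# Consistency: the weak view's pseudo-label should match the strong view's prediction.
pseudo = max(p_weak, key=p_weak.get)
print(pseudo, max(p_strong, key=p_strong.get))  # -> airport airport
```

During training, a consistency loss (e.g. cross-entropy between the weak-view pseudo-label and the strong-view probabilities) would push the two views together.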
Remote sensing image scene classification and remote sensing technology applications are hot research topics. Although CNN-based models have reached high average accuracy, some classes are still misclassified, such as “freeway,” “sparse residential,” and “commercial_area.” These classes contain typical decisive features, spatial-relation features, and mixed decisive and spatial-relation features, which limit high-quality image scene classification. To address this issue, this paper proposes a hybrid Grad-CAM and capsule network method for image scene classification. The Grad-CAM and capsule network structures have the potential to recognize decisive features and spatial-relation features, respectively. By using a pre-trained model, a hybrid structure, and structure adjustment, the proposed model can recognize both decisive and spatial-relation features. A group of experiments is designed on three popular data sets of increasing classification difficulty. In the most challenging experiment, an average accuracy of 92.67% is achieved. Specifically, accuracies of 83%, 75%, and 86% are obtained for the classes “church,” “palace,” and “commercial_area,” respectively. This research demonstrates that the hybrid structure can effectively improve performance by considering both decisive and spatial-relation features. Therefore, Grad-CAM-CapsNet is a promising and powerful structure for image scene classification.
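The capsule side of the hybrid relies on vector-valued capsules whose length encodes the probability that an entity is present. Below is a minimal sketch of the standard capsule "squash" nonlinearity from the original dynamic-routing CapsNet: v = (||s||^2 / (1 + ||s||^2)) * s / ||s||. Whether Grad-CAM-CapsNet uses exactly this form is an assumption.

```python
import math
from typing import List, Sequence


def squash(s: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Shrink a capsule vector to a length in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)   # eps avoids division by zero
    return [scale * x for x in s]


v = squash([3.0, 4.0])            # input length 5
print(round(math.hypot(*v), 4))   # -> 0.9615 (= 25/26)
```

Long input vectors map to lengths near 1 and short ones to lengths near 0, so the output length can be read as a presence probability. The direction still carries pose-like spatial-relation information.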
Recently, deep neural networks, including convolutional neural networks (CNNs), have been widely applied to acoustic scene classification (ASC). Some simplified CNNs have shown improvements over deep CNNs such as the Visual Geometry Group Net (VGG-Net). Motivated by this, we show how to simplify the VGG-Net style architecture into a shallow CNN with improved performance. Max pooling and batch normalization are also applied for better accuracy. In a series of controlled tests on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 data sets, our shallow CNN achieves a 6.7% improvement over the VGG-Net style CNN. It also reduces time complexity to 5% of that network's.
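The abstract reports relative time complexity without giving layer configurations. A rough way to compare such architectures is to count multiply-accumulate operations (MACs) per convolution: H_out * W_out * C_in * C_out * k * k. The two layer stacks below are hypothetical and are not the paper's networks. The sketch only shows how a shallower, narrower stack shrinks cost, with 2x2 max pooling halving the spatial size.

```python
from typing import List, Tuple

# (out_channels, kernel_size, pool_after) per conv layer; 'same' padding, stride 1.
Layer = Tuple[int, int, bool]


def conv_macs(h: int, w: int, c_in: int, layers: List[Layer]) -> int:
    """Total multiply-accumulates of a conv stack; 2x2 max pooling halves h and w."""
    total = 0
    for c_out, k, pool in layers:
        total += h * w * c_in * c_out * k * k
        c_in = c_out
        if pool:
            h, w = h // 2, w // 2
    return total


# Hypothetical stacks on a 128 x 128 single-channel log-mel input.
vgg_style = [(64, 3, False), (64, 3, True), (128, 3, False), (128, 3, True),
             (256, 3, False), (256, 3, True)]
shallow = [(32, 5, True), (64, 3, True)]

deep = conv_macs(128, 128, 1, vgg_style)
small = conv_macs(128, 128, 1, shallow)
print(f"shallow / VGG-style = {small / deep:.1%}")
```

Pooling early is what keeps the deeper layers cheap, since every later layer runs on a quarter of the positions.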
Scene classification plays an essential role in understanding very high resolution (VHR) images. Scene classification in remote sensing faces two difficulties: mismatched features caused by model overfitting, and loss of semantic information. Multi-task methods help solve these problems by sharing weights across multiple tasks. We propose a feature boosting method with a multi-task framework that combines the scene classification task and the semantic segmentation task to overcome these difficulties. Unlike traditional multi-task learning methods, the two tasks are coupled via weakly supervised learning, so no labelled semantic segmentation samples are required. First, we propose a weakly supervised segmentation method to interconnect the segmentation and classification tasks. This method yields a coarse segmentation result that is highly correlated with the classification. Second, based on the surface distribution of remote sensing scenes, we propose a sparse surface constraint to obtain fine segmentation results. Fine features are obtained by constraining the shared weights of the weakly supervised segmentation method. Finally, we classify the scenes using the fine features and conduct experiments on public remote sensing scene classification datasets. Experimental results demonstrate that the proposed coupled multi-task model outperforms state-of-the-art methods on remote sensing scene classification.
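The abstract does not say how the coarse weakly supervised segmentation is produced. A common choice is a class activation map (CAM). A CAM weights the last convolutional feature maps by a target class's classifier weights, M_c(x, y) = sum_k w_k^c f_k(x, y), and the result is then thresholded. Below is a minimal sketch under that assumption; the feature maps, weights, and threshold are made up for illustration.

```python
from typing import List

Map = List[List[float]]


def class_activation_map(feature_maps: List[Map], class_weights: List[float]) -> Map:
    """M_c(x, y) = sum_k w_k^c * f_k(x, y) over the K feature maps."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fm[r][c] for wk, fm in zip(class_weights, feature_maps))
             for c in range(w)] for r in range(h)]


def coarse_mask(cam: Map, rel_threshold: float = 0.5) -> List[List[int]]:
    """Binarize: 1 where the CAM exceeds rel_threshold * its maximum activation."""
    peak = max(max(row) for row in cam)
    return [[1 if v > rel_threshold * peak else 0 for v in row] for row in cam]


# Two hypothetical 3x3 feature maps and one class's classifier weights.
f1 = [[0.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
f2 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
cam = class_activation_map([f1, f2], [1.0, 0.2])
print(coarse_mask(cam, 0.4))  # -> [[0, 1, 0], [0, 1, 1], [0, 0, 0]]
```

Such a mask needs only image-level class labels, which is why it can couple segmentation to classification without pixel-level annotations.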
Acoustic scene classification (ASC) is the task of recognizing and classifying environments from their acoustic signals. Various deep-learning-based ASC approaches have been developed. Convolutional neural networks (CNNs) have proven the most reliable and most commonly used in ASC systems, owing to their suitability for building lightweight models. When deploying ASC systems in the real world, model complexity and device robustness are essential considerations. In this paper, we propose a two-pass mobile network for low-complexity acoustic scene classification, named TP-MobNet. TP-MobNet is based on MobileNetV2, with inverted residuals and linear bottlenecks. Following the mobile blocks, coordinate attention and two-pass fusion are applied. Coordinate attention lets the network learn long-range dependencies and precise positional information in feature maps. Two-pass fusion captures more diverse feature resolutions at the end of the network, which also improves generalization. In addition, the model size is reduced by applying weight quantization to the trained model. All experiments used the TAU Urban Acoustic Scenes 2020 Mobile development set. The proposed model, with a size of 219.6 kB, achieves an accuracy of 73.94%.
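The abstract applies weight quantization to the trained model but does not state the scheme. Below is a minimal sketch of symmetric per-tensor 8-bit linear quantization, one common option. Each float weight is stored as an int8 plus one shared float scale, which cuts storage roughly fourfold relative to 32-bit floats. The scheme and names are assumptions, not TP-MobNet's confirmed recipe.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]


weights = [0.50, -0.25, 0.10, -1.27]
q, scale = quantize_int8(weights)
print(q)  # -> [50, -25, 10, -127]
# Storage: 4 bytes per fp32 weight versus 1 byte per int8 weight plus one shared scale.
```

Per-channel scales or quantization-aware training are common refinements when per-tensor rounding costs too much accuracy.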
Over the past decade, the significant growth of deep learning (DL) approaches based on convolutional neural networks (CNNs) has greatly improved the performance of machine learning (ML) algorithms on semantic scene classification (SSC) of remote sensing images (RSI). However, attention has focused on classification accuracy at the expense of efficiency. This imbalance has partially eroded the advantages of DL-based algorithms, such as automation and simplicity. Traditional ML strategies (e.g., handcrafted features or indicators) are widely used without any training efficiency optimization. So are accuracy-oriented strategies with a high trade-off (e.g., multi-stage CNNs and ensembles of multiple CNNs). This may result in suboptimal performance. To address this problem, we propose a fast and simple training CNN framework (named FST-EfficientNet) for RSI-SSC, based on the EfficientNet version 2 small (EfficientNetV2-S) CNN model. The whole algorithm flow is one-stage and end-to-end, with no handcrafted features or discriminators introduced. For training efficiency optimization, only a few routine data augmentation tricks are employed. These are coupled with either a fixed resolution ratio or a gradually increasing resolution strategy, so the algorithm's trade-off is very cheap. The performance evaluation shows that FST-EfficientNet sets new state-of-the-art (SOTA) records in overall accuracy (OA), about 0.8% to 2.7% ahead of all earlier methods. These records hold on both the Aerial Image Dataset (AID) and the Northwestern Polytechnical University Remote Sensing Image Scene Classification 45 Dataset (NWPU-RESISC45D). The results also demonstrate the importance and indispensability of training efficiency optimization strategies for RSI-SSC by DL. In fact, better classification accuracy need not rely entirely on an excessive trade-off that sacrifices efficiency. Ultimately, these findings are expected to contribute to the development of more efficient CNN-based approaches in RSI-SSC.
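The "gradually increasing resolution strategy" resembles the progressive learning used with EfficientNetV2, where image size (and often regularization strength) grows across training stages. Below is a minimal sketch, assuming a linear schedule over a fixed number of stages with sizes rounded to a multiple of 32. The bounds, stage count, and rounding are illustrative assumptions, not the paper's settings.

```python
from typing import List


def resolution_schedule(epochs: int, stages: int, min_size: int, max_size: int,
                        multiple: int = 32) -> List[int]:
    """Image side length per epoch, rising linearly from min_size to max_size by stage."""
    sizes = []
    per_stage = max(1, epochs // stages)          # guard against epochs < stages
    for epoch in range(epochs):
        stage = min(epoch // per_stage, stages - 1)
        frac = stage / (stages - 1) if stages > 1 else 1.0
        size = min_size + frac * (max_size - min_size)
        sizes.append(int(round(size / multiple)) * multiple)
    return sizes


sched = resolution_schedule(epochs=12, stages=4, min_size=128, max_size=320)
print(sorted(set(sched)))  # -> [128, 192, 256, 320]
```

Small images in early epochs make those epochs much cheaper. The final stages at full resolution recover accuracy, which is the efficiency-versus-accuracy trade-off the abstract describes as cheap.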
Purpose – The purpose of this paper is to build a classification system that mimics the perceptual ability of human vision. Like a human observer, it should accurately gather knowledge about the structure, content, and surrounding environment of a real-world natural scene at a quick glance. This paper proposes a set of novel features to determine the gist of a given scene based on dominant color, dominant direction, openness, and roughness. Design/methodology/approach – The classification system is designed at two levels. At the first level, a set of low-level features is extracted for each semantic feature. At the second level, the extracted features undergo feature evaluation based on inter-class and intra-class distances. The most discriminating features are retained and used to train the support vector machine (SVM) classifier for two different data sets. Findings – The accuracy of the proposed system has been evaluated on two data sets: the well-known Oliva-Torralba data set, and a customized data set of high-resolution images of natural landscapes. Experiments on these two data sets with the proposed novel feature set and SVM classifier yielded 92.68 percent average classification accuracy using ten-fold cross-validation. The proposed features efficiently represent visual information. They are therefore capable of narrowing the semantic gap between low-level image representation and high-level human perception. Originality/value – The method presented in this paper is a new approach for extracting low-level features of reduced dimensionality that can model human perception for scene classification. The methods of mapping primitive features to high-level features are intuitive to the user and are capable of reducing the semantic gap. The proposed feature evaluation technique is general and can be applied across any domain.
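The feature evaluation step keeps features with large inter-class and small intra-class distances. A Fisher-style ratio is one standard way to score that: the between-class variance of the class means divided by the average within-class variance. The sketch below ranks individual features that way. The exact criterion in the paper may differ, and the toy data are invented.

```python
from statistics import mean, pvariance
from typing import Dict, List


def fisher_score(values: List[float], labels: List[str]) -> float:
    """Between-class variance of class means over the mean within-class variance."""
    groups: Dict[str, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    class_means = [mean(g) for g in groups.values()]
    between = pvariance(class_means)
    within = mean([pvariance(g) for g in groups.values()])
    return between / within if within > 0 else float("inf")


def rank_features(samples: List[List[float]], labels: List[str]) -> List[int]:
    """Feature indices sorted from most to least discriminative."""
    n_features = len(samples[0])
    scores = [fisher_score([s[j] for s in samples], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy features: [openness, roughness, noise] for two hypothetical scene classes.
X = [[0.9, 0.2, 0.5], [0.8, 0.3, 0.1], [0.2, 0.7, 0.4], [0.1, 0.8, 0.2]]
y = ["coast", "coast", "forest", "forest"]
print(rank_features(X, y))  # -> [0, 1, 2]
```

Keeping only the top-ranked features before training the SVM is what reduces dimensionality in a scheme like this.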
Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this two-part paper identifies a proposed gold standard as the linchpin for success of a new notion of Space Economy 4.0. The standard is an innovative, but realistic, EO optical sensory image-derived, semantics-enriched Analysis Ready Data (ARD) product-pair and process. It is to be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers. It is regarded as a necessary-but-not-sufficient “horizontal” (enabling) precondition for two goals. (I) Transforming existing EO big raster-based data cubes at the midstream segment, which are typically affected by the so-called data-rich information-poor syndrome. The target is a new generation of semantics-enabled management systems for EO big raster-based numerical data and vector-based categorical (symbolic, semi-symbolic or subsymbolic) information cubes. Such systems would be eligible for semantic content-based image retrieval and semantics-enabled information/knowledge discovery. (II) Boosting the downstream segment in the development of an ever-increasing ensemble of “vertical” (deep and narrow, user-specific and domain-dependent) value-adding information products and services. These would suit a potentially huge worldwide market of institutional and private end-users of space technology. For the sake of readability, this paper consists of two parts. In the present Part 1, background notions in the remote sensing metascience domain are first critically revised for harmonization across the multidisciplinary domain of cognitive science. In short, the keyword “information” is disambiguated into two complementary notions. One is quantitative/unequivocal information-as-thing; the other is qualitative/equivocal/inherently ill-posed information-as-data-interpretation. Moreover, the buzzword “artificial intelligence” is disambiguated into the two better-constrained notions of Artificial Narrow Intelligence and AGI, the former being part-without-inheritance-of the latter. Second, a better-defined and better-understood vocabulary of multidisciplinary terms is established. Based on it, existing EO optical sensory image-derived Level 2/ARD products and processes are investigated at the Marr five levels of understanding of an information processing system. To overcome their drawbacks, the subsequent Part 2 proposes an innovative, but realistic, EO optical sensory image-derived, semantics-enriched ARD product-pair and process gold standard.
Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this paper consists of two parts. The previous Part 1 critically compared existing EO optical sensory image-derived Level 2/Analysis Ready Data (ARD) products and processes. The aim was to overcome their lack of harmonization/standardization/interoperability and their lack of suitability for a new notion of Space Economy 4.0. In the present Part 2, original contributions at the Marr five levels of system understanding comprise the following.

(1) An innovative, but realistic, requirements specification for an EO optical sensory image-derived, semantics-enriched ARD co-product pair. First, a novel ARD symbolic (categorical and semantic) co-product, known as the Scene Classification Map (SCM), pursues third-level semantic/ontological interoperability. It adopts an augmented Cloud versus Not-Cloud taxonomy. Its Not-Cloud class legend complies with the standard, fully-nested Dichotomous Phase taxonomy of the Land Cover Classification System proposed by the United Nations Food and Agriculture Organization. Second, a novel ARD subsymbolic numerical co-product: a panchromatic or multispectral EO image whose dimensionless digital numbers are radiometrically calibrated into a physical unit of radiometric measure. The calibrated values range from top-of-atmosphere reflectance to surface reflectance and surface albedo, in a five-stage radiometric correction sequence.

(2) An original ARD process requirements specification.

(3) An innovative ARD processing system design (architecture), where stepwise SCM generation and stepwise SCM-conditional EO optical image radiometric correction are alternated in sequence.

(4) An original modular, hierarchical, hybrid (combined deductive and inductive) computer vision subsystem design with feedback loops. Software solutions at the Marr two shallowest levels of system understanding, namely algorithm and implementation, are selected from the scientific literature. This lets the design benefit from their technology readiness level as proof of feasibility, which is required in addition to proven suitability.

The proposed standard is to be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers. This EO optical sensory image-derived, semantics-enriched ARD product-pair and process reference standard is highlighted as the linchpin for success of a new notion of Space Economy 4.0.
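The first stages of a radiometric correction sequence convert dimensionless digital numbers (DNs) into physical units. Below is a minimal sketch of two such stages. First, the widely used linear calibration L = gain * DN + offset gives at-sensor spectral radiance. Second, top-of-atmosphere (TOA) reflectance is rho = pi * L * d^2 / (ESUN * cos(theta_s)). Here d is the Earth-Sun distance in astronomical units, ESUN the band's mean exoatmospheric solar irradiance, and theta_s the solar zenith angle. The gain, offset, and ESUN values below are placeholders, not coefficients of any real sensor; real values come from each sensor's metadata.

```python
import math


def dn_to_radiance(dn: float, gain: float, offset: float) -> float:
    """At-sensor spectral radiance (W m^-2 sr^-1 um^-1) from a digital number."""
    return gain * dn + offset


def toa_reflectance(radiance: float, esun: float, earth_sun_au: float,
                    sun_zenith_deg: float) -> float:
    """Dimensionless TOA reflectance: pi * L * d^2 / (ESUN * cos(theta_s))."""
    cos_theta = math.cos(math.radians(sun_zenith_deg))
    return math.pi * radiance * earth_sun_au ** 2 / (esun * cos_theta)


# Placeholder calibration values (not from any real sensor's metadata).
radiance = dn_to_radiance(dn=120, gain=0.5, offset=1.0)   # -> 61.0
rho = toa_reflectance(radiance, esun=1500.0, earth_sun_au=1.0, sun_zenith_deg=30.0)
print(round(rho, 4))
```

Later stages toward surface reflectance and albedo require atmospheric and topographic corrections that go beyond this sketch.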
Funding (scalogram-based acoustic scene classification study): supported by the German National BMBF IKT2020-Grant (16SV7213) (EmotAsS), the European Union's Horizon 2020 Research and Innovation Programme (688835) (DE-ENIGMA), and the China Scholarship Council (CSC).
Funding (MMF-EDC study): The authors would like to thank Taif University for funding this work through Taif University Research Supporting Project Number (TURSP-2020/277), Taif University, Taif, Saudi Arabia.
Funding (adaptive binary coding study): supported by the National Key R&D Program of China (2018YFB1003205); the National Natural Science Foundation of China (U1836208, U1536206, U1836110, 61972207); the Engineering Research Center of Digital Forensics, Ministry of Education; the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund; and the Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET) fund, China.
Funding (prototype-based semi-supervised study): supported in part by the National Natural Science Foundation of China (No. 12302252).
Funding (Grad-CAM-CapsNet study): funded by the Open Fund of the Key Laboratory of Jianghuai Arable Land Resources Protection and Eco-restoration (Ministry of Natural Resources) (No. 2022-ARPE-KF04), and the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation (Ministry of Natural Resources) (No. KF-2020-05-084).
Funding: Supported by the National Natural Science Foundation of China (61102127, 61231015), the National High Technology Research and Development Program of China (863 Program, 2015AA016306), the National Key Research and Development Program (2016YFB0502204), the Innovation Fund of Shanghai Aerospace Science and Technology (SAST, 2015014), the Key Technology R&D Program of Hubei Province (2014BAA153), and SKLSE-2015-A-06.
Abstract: Recently, deep neural networks, including convolutional neural networks (CNNs), have been widely applied to acoustic scene classification (ASC). Motivated by the fact that some simplified CNNs have shown improvements over deep CNNs such as the Visual Geometry Group Net (VGG-Net), we show how to simplify a VGG-Net-style architecture into a shallow CNN with improved performance. Max pooling and batch normalization are also applied to improve accuracy. In a series of controlled tests on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 datasets, our shallow CNN achieves a 6.7% improvement and reduces time complexity to 5% of that of the VGG-Net-style CNN.
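The two components named above can be sketched in their generic textbook form. The sketch assumes that one channel's activations, pooled over the batch and spatial positions, are given as a flat list and that feature maps are nested lists. The parameter defaults are hypothetical and do not reflect the shallow CNN's actual configuration.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations to zero mean and unit variance, then
    apply the learnable scale (gamma) and shift (beta)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, cols - 1, 2)
        ]
        for i in range(0, rows - 1, 2)
    ]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
pooled = max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
```

Each pooling step halves both spatial dimensions, which is one reason a shallow network with pooling can cut computation sharply.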
Funding: Supported in part by the National Natural Science Foundation of Key International Cooperation (Grant No. 61720106002), the National Natural Science Foundation for Outstanding Scholars (Grant No. 62025107), and the National Natural Science Foundation of China (Grant No. 61901141).
Abstract: Scene classification plays an essential role in understanding very high resolution (VHR) images. Scene classification in remote sensing faces two difficulties: mismatched features caused by model overfitting, and the loss of semantic information. Multi-task methods help solve these problems by sharing weights across multiple tasks. We propose a feature boosting method with a multi-task framework that combines the scene classification task and the semantic segmentation task to overcome these difficulties. Unlike traditional multi-task learning methods, the two tasks are coupled through weakly supervised learning, so labeled semantic segmentation samples are not required. First, we propose a weakly supervised segmentation method that interconnects the segmentation and classification tasks, and with it we obtain a coarse segmentation result that is highly correlated with the classification. Second, based on the surface distribution of remote sensing scenes, we propose a sparse surface constraint to obtain fine segmentation results. Fine features are obtained by constraining the shared weights of the weakly supervised segmentation method. Finally, we classify the scenes using the fine features and conduct experiments on public remote sensing scene classification datasets. Experimental results demonstrate that the proposed coupled multi-task model outperforms state-of-the-art methods on remote sensing scene classification.
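One way to picture coupling the two tasks through shared weights is a joint objective. It combines a cross-entropy loss on the scene label with a weighted penalty on the segmentation branch's output. The abstract does not give the form of the sparse surface constraint. The L1 penalty, the weight `lam`, and all function names below are therefore hypothetical stand-ins, not the authors' formulation.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed with log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def l1_sparsity(seg_map):
    """Mean absolute response of a segmentation map (hypothetical sparsity term)."""
    values = [abs(v) for row in seg_map for v in row]
    return sum(values) / len(values)


def joint_loss(class_logits, target, seg_map, lam=0.1):
    """Scene-classification loss plus a weighted penalty on the weakly supervised
    segmentation output; both outputs are assumed to come from one shared backbone."""
    return cross_entropy(class_logits, target) + lam * l1_sparsity(seg_map)


total = joint_loss([0.0, 0.0], 0, [[1.0, 1.0], [1.0, 1.0]])
```

Because both terms backpropagate into the same backbone, the penalty on the segmentation branch also shapes the features used for classification. This is the sense in which a constraint on one task can refine the shared weights.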
Funding: This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No.2021-0-0268, Artificial Intelligence Innovation Hub (Artificial Intelligence Institute, Seoul National University)].
Abstract: Acoustic scene classification (ASC) is a method of recognizing and classifying environments using acoustic signals. Various deep learning-based ASC approaches have been developed. Among them, convolutional neural networks (CNNs) have proven to be the most reliable and the most commonly used in ASC systems, because they are well suited to building lightweight models. When ASC systems are used in the real world, model complexity and device robustness are essential considerations. In this paper, we propose a two-pass mobile network for low-complexity acoustic scene classification, named TP-MobNet. TP-MobNet is based on MobileNetV2 with inverted residuals and linear bottlenecks, and coordinate attention and two-pass fusion approaches are applied after the mobile blocks. Coordinate attention allows the model to learn long-range dependencies and precise position information in feature maps. By capturing more diverse feature resolutions at the end of the network, two-pass fusion can also improve generalization. In addition, the model size is reduced by applying weight quantization to the trained model. All experiments used the TAU Urban Acoustic Scenes 2020 Mobile development set. The proposed model, with a model size of 219.6 kB, achieves an accuracy of 73.94%.
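Weight quantization can be sketched as symmetric per-tensor post-training quantization to 8-bit integers. The abstract does not state which scheme TP-MobNet uses. The bit width, the symmetric range, and the function names below are therefore assumptions for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


def storage_kb(num_params, bits_per_weight):
    """Raw weight storage in kilobytes (1 kB = 1000 bytes), ignoring metadata."""
    return num_params * bits_per_weight / 8 / 1000


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Storing each weight in 8 bits instead of as a 32-bit float shrinks raw weight storage by a factor of four. The cost is a rounding error of at most half a quantization step per weight.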
Funding: This research was supported by Doctoral Research funding from Hunan University of Arts and Science, Grant Number E07016033.
Abstract: Over the past decade, the rapid growth of deep learning (DL) approaches based on convolutional neural networks (CNNs) has greatly improved the performance of machine learning (ML) algorithms on the semantic scene classification (SSC) of remote sensing images (RSI). However, attention has been unbalanced between classification accuracy and efficiency, and this has partially eroded the advantages of DL-based algorithms, e.g., automation and simplicity. Traditional ML strategies (e.g., handcrafted features or indicators) and accuracy-oriented strategies with a high trade-off (e.g., multi-stage CNNs and ensembles of multiple CNNs) are widely used without any training efficiency optimization, which may result in suboptimal performance. To address this problem, we propose a fast and simple training CNN framework (named FST-EfficientNet) for RSI-SSC based on the EfficientNet version 2 small (EfficientNetV2-S) CNN model. The whole algorithm flow is one-stage and end-to-end, without any handcrafted features or discriminators. To optimize training efficiency, we employ only a few routine data augmentation tricks, coupled with either a fixed resolution ratio or a gradually increasing resolution strategy, so the algorithm's trade-off is very cheap. The performance evaluation shows that FST-EfficientNet sets new state-of-the-art (SOTA) records in overall accuracy (OA). It is about 0.8% to 2.7% ahead of all earlier methods on the Aerial Image Dataset (AID) and the Northwestern Polytechnical University Remote Sensing Image Scene Classification 45 Dataset (NWPU-RESISC45D). The results also demonstrate the importance and indispensability of training efficiency optimization strategies for DL-based RSI-SSC. In fact, better classification accuracy need not come from relying entirely on an excessive trade-off that sacrifices efficiency. Ultimately, these findings are expected to contribute to the development of more efficient CNN-based approaches in RSI-SSC.
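A gradually increasing resolution strategy can be expressed as a simple schedule that maps each training epoch to an input image size. The linear growth, the start and end sizes, and the rounding to multiples of 32 pixels below are hypothetical choices, not the settings reported for FST-EfficientNet.

```python
def progressive_resolution(epoch, total_epochs, start_size=128, end_size=384, step=32):
    """Input image side length for `epoch` (0-based), growing linearly from
    start_size to end_size and rounded down to a multiple of `step`."""
    if total_epochs <= 1:
        return end_size
    fraction = epoch / (total_epochs - 1)
    size = start_size + fraction * (end_size - start_size)
    return int(size // step) * step


schedule = [progressive_resolution(e, 5) for e in range(5)]
```

Early epochs on small images are cheap, so most of the compute is spent only in the final, full-resolution epochs. This is the efficiency argument behind such schedules.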
Abstract: Purpose – The purpose of this paper is to build a classification system that mimics the perceptual ability of human vision to gather knowledge, accurately and at a quick glance, about the structure, content and surrounding environment of a real-world natural scene. This paper proposes a set of novel features to determine the gist of a given scene based on dominant color, dominant direction, openness and roughness features. Design/methodology/approach – The classification system is designed at two levels. At the first level, a set of low-level features is extracted for each semantic feature. At the second level, the extracted features undergo feature evaluation based on inter-class and intra-class distances. The most discriminating features are retained and used to train a support vector machine (SVM) classifier for two different data sets. Findings – The accuracy of the proposed system has been evaluated on two data sets: the well-known Oliva-Torralba data set and a customized image data set comprising high-resolution images of natural landscapes. Experiments on these two data sets with the proposed novel feature set and SVM classifier yield 92.68 percent average classification accuracy using ten-fold cross-validation. The proposed feature set efficiently represents visual information and is therefore capable of narrowing the semantic gap between low-level image representation and high-level human perception. Originality/value – The method presented in this paper represents a new approach for extracting low-level features of reduced dimensionality that can model human perception for the task of scene classification. The methods of mapping primitive features to high-level features are intuitive to the user and are capable of reducing the semantic gap. The proposed feature evaluation technique is general and can be applied across any domain.
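The feature evaluation step can be illustrated with a Fisher-style score. For each feature, it divides the between-class scatter of the class means by the within-class scatter, and the top-k features are then kept. The abstract does not define its inter-class and intra-class distances precisely. This particular ratio and the function names are therefore assumptions.

```python
def fisher_score(values, labels):
    """Ratio of between-class to within-class scatter for one feature."""
    overall = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = 0.0
    within = 0.0
    for group in groups.values():
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in group)
    return between / within if within > 0 else float("inf")


def select_features(samples, labels, k):
    """Indices of the k highest-scoring features; samples is indexed [sample][feature]."""
    num_features = len(samples[0])
    scores = [fisher_score([s[j] for s in samples], labels) for j in range(num_features)]
    ranked = sorted(range(num_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Feature 0 separates the classes; feature 1 is uninformative.
samples = [[0.0, 1.0], [0.1, 0.0], [1.0, 1.0], [0.9, 0.0]]
labels = ["a", "a", "b", "b"]
kept = select_features(samples, labels, 1)
```

The retained indices would then select the columns used to train the SVM.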
Abstract: Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this two-part paper identifies an innovative but realistic EO optical sensory image-derived semantics-enriched Analysis Ready Data (ARD) product-pair and process gold standard as the linchpin for the success of a new notion of Space Economy 4.0. This standard is to be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers. It is regarded as a necessary-but-not-sufficient “horizontal” (enabling) precondition for two goals. (I) Transforming existing EO big raster-based data cubes at the midstream segment, typically affected by the so-called data-rich information-poor syndrome, into a new generation of semantics-enabled EO big raster-based numerical data and vector-based categorical (symbolic, semi-symbolic or subsymbolic) information cube management systems, eligible for semantic content-based image retrieval and semantics-enabled information/knowledge discovery. (II) Boosting the downstream segment in the development of an ever-increasing ensemble of “vertical” (deep and narrow, user-specific and domain-dependent) value-adding information products and services, suitable for a potentially huge worldwide market of institutional and private end-users of space technology. For the sake of readability, this paper consists of two parts. In the present Part 1, background notions in the remote sensing metascience domain are first critically revised for harmonization across the multidisciplinary domain of cognitive science. In short, the keyword “information” is disambiguated into the two complementary notions of quantitative/unequivocal information-as-thing and qualitative/equivocal/inherently ill-posed information-as-data-interpretation. Moreover, the buzzword “artificial intelligence” is disambiguated into the two better-constrained notions of Artificial Narrow Intelligence as part-without-inheritance-of AGI. Second, based on a better-defined and better-understood vocabulary of multidisciplinary terms, existing EO optical sensory image-derived Level 2/ARD products and processes are investigated at the Marr five levels of understanding of an information processing system. To overcome their drawbacks, an innovative but realistic EO optical sensory image-derived semantics-enriched ARD product-pair and process gold standard is proposed in the subsequent Part 2.
Funding: ASAP 16 project call, project title: SemantiX - A cross-sensor semantic EO data cube to open and leverage essential climate variables with scientists and the public, Grant ID: 878939; ASAP 17 project call, project title: SIMS - Soil sealing identification and monitoring system, Grant ID: 885365.
Abstract: Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this paper consists of two parts. In the previous Part 1, existing EO optical sensory image-derived Level 2/Analysis Ready Data (ARD) products and processes are critically compared, to overcome their lack of harmonization/standardization/interoperability and suitability in a new notion of Space Economy 4.0. In the present Part 2, the original contributions, organized by the Marr five levels of system understanding, are as follows. (1) An innovative but realistic EO optical sensory image-derived semantics-enriched ARD co-product pair requirements specification. First, in pursuit of third-level semantic/ontological interoperability, a novel ARD symbolic (categorical and semantic) co-product, known as the Scene Classification Map (SCM), adopts an augmented Cloud versus Not-Cloud taxonomy. Its Not-Cloud class legend complies with the standard fully-nested Dichotomous Phase taxonomy of the Land Cover Classification System proposed by the United Nations Food and Agriculture Organization. Second, a novel ARD subsymbolic numerical co-product is specified: a panchromatic or multispectral EO image whose dimensionless digital numbers are radiometrically calibrated into a physical unit of radiometric measure. These values range from top-of-atmosphere reflectance to surface reflectance and surface albedo and are obtained in a five-stage radiometric correction sequence. (2) An original ARD process requirements specification. (3) An innovative ARD processing system design (architecture), where stepwise SCM generation and stepwise SCM-conditional EO optical image radiometric correction are alternated in sequence. (4) An original modular hierarchical hybrid (combined deductive and inductive) computer vision subsystem design, provided with feedback loops. In this design, software solutions at the Marr two shallowest levels of system understanding, specifically algorithm and implementation, are selected from the scientific literature to benefit from their technology readiness level as proof of feasibility, required in addition to proven suitability. The proposed EO optical sensory image-derived semantics-enriched ARD product-pair and process reference standard is to be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers. It is highlighted as the linchpin for the success of a new notion of Space Economy 4.0.
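The first stages of the radiometric calibration described above can be sketched with two widely used conversions. Digital numbers become at-sensor radiance through a linear gain and offset. Radiance then becomes top-of-atmosphere reflectance through ρ = π·L·d² / (ESUN·cos θs), where d is the Earth-Sun distance in astronomical units and θs is the solar zenith angle. The gain, offset, and ESUN values are sensor-specific inputs assumed to come from image metadata, and the numbers used below are hypothetical. The later stages toward surface reflectance and albedo require atmospheric and topographic correction and are not shown.

```python
import math


def dn_to_radiance(dn, gain, offset):
    """Linear sensor calibration: digital number -> at-sensor spectral radiance."""
    return gain * dn + offset


def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_distance_au):
    """Dimensionless top-of-atmosphere reflectance from at-sensor radiance.

    esun is the band's mean exoatmospheric solar irradiance, in units consistent
    with the radiance; the solar zenith angle is 90 degrees minus the sun elevation.
    """
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    numerator = math.pi * radiance * earth_sun_distance_au ** 2
    return numerator / (esun * math.cos(solar_zenith))


# Hypothetical band values, for illustration only.
radiance = dn_to_radiance(100, gain=0.5, offset=1.0)
reflectance = toa_reflectance(50.0, esun=100 * math.pi, sun_elevation_deg=90.0,
                              earth_sun_distance_au=1.0)
```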