Funding: This work is supported by the National Natural Science Foundation of China (6187223).
Abstract: Image caption generation is an essential task in computer vision and image understanding. Contemporary image caption generation models usually use an encoder-decoder model as the underlying network structure. However, traditional encoder-decoder architectures extract only the global features of an image and make poor use of its local information. This paper proposes an encoder-decoder model based on fused features, together with a novel mechanism for correcting the generated caption text. In the encoder, VGG16 and Faster R-CNN first extract global and local features. The fused features are then used to train a bidirectional LSTM network in the decoder. Finally, the extracted local features are used to correct the caption text. The experimental results demonstrate the effectiveness of the proposed method.
Funding: Supported in part by the National Key Research and Development Program of China (No. 2018YFB1003700) and in part by the National Natural Science Foundation of China (No. 61836001).
Abstract: The Industrial Internet of Things (IoT), which connects society and industrial systems, represents a tremendous and promising paradigm shift. With IoT, multimodal and heterogeneous data from industrial devices can be easily collected and then analyzed to uncover knowledge about device maintenance and health. IoT data-based fault diagnosis for industrial devices contributes to the sustainability and applicability of an IoT ecosystem. However, efficiently using and fusing this multimodal heterogeneous data to realize intelligent fault diagnosis remains a challenge. This paper proposes a novel fault diagnosis method based on Deep Multimodal Learning and Fusion (DMLF) for heterogeneous data from IoT environments where industrial devices coexist. First, a DMLF model is designed by combining a Convolutional Neural Network (CNN) and a Stacked Denoising Autoencoder (SDAE) to capture more comprehensive fault knowledge and extract features from data of different modalities. Second, these multimodal features are integrated at a fusion layer, and the resulting fused features are used to train a classifier that recognizes potential faults. Third, a two-stage training algorithm that combines supervised pre-training and fine-tuning is proposed to simplify the training of deep models. A series of experiments on multimodal heterogeneous data from a gear device verifies the proposed fault diagnosis method. The experimental results show that the method outperforms the benchmark methods in fault diagnosis accuracy.
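The fusion layer mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the features are combined, so the sketch assumes the simplest option: each modality's feature vector is L2-normalized and the two are concatenated. The function names `l2_normalize` and `fuse_features` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(cnn_feat: Sequence[float], sdae_feat: Sequence[float]) -> List[float]:
    """Assumed fusion rule: normalize each modality, then concatenate.

    Normalizing first keeps one modality from dominating the fused vector
    only because its raw values are larger.
    """
    return l2_normalize(cnn_feat) + l2_normalize(sdae_feat)


# Toy feature vectors standing in for CNN and SDAE outputs.
fused = fuse_features([3.0, 4.0], [0.0, 2.0, 0.0])
print(len(fused))
```

The fused vector would then be passed to a classifier. Other fusion rules, such as weighted sums or learned gating, are equally plausible readings of the abstract.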
Funding: Fundamental Research Funds for the Central Universities, China (No. 2232018D3-17).
Abstract: Text accounts for most of the resources on the Internet, which places ever higher demands on the accuracy of text classification. In this manuscript, we first design a hybrid model, bidirectional encoder representation from transformers-hierarchical attention networks-dilated convolutions networks (BERT_HAN_DCN), which builds on the BERT pre-trained model and its strong feature extraction ability. The model draws on the advantages of both the HAN and DCN models to gain rich semantic information by fusing contextual semantic features with hierarchical characteristics. Second, the traditional softmax algorithm increases the learning difficulty for samples of the same class, making it harder to distinguish similar features. For this reason, AM-softmax is introduced to replace the traditional softmax. Finally, the fused model is validated on two datasets, where it achieves higher accuracy and F1-score than general single models, such as HAN and DCN, built on the BERT pre-trained model. In addition, the improved AM-softmax network model outperforms the general softmax network model.
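For readers unfamiliar with AM-softmax, the following is a minimal single-sample sketch of the additive-margin softmax loss as it is commonly defined: a margin is subtracted from the target-class cosine similarity before scaling and applying softmax cross-entropy. The function name `am_softmax_loss` and the default `scale` and `margin` values are illustrative assumptions, not settings reported in the abstract.

```python
import math
from typing import Sequence


def am_softmax_loss(cosines: Sequence[float], target: int,
                    scale: float = 30.0, margin: float = 0.35) -> float:
    """Additive-margin softmax loss for one sample.

    cosines[j] is the cosine similarity between the normalized feature
    and the normalized weight vector of class j. The defaults for scale
    and margin are illustrative, not taken from the abstract.
    """
    # Subtract the margin from the target class only, then scale all logits.
    logits = [scale * (c - margin) if j == target else scale * c
              for j, c in enumerate(cosines)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_z - logits[target]


cosines = [0.8, 0.1, -0.2]
with_margin = am_softmax_loss(cosines, target=0)
without_margin = am_softmax_loss(cosines, target=0, margin=0.0)
print(with_margin > without_margin)
```

With `margin=0.0` the function reduces to ordinary scaled softmax cross-entropy. A positive margin raises the loss for the same input, which pushes the model to separate classes by a wider angular gap.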
Abstract: Humankind is facing one of the deadliest pandemics in history, caused by COVID-19. Apart from this pandemic, the World Health Organization (WHO) considers tuberculosis (TB) a preeminent infectious disease because of its high infection rate. Both TB and COVID-19 severely affect the lungs, which complicates the work of medical practitioners, who can easily confuse the two diseases in the current situation. There is therefore an urgent need for an automatic diagnostic tool that can accurately discriminate between them. As one of the first smart health systems to examine three clinical states (COVID-19, TB, and normal cases), this study proposes a combination of image filtering, data augmentation, a transfer learning-based approach, and advanced deep-learning classifiers to separate these diseases effectively. It first applies a generative adversarial network (GAN) and a Crimmins speckle removal filter to X-ray images to overcome the problems of limited data and noise. Each pre-processed image is then converted into the red, green, and blue (RGB) and Commission Internationale de l'Éclairage (CIE) color spaces, from which deep fused features are formed by extracting relevant features with DenseNet121 and ResNet50. Each feature extractor extracts the 1000 most useful features, which are then fused and fed to two variants of recurrent neural network (RNN) classifiers to discriminate the three clinical states. Comparative analysis showed that the proposed bidirectional long short-term memory (Bi-LSTM) model outperformed the long short-term memory (LSTM) network, attaining an overall accuracy of 98.22% on the three-class classification task, whereas the LSTM reached only 94.22% accuracy on the test dataset.
Funding: Supported by the National Natural Science Foundation of China (61973164, 62373192).
Abstract: Cloud detection in remote sensing images is a crucial task in various applications, such as meteorological disaster prediction and earth resource exploration, which require accurate cloud identification. This work proposes a cloud detection model based on the Cloud Detection neural Network (CDNet) that incorporates a fusion mechanism of channel and spatial attention. Depthwise separable convolution is adopted to obtain a lightweight network model and to improve the efficiency of network training and detection. In addition, the Convolutional Block Attention Module (CBAM) is integrated into the network to train the cloud detection model with attention features in the channel and spatial dimensions. Experiments were conducted on Landsat 8 imagery to validate the improved CDNet. Averaged over all test images, the overall accuracy (OA), mean pixel accuracy (mPA), Kappa coefficient, and mean intersection over union (MIoU) of the improved CDNet were 96.38%, 81.18%, 96.05%, and 84.69%, respectively. These results were better than those of the original CDNet and DeepLabV3+. The experimental results show that the improved CDNet is effective and robust for cloud detection in remote sensing images.
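The four evaluation metrics named in this abstract can all be derived from a pixel-level confusion matrix. The sketch below uses the standard textbook definitions, and the paper's exact formulas may differ. The function name `segmentation_metrics` and the toy matrix are hypothetical and are not data from the paper.

```python
from typing import Dict, List


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute OA, mPA, Kappa and MIoU from a square confusion matrix.

    cm[i][j] counts pixels whose true class is i and predicted class is j.
    Assumes every class occurs at least once in the ground truth or the
    prediction, so no denominator is zero.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    row_sums = [sum(cm[i]) for i in range(n)]                        # true pixels per class
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted pixels per class

    oa = sum(diag) / total
    mpa = sum(diag[i] / row_sums[i] for i in range(n)) / n
    # Chance agreement used by Cohen's kappa.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    miou = sum(diag[i] / (row_sums[i] + col_sums[i] - diag[i]) for i in range(n)) / n
    return {"OA": oa, "mPA": mpa, "Kappa": kappa, "MIoU": miou}


# Toy two-class example (clear vs. cloud), made up for illustration.
metrics = segmentation_metrics([[50, 10], [5, 35]])
print(metrics)
```

In practice the confusion matrix would be accumulated over all test images, or the metrics computed per image and averaged, which is what the abstract's "averaged over all test images" suggests.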