With the development of deep convolutional neural networks (DCNNs), the features extracted for image recognition tasks have shifted from low-level features to the high-level semantic features of DCNNs. Previous studies have shown that the deeper the network is, the more abstract its features are. However, the recognition ability of deep features is limited when training samples are insufficient. To address this problem, this paper proposes an improved Deep Fusion Convolutional Neural Network (DF-Net), which makes full use of the differences and complementarities that arise during network learning and enhances feature expression on limited datasets. Specifically, DF-Net organizes two identical subnets to extract features from the input image in parallel, and a fusion module introduced in the deep layers of DF-Net fuses the subnets' features at multiple scales. More complex mappings are thus created, and richer and more accurate fused features can be extracted to improve recognition accuracy. A corresponding training strategy is also proposed to speed up convergence and reduce the computational overhead of network training. Finally, DF-Nets based on ResNet, DenseNet, and MobileNetV2 are evaluated on CIFAR100, Stanford Dogs, and UECFOOD-100. Theoretical analysis and experimental results demonstrate that DF-Net enhances the performance of DCNNs and increases image recognition accuracy.
Natural events have a significant impact on overall flight activity, and the aviation industry plays a vital role in helping society cope with them. As the typhoon season, one of the most disruptive weather periods, arrives and persists, airlines operating in threatened areas and passengers with travel plans during this period pay close attention to the development of tropical storms. This paper proposes a deep multimodal fusion and multitask trajectory prediction model that improves the reliability of typhoon trajectory prediction and reduces the number of flight cancellations. The deep multimodal fusion module is formed by deeply fusing the features output by multiple sub-modal fusion modules, and the multitask generation module treats longitude and latitude as two related tasks that are predicted simultaneously. When multiple modalities coexist, features can be extracted from them simultaneously so that they supplement each other's information. With more dependable predictions, problems can be analysed more rapidly and efficiently, enabling proactive rather than reactive decision-making. A case study of Typhoon Lekima, which swept China in 2019, shows that the algorithm can reduce the number of unnecessary flight cancellations compared with existing flight scheduling and can assist a new generation of flight scheduling systems under extreme weather.
In some important object recognition applications, such as intelligent robots and unmanned driving, images are collected consecutively and are associated with one another; in addition, the scenes have stable prior features. Existing technologies do not take full advantage of this information. To take object recognition in these applications further than existing algorithms, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first employs YOLOv3 as the base algorithm to recognize objects in single-frame images, then uses the DeepSort algorithm to associate potential objects recognized in images from different moments, and finally applies the confidence fusion method and temporal boundary processing method designed in the paper to fuse temporal sequence information with scene prior information at the decision level. Experiments on public datasets and self-built industrial scene datasets show that, because more information sources are used, the quality of single-frame images has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for information fusion across multiple classes: any object recognition algorithm that simultaneously outputs object class, location information, and recognition confidence can be integrated into this framework to improve performance.
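The decision-level fusion described above can be illustrated with a small sketch. The abstract does not specify the paper's confidence fusion rule, so the averaging rule below is an assumption made only for illustration, and the function name `fuse_track_confidences` is hypothetical. It takes the per-frame (class, confidence) outputs that a detector such as YOLOv3 would produce for one tracked object and combines them into a single decision.

```python
from collections import defaultdict


def fuse_track_confidences(detections):
    """Fuse per-frame (class, confidence) pairs for one tracked object.

    Assumed rule (not the paper's): sum the confidence of each class over
    the track, divide by the number of frames, and pick the class with the
    highest fused score.
    """
    if not detections:
        raise ValueError("track has no detections")
    totals = defaultdict(float)
    for label, confidence in detections:
        totals[label] += confidence
    n_frames = len(detections)
    fused = {label: total / n_frames for label, total in totals.items()}
    best = max(fused, key=fused.get)
    return best, fused[best]


# One object followed over four frames; a single poor frame votes "dog".
track = [("person", 0.9), ("person", 0.8), ("dog", 0.6), ("person", 0.7)]
label, score = fuse_track_confidences(track)
```

Because the decision is taken over the whole track, one low-quality frame does not flip the class, which is the effect the abstract attributes to using more information sources.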
Personality distinguishes individuals' patterns of feeling, thinking, and behaving. Predicting personality from short video clips is an exciting research area in computer vision. Most existing research reports only preliminary results in extracting knowledge from the visual and audio (sound) modalities. To overcome this deficiency, we propose the Deep Bimodal Fusion (DBF) approach to predict five personality traits: agreeableness, extraversion, openness, conscientiousness, and neuroticism. In the proposed framework, a modified convolutional neural network (CNN), specifically the Descriptor Aggregator Model (DAN), is used to obtain meaningful representations of the visual modality. For the audio modality, the model extracts audio representations that are fed to a long short-term memory (LSTM) network. Employing modality-specific neural networks allows the framework to determine the traits independently for each modality before combining them with weighted fusion to reach a final prediction. The proposed approach attains a mean accuracy score of 0.9183, averaged over the five personality traits, which is better than previously proposed frameworks.
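The weighted fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight `w_visual=0.6` and the example scores are hypothetical, since the abstract does not report the fusion weights.

```python
TRAITS = (
    "agreeableness",
    "extraversion",
    "openness",
    "conscientiousness",
    "neuroticism",
)


def weighted_fusion(visual, audio, w_visual=0.6):
    """Combine per-trait scores from two modality-specific models.

    visual, audio: dicts mapping each trait name to a score in [0, 1].
    w_visual: hypothetical weight of the visual model; audio gets 1 - w_visual.
    """
    if not 0.0 <= w_visual <= 1.0:
        raise ValueError("w_visual must lie in [0, 1]")
    w_audio = 1.0 - w_visual
    return {t: w_visual * visual[t] + w_audio * audio[t] for t in TRAITS}


# Made-up scores: each modality predicts all five traits independently.
visual_scores = {t: 0.5 for t in TRAITS}
audio_scores = {t: 1.0 for t in TRAITS}
fused_scores = weighted_fusion(visual_scores, audio_scores)
```

Fusing at the prediction level keeps the two networks independent, so either one can be retrained or replaced without touching the other.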
To address the difficulties in fusing multi-mode sensor data for complex industrial machinery, an adaptive deep coupling convolutional auto-encoder (ADCCAE) fusion method is proposed. First, the multi-mode features extracted synchronously by the CCAE are stacked and fed to multi-channel convolution layers for fusion. Then, the fused data are passed to fully connected layers for compression and fed to the Softmax module for classification. Finally, the coupling loss function coefficients and the network parameters are optimized adaptively using the gray wolf optimization (GWO) algorithm. Experimental comparisons show that the proposed ADCCAE fusion model is superior to existing models for multi-mode data fusion.
Based on a CNN-LSTM fusion deep neural network, this paper proposes a seismic velocity model building method that simultaneously estimates the root mean square (RMS) velocity and the interval velocity from the common-midpoint (CMP) gather. In the proposed method, a convolutional neural network (CNN) encoder and two long short-term memory networks (LSTMs) extract spatial and temporal features from seismic signals, respectively, and a CNN decoder recovers the RMS velocity and interval velocity of the underground media from the resulting feature vectors. To address unstable gradients and the tendency to fall into local minima during deep neural network training, we propose to use Kaiming normal initialization with zero negative slope for the rectified units and to adjust the learning process by optimizing the mean square error (MSE) loss function with a freezing factor. Experiments on the test dataset show that the CNN-LSTM fusion deep neural network predicts both RMS velocity and interval velocity more accurately, and that its inversion accuracy is superior to that of single neural network models. The predictions on complex structures and the Marmousi model are consistent with the true velocity variation trends, and the predictions on field data can effectively correct the seismic events (phase axes) and improve their lateral continuity and the quality of the stack section, indicating the effectiveness and good generalization capability of the proposed method.
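The initialization named above can be made concrete with a short sketch. Kaiming normal initialization draws weights from a zero-mean normal distribution with standard deviation sqrt(2 / ((1 + a^2) * fan_in)), where a is the negative slope of the rectifier; with a = 0 this reduces to sqrt(2 / fan_in). The pure-Python helpers below are illustrative stand-ins, not the paper's code, and they assume the fan-in mode; the function names are hypothetical.

```python
import math
import random


def kaiming_normal_std(fan_in, negative_slope=0.0):
    """Standard deviation used by Kaiming normal initialization (fan-in mode)."""
    gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    return gain / math.sqrt(fan_in)


def kaiming_normal_weights(fan_out, fan_in, negative_slope=0.0, seed=0):
    """Sample a fan_out x fan_in weight matrix from N(0, std^2)."""
    rng = random.Random(seed)
    std = kaiming_normal_std(fan_in, negative_slope)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


# With zero negative slope and fan_in = 8, std = sqrt(2 / 8) = 0.5.
std = kaiming_normal_std(8)
weights = kaiming_normal_weights(fan_out=4, fan_in=8)
```

Scaling the variance by the fan-in keeps activation magnitudes roughly constant from layer to layer in a ReLU network, which is why it helps with the unstable gradients mentioned in the abstract.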
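To address the low accuracy and recall of single-modality fish behavior recognition under complex conditions such as dim lighting and acoustic and visual noise, a fish behavior recognition model based on multi-level fusion of acoustic and visual features, U-FusionNet-ResNet50+SENet, is proposed. The method uses a ResNet50 model to extract visual features and an MFCC+ResNet50 model to extract acoustic features. On this basis, a U-shaped fusion architecture is designed so that visual and acoustic features of different dimensions interact fully and are fused at every stage of feature extraction. Finally, SENet is introduced to form a feature fusion network that attends to channel information. The effectiveness of the algorithm is verified through comparative experiments on synthetic, noise-added multimodal fish behavior data. The results show that U-FusionNet-ResNet50+SENet achieves a recognition accuracy of 93.71%, an F1 score of 93.43%, and a recall of 92.56%. Compared with a well-performing existing model, the Intermediate-feature-level deep model, recall, F1 score, and accuracy improve by 2.35%, 3.45%, and 3.48%, respectively. The study shows that the proposed U-FusionNet-ResNet50+SENet method effectively addresses the low accuracy of single-modality fish behavior recognition, improves overall recognition performance, and can identify behaviors such as swimming and feeding under complex conditions, providing new ideas and methods for fish behavior recognition under real production conditions.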
1 Introduction. The Paleogene strata (at depths of more than 2500 m) in the Bohai Sea are complex (Xu Changgui, 2006), the reservoirs are deeply buried, reservoir prediction is difficult (Lai Weicheng, Xu Changgui, 2012), and more
Because it combines visible and infrared object information, multispectral data is a promising data source for automatic maritime ship recognition. In this paper, to take advantage of both deep convolutional neural networks and multispectral data, we model the multispectral ship recognition task as a convolutional feature fusion problem and propose a feature fusion architecture called Hybrid Fusion. We fine-tune the VGG-16 model pre-trained on ImageNet using three-channel single-spectrum images and four-channel multispectral images, and use existing regularization techniques to avoid over-fitting. Hybrid Fusion is investigated together with three other feature fusion architectures. Each fusion architecture consists of a visible-image branch and an infrared-image branch for feature extraction, in which the pre-trained and fine-tuned VGG-16 models serve as feature extractors. In each architecture, the image features of the two branches are first extracted from the same layer or from different layers of the VGG-16 model. The features from the two branches are then flattened and concatenated to produce a multispectral feature vector, which is finally fed into a classifier to perform ship recognition. Based on these fusion architectures, we also evaluate the recognition performance of a feature vector normalization method and three combinations of feature extractors. Experimental results on the visible and infrared ship (VAIS) dataset show that the best Hybrid Fusion achieves 89.6% mean per-class recognition accuracy on daytime paired images and 64.9% on nighttime infrared images, outperforming the state-of-the-art method by 1.4% and 3.9%, respectively.
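The flatten-and-concatenate fusion and the mean per-class accuracy metric described above can be sketched in a few lines. This is an illustrative sketch only: the abstract does not name the normalization method it evaluates, so the L2 normalization here is an assumption, the nested lists stand in for VGG-16 feature maps, and all function names are hypothetical.

```python
import math


def flatten(feature_map):
    """Flatten a nested list (e.g. channels x height x width) into a flat list."""
    if isinstance(feature_map, (int, float)):
        return [float(feature_map)]
    flat = []
    for item in feature_map:
        flat.extend(flatten(item))
    return flat


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 norm (assumed normalization, see lead-in)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def fuse_features(visible_map, infrared_map, normalize=True):
    """Flatten both branches, optionally normalize each, then concatenate."""
    branches = [flatten(visible_map), flatten(infrared_map)]
    if normalize:
        branches = [l2_normalize(b) for b in branches]
    return branches[0] + branches[1]


def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, so rare classes count equally."""
    correct, total = {}, {}
    for t, p in zip(y_true, y_pred):
        total[t] = total.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return sum(correct[c] / total[c] for c in total) / len(total)


fused = fuse_features([[3, 4]], [[0], [5]])
accuracy = mean_per_class_accuracy(["a", "a", "b"], ["a", "b", "b"])
```

Normalizing each branch before concatenation keeps one modality from dominating the fused vector purely because its activations are larger in magnitude.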
Funding (DF-Net study): This work is partially supported by the National Natural Science Foundation of China (Grant No. 61772561), the Key Research & Development Plan of Hunan Province (Grant No. 2018NK2012), the Degree & Postgraduate Education Reform Project of Hunan Province (Grant No. 2019JGYB154), the Postgraduate Excellent Teaching Team Project of Hunan Province (Grant [2019]370-133), and the Teaching Reform Project of Central South University of Forestry and Technology (Grant No. 20180682).
Funding (typhoon trajectory prediction study): This work was supported by the National Natural Science Foundation of China (62073330).
Funding (CNN-LSTM seismic velocity study): This work was financially supported by the Key Project of the National Natural Science Foundation of China (No. 41930431), the Project of the National Natural Science Foundation of China (Nos. 41904121, 41804133, and 41974116), and the Joint Guidance Project of the Natural Science Foundation of Heilongjiang Province (No. LH2020D006).
Funding (Bohai Sea Paleogene reservoir study): This work was funded by the Major Projects of National Science and Technology "Large Oil and Gas Fields and CBM Development" (Grant No. 2016ZX05 027).