The digital twin is the concept of transcending reality,which is the reverse feedback from the real physical space to the virtual digital space.People hold great prospects for this emerging technology.In order to real...The digital twin is the concept of transcending reality,which is the reverse feedback from the real physical space to the virtual digital space.People hold great prospects for this emerging technology.In order to realize the upgrading of the digital twin industrial chain,it is urgent to introduce more modalities,such as vision,haptics,hearing and smell,into the virtual digital space,which assists physical entities and virtual objects in creating a closer connection.Therefore,perceptual understanding and object recognition have become an urgent hot topic in the digital twin.Existing surface material classification schemes often achieve recognition through machine learning or deep learning in a single modality,ignoring the complementarity between multiple modalities.In order to overcome this dilemma,we propose a multimodal fusion network in our article that combines two modalities,visual and haptic,for surface material recognition.On the one hand,the network makes full use of the potential correlations between multiple modalities to deeply mine the modal semantics and complete the data mapping.On the other hand,the network is extensible and can be used as a universal architecture to include more modalities.Experiments show that the constructed multimodal fusion network can achieve 99.42%classification accuracy while reducing complexity.展开更多
With the rapid development of the mobile communication and the Internet,the previous web anomaly detectionand identificationmodels were built relying on security experts’empirical knowledge and attack features.Althou...With the rapid development of the mobile communication and the Internet,the previous web anomaly detectionand identificationmodels were built relying on security experts’empirical knowledge and attack features.Althoughthis approach can achieve higher detection performance,it requires huge human labor and resources to maintainthe feature library.In contrast,semantic feature engineering can dynamically discover new semantic featuresand optimize feature selection by automatically analyzing the semantic information contained in the data itself,thus reducing dependence on prior knowledge.However,current semantic features still have the problem ofsemantic expression singularity,as they are extracted from a single semantic mode such as word segmentation,character segmentation,or arbitrary semantic feature extraction.This paper extracts features of web requestsfrom dual semantic granularity,and proposes a semantic feature fusion method to solve the above problems.Themethod first preprocesses web requests,and extracts word-level and character-level semantic features of URLs viaconvolutional neural network(CNN),respectively.By constructing three loss functions to reduce losses betweenfeatures,labels and categories.Experiments on the HTTP CSIC 2010,Malicious URLs and HttpParams datasetsverify the proposedmethod.Results show that compared withmachine learning,deep learningmethods and BERTmodel,the proposed method has better detection performance.And it achieved the best detection rate of 99.16%in the dataset HttpParams.展开更多
Natural events have had a significant impact on overall flight activity,and the aviation industry plays a vital role in helping society cope with the impact of these events.As one of the most impactful weather typhoon...Natural events have had a significant impact on overall flight activity,and the aviation industry plays a vital role in helping society cope with the impact of these events.As one of the most impactful weather typhoon seasons appears and continues,airlines operating in threatened areas and passengers having travel plans during this time period will pay close attention to the development of tropical storms.This paper proposes a deep multimodal fusion and multitasking trajectory prediction model that can improve the reliability of typhoon trajectory prediction and reduce the quantity of flight scheduling cancellation.The deep multimodal fusion module is formed by deep fusion of the feature output by multiple submodal fusion modules,and the multitask generation module uses longitude and latitude as two related tasks for simultaneous prediction.With more dependable data accuracy,problems can be analysed rapidly and more efficiently,enabling better decision-making with a proactive versus reactive posture.When multiple modalities coexist,features can be extracted from them simultaneously to supplement each other’s information.An actual case study,the typhoon Lichma that swept China in 2019,has demonstrated that the algorithm can effectively reduce the number of unnecessary flight cancellations compared to existing flight scheduling and assist the new generation of flight scheduling systems under extreme weather.展开更多
Medical image fusion has been developed as an efficient assistive technology in various clinical applications such as medical diagnosis and treatment planning.Aiming at the problem of insufficient protection of image ...Medical image fusion has been developed as an efficient assistive technology in various clinical applications such as medical diagnosis and treatment planning.Aiming at the problem of insufficient protection of image contour and detail information by traditional image fusion methods,a new multimodal medical image fusion method is proposed.This method first uses non-subsampled shearlet transform to decompose the source image to obtain high and low frequency subband coefficients,then uses the latent low rank representation algorithm to fuse the low frequency subband coefficients,and applies the improved PAPCNN algorithm to fuse the high frequency subband coefficients.Finally,based on the automatic setting of parameters,the optimization method configuration of the time decay factorαe is carried out.The experimental results show that the proposed method solves the problems of difficult parameter setting and insufficient detail protection ability in traditional PCNN algorithm fusion images,and at the same time,it has achieved great improvement in visual quality and objective evaluation indicators.展开更多
As social networks become increasingly complex, contemporary fake news often includes textual descriptionsof events accompanied by corresponding images or videos. Fake news in multiple modalities is more likely tocrea...As social networks become increasingly complex, contemporary fake news often includes textual descriptionsof events accompanied by corresponding images or videos. Fake news in multiple modalities is more likely tocreate a misleading perception among users. While early research primarily focused on text-based features forfake news detection mechanisms, there has been relatively limited exploration of learning shared representationsin multimodal (text and visual) contexts. To address these limitations, this paper introduces a multimodal modelfor detecting fake news, which relies on similarity reasoning and adversarial networks. The model employsBidirectional Encoder Representation from Transformers (BERT) and Text Convolutional Neural Network (Text-CNN) for extracting textual features while utilizing the pre-trained Visual Geometry Group 19-layer (VGG-19) toextract visual features. Subsequently, the model establishes similarity representations between the textual featuresextracted by Text-CNN and visual features through similarity learning and reasoning. Finally, these features arefused to enhance the accuracy of fake news detection, and adversarial networks have been employed to investigatethe relationship between fake news and events. This paper validates the proposed model using publicly availablemultimodal datasets from Weibo and Twitter. Experimental results demonstrate that our proposed approachachieves superior performance on Twitter, with an accuracy of 86%, surpassing traditional unimodalmodalmodelsand existing multimodal models. In contrast, the overall better performance of our model on the Weibo datasetsurpasses the benchmark models across multiple metrics. The application of similarity reasoning and adversarialnetworks in multimodal fake news detection significantly enhances detection effectiveness in this paper. However,current research is limited to the fusion of only text and image modalities. Future research directions should aimto further integrate features fromadditionalmodalities to comprehensively represent themultifaceted informationof fake news.展开更多
Attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language and its nonverbal context information....Attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language and its nonverbal context information. One is to generate sparse attention coefficients associated with acoustic and visual modalities, which helps locate critical emotional se-mantics. The other is fusing complementary cross‐modal representation to construct optimal salient feature combinations of multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross‐modal attention. As a result, the located nonverbal fea-tures are not only salient but also complementary to sentiment words directly. Experi-mental results show that the authors’ method achieves state‐of‐the‐art performance on several multimodal affective analysis datasets.展开更多
Biography videos based on life performances of prominent figures in history aim to describe great mens' life.In this paper,a novel interactive video summarization for biography video based on multimodal fusion is ...Biography videos based on life performances of prominent figures in history aim to describe great mens' life.In this paper,a novel interactive video summarization for biography video based on multimodal fusion is proposed,which is a novel approach of visualizing the specific features for biography video and interacting with video content by taking advantage of the ability of multimodality.In general,a story of movie progresses by dialogues of characters and the subtitles are produced with the basis on the dialogues which contains all the information related to the movie.In this paper,JGibbsLDA is applied to extract key words from subtitles because the biography video consists of different aspects to depict the characters' whole life.In terms of fusing keywords and key-frames,affinity propagation is adopted to calculate the similarity between each key-frame cluster and keywords.Through the method mentioned above,a video summarization is presented based on multimodal fusion which describes video content more completely.In order to reduce the time spent on searching the interest video content and get the relationship between main characters,a kind of map is adopted to visualize video content and interact with video summarization.An experiment is conducted to evaluate video summarization and the results demonstrate that this system can formally facilitate the exploration of video content while improving interaction and finding events of interest efficiently.展开更多
For the last two decades,physicians and clinical experts have used a single imaging modality to identify the normal and abnormal structure of the human body.However,most of the time,medical experts are unable to accur...For the last two decades,physicians and clinical experts have used a single imaging modality to identify the normal and abnormal structure of the human body.However,most of the time,medical experts are unable to accurately analyze and examine the information from a single imaging modality due to the limited information.To overcome this problem,a multimodal approach is adopted to increase the qualitative and quantitative medical information which helps the doctors to easily diagnose diseases in their early stages.In the proposed method,a Multi-resolution Rigid Registration(MRR)technique is used for multimodal image registration while Discrete Wavelet Transform(DWT)along with Principal Component Averaging(PCAv)is utilized for image fusion.The proposed MRR method provides more accurate results as compared with Single Rigid Registration(SRR),while the proposed DWT-PCAv fusion process adds-on more constructive information with less computational time.The proposed method is tested on CT and MRI brain imaging modalities of the HARVARD dataset.The fusion results of the proposed method are compared with the existing fusion techniques.The quality assessment metrics such as Mutual Information(MI),Normalize Crosscorrelation(NCC)and Feature Mutual Information(FMI)are computed for statistical comparison of the proposed method.The proposed methodology provides more accurate results,better image quality and valuable information for medical diagnoses.展开更多
A multimodal biometric system is applied to recognize individuals for authentication using neural networks. In this paper multimodal biometric algorithm is designed by integrating iris, finger vein, palm print and fac...A multimodal biometric system is applied to recognize individuals for authentication using neural networks. In this paper multimodal biometric algorithm is designed by integrating iris, finger vein, palm print and face biometric traits. Normalized score level fusion approach is applied and optimized, encoded for matching decision. It is a multilevel wavelet, phase based fusion algorithm. This robust multimodal biometric algorithm increases the security level, accuracy, reduces memory size and equal error rate and eliminates unimodal biometric algorithm vulnerabilities.展开更多
Multimodal medical image fusion can help physicians provide more accurate treatment plans for patients, as unimodal images provide limited valid information. To address the insufficient ability of traditional medical ...Multimodal medical image fusion can help physicians provide more accurate treatment plans for patients, as unimodal images provide limited valid information. To address the insufficient ability of traditional medical image fusion solutions to protect image details and significant information, a new multimodality medical image fusion method(NSST-PAPCNNLatLRR) is proposed in this paper. Firstly, the high and low-frequency sub-band coefficients are obtained by decomposing the source image using NSST. Then, the latent low-rank representation algorithm is used to process the low-frequency sub-band coefficients;An improved PAPCNN algorithm is also proposed for the fusion of high-frequency sub-band coefficients. The improved PAPCNN model was based on the automatic setting of the parameters, and the optimal method was configured for the time decay factor αe. The experimental results show that, in comparison with the five mainstream fusion algorithms, the new algorithm has significantly improved the visual effect over the comparison algorithm,enhanced the ability to characterize important information in images, and further improved the ability to protect the detailed information;the new algorithm has achieved at least four firsts in six objective indexes.展开更多
3D vehicle detection based on LiDAR-camera fusion is becoming an emerging research topic in autonomous driving.The algorithm based on the Camera-LiDAR object candidate fusion method(CLOCs)is currently considered to be...3D vehicle detection based on LiDAR-camera fusion is becoming an emerging research topic in autonomous driving.The algorithm based on the Camera-LiDAR object candidate fusion method(CLOCs)is currently considered to be a more effective decision-level fusion algorithm,but it does not fully utilize the extracted features of 3D and 2D.Therefore,we proposed a 3D vehicle detection algorithm based onmultimodal decision-level fusion.First,project the anchor point of the 3D detection bounding box into the 2D image,calculate the distance between 2D and 3D anchor points,and use this distance as a new fusion feature to enhance the feature redundancy of the network.Subsequently,add an attention module:squeeze-and-excitation networks,weight each feature channel to enhance the important features of the network,and suppress useless features.The experimental results show that the mean average precision of the algorithm in the KITTI dataset is 82.96%,which outperforms previous state-ofthe-art multimodal fusion-based methods,and the average accuracy in the Easy,Moderate and Hard evaluation indicators reaches 88.96%,82.60%,and 77.31%,respectively,which are higher compared to the original CLOCs model by 1.02%,2.29%,and 0.41%,respectively.Compared with the original CLOCs algorithm,our algorithm has higher accuracy and better performance in 3D vehicle detection.展开更多
In complex traffic environment scenarios,it is very important for autonomous vehicles to accurately perceive the dynamic information of other vehicles around the vehicle in advance.The accuracy of 3D object detection ...In complex traffic environment scenarios,it is very important for autonomous vehicles to accurately perceive the dynamic information of other vehicles around the vehicle in advance.The accuracy of 3D object detection will be affected by problems such as illumination changes,object occlusion,and object detection distance.To this purpose,we face these challenges by proposing a multimodal feature fusion network for 3D object detection(MFF-Net).In this research,this paper first uses the spatial transformation projection algorithm to map the image features into the feature space,so that the image features are in the same spatial dimension when fused with the point cloud features.Then,feature channel weighting is performed using an adaptive expression augmentation fusion network to enhance important network features,suppress useless features,and increase the directionality of the network to features.Finally,this paper increases the probability of false detection and missed detection in the non-maximum suppression algo-rithm by increasing the one-dimensional threshold.So far,this paper has constructed a complete 3D target detection network based on multimodal feature fusion.The experimental results show that the proposed achieves an average accuracy of 82.60%on the Karlsruhe Institute of Technology and Toyota Technological Institute(KITTI)dataset,outperforming previous state-of-the-art multimodal fusion networks.In Easy,Moderate,and hard evaluation indicators,the accuracy rate of this paper reaches 90.96%,81.46%,and 75.39%.This shows that the MFF-Net network has good performance in 3D object detection.展开更多
Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models.However,alongside th...Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models.However,alongside the advantages,depth-sensing also presents many practical challenges.For instance,the depth sensors impose an additional payload burden on the robotic inspection platforms limiting the operation time and increasing the inspection cost.Additionally,some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime.In this context,this study investigates the feasibility of abolishing depth-sensing at test time without compromising the segmentation performance.An autonomous damage segmentation framework is developed,based on recent advancements in vision-based multi-modal sensing such as modality hallucination(MH)and monocular depth estimation(MDE),which require depth data only during the model training.At the time of deployment,depth data becomes expendable as it can be simulated from the corresponding RGB frames.This makes it possible to reap the benefits of depth fusion without any depth perception per se.This study explored two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model.The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage.It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1%with a negligible increase in the computation cost.Overall,this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.展开更多
In this paper,we propose a new image fusion algorithm based on two-dimensional Scale-Mixing Complex Wavelet Transform(2D-SMCWT).The fusion of the detail 2D-SMCWT cofficients is performed via a Bayesian Maximum a Poste...In this paper,we propose a new image fusion algorithm based on two-dimensional Scale-Mixing Complex Wavelet Transform(2D-SMCWT).The fusion of the detail 2D-SMCWT cofficients is performed via a Bayesian Maximum a Posteriori(MAP)approach by considering a trivariate statistical model for the local neighboring of 2D-SMCWT coefficients.For the approx imation coefficients,a new fusion rule based on the Principal Component Analysis(PCA)is applied.We conduct several experiments using three different groups of multimodal medical images to evaluate the performance of the proposed method.The obt ained results prove the superiority of the proposed method over the state of the art fusion methods in terms of visual quality and several commonly used metrics.Robustness of the proposed method is further tested against different types of noise.The plots of fusion met rics establish the accuracy of the proposed fusion method.展开更多
Objective This paper proposed a novel algorithm of discrete wavelet transform(DWT) which is used for multimodal medical image fusion. Methods The source medical images are initially transformed by DWT followed by fusi...Objective This paper proposed a novel algorithm of discrete wavelet transform(DWT) which is used for multimodal medical image fusion. Methods The source medical images are initially transformed by DWT followed by fusing low and high frequency sub-images. Then, the "coefficient absolute value" that can provide clear and detail parts is adapted to fuse high-frequency coefficients, where as the "region energy ratio" which can efficiently preserve most information of source images is employed to fuse low-frequency coefficients. Finally, the fused image is reconstructed by inverse wavelet transform. Results Visually and quantitatively experimental results indicate that the proposed fusion method is superior to traditional wavelet transform and the existing fusion methods. Conclusion The proposed method is a feasible approach for multimodal medical image fusion which can obtain more efficient and accurate fusions results even in the noise environment.展开更多
Multimodal medical image fusion is used to merge functional and structural information of the same body organ. Most of the multimodal image fusion algorithms are designed to fuse grayscale images that are produced by ...Multimodal medical image fusion is used to merge functional and structural information of the same body organ. Most of the multimodal image fusion algorithms are designed to fuse grayscale images that are produced by different imaging modalities. It is likely that problems of colour distortion and information loss will occur in fused image when source images are fused by using algorithms that are not originally designed to fuse colour images. These problems can be avoided by representing and processing source images as quaternion numbers. Quaternion representation of a colour pixel encodes information of its colour channels on the imaginary parts of a quaternion number and provides the advantage to processing colour information holistically as a vector field. In this paper, we proposed an image fusion algorithm based on Quaternion Principal Component Analysis (QPCA), to fuse multimodal colour medical images.Quaternion principal components are calculated by decomposing quaternion covariance matrix using Quaternion Eigenvalue Decomposition (QEVD). Fusion rule is designed, based on the fusion weights that are extracted from the highly influential principal component. To test the performance of the proposed algorithm,experiments have been performed on six image-sets of multimodal colour images of the brain. Experimental results are compared objectively with existing image fusion algorithms. Comparison results show that the proposed algorithm performed better than existing algorithms in fusing colour medical images.展开更多
Nowadays, the vein based recognition system becomes an emerging and facilitating biometric technology in the recognition system. Vein recognition exploits the different modalities such as finger, palm and hand image f...Nowadays, the vein based recognition system becomes an emerging and facilitating biometric technology in the recognition system. Vein recognition exploits the different modalities such as finger, palm and hand image for the person identification. In this work, the fuzzy least brain storm optimization and Euclidean distance(EED) are proposed for the vein based recognition system. Initially, the input image is fed into the region of interest(ROI) extraction which obtains the appropriate image for the subsequent step. Then, features or vein pattern is extracted by the image enlightening, circular averaging filter and holoentropy based thresholding. After the features are obtained, the entropy based Euclidean distance is proposed to fuse the features by the score level fusion with the weight score value. Finally, the optimal matching score is computed iteratively by the newly developed fuzzy least brain storm optimization(FLBSO) algorithm. The novel algorithm is developed by the least mean square(LMS) algorithm and fuzzy brain storm optimization(FBSO). Thus, the experimental results are evaluated and the performance is compared with the existing systems using false acceptance rate(FAR), false rejection rate(FRR) and accuracy. The performance outcome of the proposed algorithm attains the higher accuracy of 89.9% which ensures the better recognition rate.展开更多
A multimodal fusion classifier is presented based on neural networks (NNs) learned with hints for automatic spontaneous affect recognition. In case that different channels can provide com- plementary information, fe...A multimodal fusion classifier is presented based on neural networks (NNs) learned with hints for automatic spontaneous affect recognition. In case that different channels can provide com- plementary information, features are utilized from four behavioral cues: frontal-view facial expres- sion, profile-view facial expression, shoulder movement, and vocalization (audio). NNs are used in both single cue processing and multimodal fusion. Coarse categories and quadrants in the activation- evaluation dimensional space are utilized respectively as the heuristic information (hints) of NNs during training, aiming at recognition of basic emotions. With the aid of hints, the weights in NNs could learn optimal feature groupings and the subtlety and complexity of spontaneous affective states could be better modeled. The proposed method requires low computation effort and reaches high recognition accuracy, even if the training data is insufficient. Experiment results on the Semaine nat- uralistic dataset demonstrate that our method is effective and promising.展开更多
Gesture recognition technology enables machines to read human gestures and has significant application prospects in the fields of human-computer interaction and sign language translation.Existing researches usually us...Gesture recognition technology enables machines to read human gestures and has significant application prospects in the fields of human-computer interaction and sign language translation.Existing researches usually use convolutional neural networks to extract features directly from raw gesture data for gesture recognition,but the networks are affected by much interference information in the input data and thus fit to some unimportant features.In this paper,we proposed a novel method for encoding spatio-temporal information,which can enhance the key features required for gesture recognition,such as shape,structure,contour,position and hand motion of gestures,thereby improving the accuracy of gesture recognition.This encoding method can encode arbitrarily multiple frames of gesture data into a single frame of the spatio-temporal feature map and use the spatio-temporal feature map as the input to the neural network.This can guide the model to fit important features while avoiding the use of complex recurrent network structures to extract temporal features.In addition,we designed two sub-networks and trained the model using a sub-network pre-training strategy that trains the sub-networks first and then the entire network,so as to avoid the subnetworks focusing too much on the information of a single category feature and being overly influenced by each other’s features.Experimental results on two public gesture datasets show that the proposed spatio-temporal information encoding method achieves advanced accuracy.展开更多
基金the National Natural Science Foundation of China(62001246,62001248,62171232)Key R&D Program of Jiangsu Province Key project and topics under Grant BE2021095+3 种基金the Natural Science Foundation of Jiangsu Province Higher Education Institutions(20KJB510020)the Future Network Scientific Research Fund Project(FNSRFP-2021-YB-16)the open research fund of Key Lab of Broadband Wireless Communication and Sensor Network Technology(JZNY202110)the NUPTSF under Grant(NY220070).
文摘The digital twin is the concept of transcending reality,which is the reverse feedback from the real physical space to the virtual digital space.People hold great prospects for this emerging technology.In order to realize the upgrading of the digital twin industrial chain,it is urgent to introduce more modalities,such as vision,haptics,hearing and smell,into the virtual digital space,which assists physical entities and virtual objects in creating a closer connection.Therefore,perceptual understanding and object recognition have become an urgent hot topic in the digital twin.Existing surface material classification schemes often achieve recognition through machine learning or deep learning in a single modality,ignoring the complementarity between multiple modalities.In order to overcome this dilemma,we propose a multimodal fusion network in our article that combines two modalities,visual and haptic,for surface material recognition.On the one hand,the network makes full use of the potential correlations between multiple modalities to deeply mine the modal semantics and complete the data mapping.On the other hand,the network is extensible and can be used as a universal architecture to include more modalities.Experiments show that the constructed multimodal fusion network can achieve 99.42%classification accuracy while reducing complexity.
基金a grant from the National Natural Science Foundation of China(Nos.11905239,12005248 and 12105303).
文摘With the rapid development of the mobile communication and the Internet,the previous web anomaly detectionand identificationmodels were built relying on security experts’empirical knowledge and attack features.Althoughthis approach can achieve higher detection performance,it requires huge human labor and resources to maintainthe feature library.In contrast,semantic feature engineering can dynamically discover new semantic featuresand optimize feature selection by automatically analyzing the semantic information contained in the data itself,thus reducing dependence on prior knowledge.However,current semantic features still have the problem ofsemantic expression singularity,as they are extracted from a single semantic mode such as word segmentation,character segmentation,or arbitrary semantic feature extraction.This paper extracts features of web requestsfrom dual semantic granularity,and proposes a semantic feature fusion method to solve the above problems.Themethod first preprocesses web requests,and extracts word-level and character-level semantic features of URLs viaconvolutional neural network(CNN),respectively.By constructing three loss functions to reduce losses betweenfeatures,labels and categories.Experiments on the HTTP CSIC 2010,Malicious URLs and HttpParams datasetsverify the proposedmethod.Results show that compared withmachine learning,deep learningmethods and BERTmodel,the proposed method has better detection performance.And it achieved the best detection rate of 99.16%in the dataset HttpParams.
基金supported by the National Natural Science Foundation of China(62073330)。
文摘Natural events have had a significant impact on overall flight activity,and the aviation industry plays a vital role in helping society cope with the impact of these events.As one of the most impactful weather typhoon seasons appears and continues,airlines operating in threatened areas and passengers having travel plans during this time period will pay close attention to the development of tropical storms.This paper proposes a deep multimodal fusion and multitasking trajectory prediction model that can improve the reliability of typhoon trajectory prediction and reduce the quantity of flight scheduling cancellation.The deep multimodal fusion module is formed by deep fusion of the feature output by multiple submodal fusion modules,and the multitask generation module uses longitude and latitude as two related tasks for simultaneous prediction.With more dependable data accuracy,problems can be analysed rapidly and more efficiently,enabling better decision-making with a proactive versus reactive posture.When multiple modalities coexist,features can be extracted from them simultaneously to supplement each other’s information.An actual case study,the typhoon Lichma that swept China in 2019,has demonstrated that the algorithm can effectively reduce the number of unnecessary flight cancellations compared to existing flight scheduling and assist the new generation of flight scheduling systems under extreme weather.
文摘Medical image fusion has been developed as an efficient assistive technology in various clinical applications such as medical diagnosis and treatment planning.Aiming at the problem of insufficient protection of image contour and detail information by traditional image fusion methods,a new multimodal medical image fusion method is proposed.This method first uses non-subsampled shearlet transform to decompose the source image to obtain high and low frequency subband coefficients,then uses the latent low rank representation algorithm to fuse the low frequency subband coefficients,and applies the improved PAPCNN algorithm to fuse the high frequency subband coefficients.Finally,based on the automatic setting of parameters,the optimization method configuration of the time decay factorαe is carried out.The experimental results show that the proposed method solves the problems of difficult parameter setting and insufficient detail protection ability in traditional PCNN algorithm fusion images,and at the same time,it has achieved great improvement in visual quality and objective evaluation indicators.
基金the National Natural Science Foundation of China(No.62302540)with author F.F.S.For more information,please visit their website at https://www.nsfc.gov.cn/.Additionally,it is also funded by the Open Foundation of Henan Key Laboratory of Cyberspace Situation Awareness(No.HNTS2022020)+1 种基金where F.F.S is an author.Further details can be found at http://xt.hnkjt.gov.cn/data/pingtai/.The research is also supported by the Natural Science Foundation of Henan Province Youth Science Fund Project(No.232300420422)for more information,you can visit https://kjt.henan.gov.cn/2022/09-02/2599082.html.Lastly,it receives funding from the Natural Science Foundation of Zhongyuan University of Technology(No.K2023QN018),where F.F.S is an author.You can find more information at https://www.zut.edu.cn/.
文摘As social networks become increasingly complex, contemporary fake news often includes textual descriptionsof events accompanied by corresponding images or videos. Fake news in multiple modalities is more likely tocreate a misleading perception among users. While early research primarily focused on text-based features forfake news detection mechanisms, there has been relatively limited exploration of learning shared representationsin multimodal (text and visual) contexts. To address these limitations, this paper introduces a multimodal modelfor detecting fake news, which relies on similarity reasoning and adversarial networks. The model employsBidirectional Encoder Representation from Transformers (BERT) and Text Convolutional Neural Network (Text-CNN) for extracting textual features while utilizing the pre-trained Visual Geometry Group 19-layer (VGG-19) toextract visual features. Subsequently, the model establishes similarity representations between the textual featuresextracted by Text-CNN and visual features through similarity learning and reasoning. Finally, these features arefused to enhance the accuracy of fake news detection, and adversarial networks have been employed to investigatethe relationship between fake news and events. This paper validates the proposed model using publicly availablemultimodal datasets from Weibo and Twitter. Experimental results demonstrate that our proposed approachachieves superior performance on Twitter, with an accuracy of 86%, surpassing traditional unimodalmodalmodelsand existing multimodal models. In contrast, the overall better performance of our model on the Weibo datasetsurpasses the benchmark models across multiple metrics. The application of similarity reasoning and adversarialnetworks in multimodal fake news detection significantly enhances detection effectiveness in this paper. However,current research is limited to the fusion of only text and image modalities. Future research directions should aimto further integrate features fromadditionalmodalities to comprehensively represent themultifaceted informationof fake news.
基金National Key Research and Development Plan of China, Grant/Award Number: 2021YFB3600503National Natural Science Foundation of China, Grant/Award Numbers: 62276065, U21A20472。
文摘Attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language and its nonverbal context information. One is to generate sparse attention coefficients associated with acoustic and visual modalities, which helps locate critical emotional se-mantics. The other is fusing complementary cross‐modal representation to construct optimal salient feature combinations of multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross‐modal attention. As a result, the located nonverbal fea-tures are not only salient but also complementary to sentiment words directly. Experi-mental results show that the authors’ method achieves state‐of‐the‐art performance on several multimodal affective analysis datasets.
基金Supported by the National Key Research and Development Plan(2016YFB1001200)the Natural Science Foundation of China(U1435220,61232013)Natural Science Research Projects of Universities in Jiangsu Province(16KJA520003)
文摘Biography videos based on life performances of prominent figures in history aim to describe great mens' life.In this paper,a novel interactive video summarization for biography video based on multimodal fusion is proposed,which is a novel approach of visualizing the specific features for biography video and interacting with video content by taking advantage of the ability of multimodality.In general,a story of movie progresses by dialogues of characters and the subtitles are produced with the basis on the dialogues which contains all the information related to the movie.In this paper,JGibbsLDA is applied to extract key words from subtitles because the biography video consists of different aspects to depict the characters' whole life.In terms of fusing keywords and key-frames,affinity propagation is adopted to calculate the similarity between each key-frame cluster and keywords.Through the method mentioned above,a video summarization is presented based on multimodal fusion which describes video content more completely.In order to reduce the time spent on searching the interest video content and get the relationship between main characters,a kind of map is adopted to visualize video content and interact with video summarization.An experiment is conducted to evaluate video summarization and the results demonstrate that this system can formally facilitate the exploration of video content while improving interaction and finding events of interest efficiently.
文摘For the last two decades,physicians and clinical experts have used a single imaging modality to identify the normal and abnormal structure of the human body.However,most of the time,medical experts are unable to accurately analyze and examine the information from a single imaging modality due to the limited information.To overcome this problem,a multimodal approach is adopted to increase the qualitative and quantitative medical information which helps the doctors to easily diagnose diseases in their early stages.In the proposed method,a Multi-resolution Rigid Registration(MRR)technique is used for multimodal image registration while Discrete Wavelet Transform(DWT)along with Principal Component Averaging(PCAv)is utilized for image fusion.The proposed MRR method provides more accurate results as compared with Single Rigid Registration(SRR),while the proposed DWT-PCAv fusion process adds-on more constructive information with less computational time.The proposed method is tested on CT and MRI brain imaging modalities of the HARVARD dataset.The fusion results of the proposed method are compared with the existing fusion techniques.The quality assessment metrics such as Mutual Information(MI),Normalize Crosscorrelation(NCC)and Feature Mutual Information(FMI)are computed for statistical comparison of the proposed method.The proposed methodology provides more accurate results,better image quality and valuable information for medical diagnoses.
文摘A multimodal biometric system is applied to recognize individuals for authentication using neural networks. In this paper multimodal biometric algorithm is designed by integrating iris, finger vein, palm print and face biometric traits. Normalized score level fusion approach is applied and optimized, encoded for matching decision. It is a multilevel wavelet, phase based fusion algorithm. This robust multimodal biometric algorithm increases the security level, accuracy, reduces memory size and equal error rate and eliminates unimodal biometric algorithm vulnerabilities.
基金funded by the National Natural Science Foundation of China,grant number 61302188.
文摘Multimodal medical image fusion can help physicians provide more accurate treatment plans for patients, as unimodal images provide limited valid information. To address the insufficient ability of traditional medical image fusion solutions to protect image details and significant information, a new multimodality medical image fusion method(NSST-PAPCNNLatLRR) is proposed in this paper. Firstly, the high and low-frequency sub-band coefficients are obtained by decomposing the source image using NSST. Then, the latent low-rank representation algorithm is used to process the low-frequency sub-band coefficients;An improved PAPCNN algorithm is also proposed for the fusion of high-frequency sub-band coefficients. The improved PAPCNN model was based on the automatic setting of the parameters, and the optimal method was configured for the time decay factor αe. The experimental results show that, in comparison with the five mainstream fusion algorithms, the new algorithm has significantly improved the visual effect over the comparison algorithm,enhanced the ability to characterize important information in images, and further improved the ability to protect the detailed information;the new algorithm has achieved at least four firsts in six objective indexes.
基金supported by the Financial Support of the Key Research and Development Projects of Anhui (202104a05020003)the Natural Science Foundation of Anhui Province (2208085MF173)the Anhui Development and Reform Commission Supports R&D and Innovation Projects ([2020]479).
文摘3D vehicle detection based on LiDAR-camera fusion is becoming an emerging research topic in autonomous driving.The algorithm based on the Camera-LiDAR object candidate fusion method(CLOCs)is currently considered to be a more effective decision-level fusion algorithm,but it does not fully utilize the extracted features of 3D and 2D.Therefore,we proposed a 3D vehicle detection algorithm based onmultimodal decision-level fusion.First,project the anchor point of the 3D detection bounding box into the 2D image,calculate the distance between 2D and 3D anchor points,and use this distance as a new fusion feature to enhance the feature redundancy of the network.Subsequently,add an attention module:squeeze-and-excitation networks,weight each feature channel to enhance the important features of the network,and suppress useless features.The experimental results show that the mean average precision of the algorithm in the KITTI dataset is 82.96%,which outperforms previous state-ofthe-art multimodal fusion-based methods,and the average accuracy in the Easy,Moderate and Hard evaluation indicators reaches 88.96%,82.60%,and 77.31%,respectively,which are higher compared to the original CLOCs model by 1.02%,2.29%,and 0.41%,respectively.Compared with the original CLOCs algorithm,our algorithm has higher accuracy and better performance in 3D vehicle detection.
基金The authors would like to thank the financial support of Natural Science Foundation of Anhui Province(No.2208085MF173)the key research and development projects of Anhui(202104a05020003)+2 种基金the anhui development and reform commission supports R&D and innovation project([2020]479)the national natural science foundation of China(51575001)Anhui university scientific research platform innovation team building project(2016-2018).
文摘In complex traffic environment scenarios,it is very important for autonomous vehicles to accurately perceive the dynamic information of other vehicles around the vehicle in advance.The accuracy of 3D object detection will be affected by problems such as illumination changes,object occlusion,and object detection distance.To this purpose,we face these challenges by proposing a multimodal feature fusion network for 3D object detection(MFF-Net).In this research,this paper first uses the spatial transformation projection algorithm to map the image features into the feature space,so that the image features are in the same spatial dimension when fused with the point cloud features.Then,feature channel weighting is performed using an adaptive expression augmentation fusion network to enhance important network features,suppress useless features,and increase the directionality of the network to features.Finally,this paper increases the probability of false detection and missed detection in the non-maximum suppression algo-rithm by increasing the one-dimensional threshold.So far,this paper has constructed a complete 3D target detection network based on multimodal feature fusion.The experimental results show that the proposed achieves an average accuracy of 82.60%on the Karlsruhe Institute of Technology and Toyota Technological Institute(KITTI)dataset,outperforming previous state-of-the-art multimodal fusion networks.In Easy,Moderate,and hard evaluation indicators,the accuracy rate of this paper reaches 90.96%,81.46%,and 75.39%.This shows that the MFF-Net network has good performance in 3D object detection.
基金supported in part by a fund from Bentley Systems,Inc.
文摘Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models.However,alongside the advantages,depth-sensing also presents many practical challenges.For instance,the depth sensors impose an additional payload burden on the robotic inspection platforms limiting the operation time and increasing the inspection cost.Additionally,some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime.In this context,this study investigates the feasibility of abolishing depth-sensing at test time without compromising the segmentation performance.An autonomous damage segmentation framework is developed,based on recent advancements in vision-based multi-modal sensing such as modality hallucination(MH)and monocular depth estimation(MDE),which require depth data only during the model training.At the time of deployment,depth data becomes expendable as it can be simulated from the corresponding RGB frames.This makes it possible to reap the benefits of depth fusion without any depth perception per se.This study explored two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model.The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage.It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1%with a negligible increase in the computation cost.Overall,this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.
文摘In this paper,we propose a new image fusion algorithm based on two-dimensional Scale-Mixing Complex Wavelet Transform(2D-SMCWT).The fusion of the detail 2D-SMCWT cofficients is performed via a Bayesian Maximum a Posteriori(MAP)approach by considering a trivariate statistical model for the local neighboring of 2D-SMCWT coefficients.For the approx imation coefficients,a new fusion rule based on the Principal Component Analysis(PCA)is applied.We conduct several experiments using three different groups of multimodal medical images to evaluate the performance of the proposed method.The obt ained results prove the superiority of the proposed method over the state of the art fusion methods in terms of visual quality and several commonly used metrics.Robustness of the proposed method is further tested against different types of noise.The plots of fusion met rics establish the accuracy of the proposed fusion method.
文摘Objective This paper proposed a novel algorithm of discrete wavelet transform(DWT) which is used for multimodal medical image fusion. Methods The source medical images are initially transformed by DWT followed by fusing low and high frequency sub-images. Then, the "coefficient absolute value" that can provide clear and detail parts is adapted to fuse high-frequency coefficients, where as the "region energy ratio" which can efficiently preserve most information of source images is employed to fuse low-frequency coefficients. Finally, the fused image is reconstructed by inverse wavelet transform. Results Visually and quantitatively experimental results indicate that the proposed fusion method is superior to traditional wavelet transform and the existing fusion methods. Conclusion The proposed method is a feasible approach for multimodal medical image fusion which can obtain more efficient and accurate fusions results even in the noise environment.
文摘Multimodal medical image fusion is used to merge functional and structural information of the same body organ. Most of the multimodal image fusion algorithms are designed to fuse grayscale images that are produced by different imaging modalities. It is likely that problems of colour distortion and information loss will occur in fused image when source images are fused by using algorithms that are not originally designed to fuse colour images. These problems can be avoided by representing and processing source images as quaternion numbers. Quaternion representation of a colour pixel encodes information of its colour channels on the imaginary parts of a quaternion number and provides the advantage to processing colour information holistically as a vector field. In this paper, we proposed an image fusion algorithm based on Quaternion Principal Component Analysis (QPCA), to fuse multimodal colour medical images.Quaternion principal components are calculated by decomposing quaternion covariance matrix using Quaternion Eigenvalue Decomposition (QEVD). Fusion rule is designed, based on the fusion weights that are extracted from the highly influential principal component. To test the performance of the proposed algorithm,experiments have been performed on six image-sets of multimodal colour images of the brain. Experimental results are compared objectively with existing image fusion algorithms. Comparison results show that the proposed algorithm performed better than existing algorithms in fusing colour medical images.
文摘Nowadays, the vein based recognition system becomes an emerging and facilitating biometric technology in the recognition system. Vein recognition exploits the different modalities such as finger, palm and hand image for the person identification. In this work, the fuzzy least brain storm optimization and Euclidean distance(EED) are proposed for the vein based recognition system. Initially, the input image is fed into the region of interest(ROI) extraction which obtains the appropriate image for the subsequent step. Then, features or vein pattern is extracted by the image enlightening, circular averaging filter and holoentropy based thresholding. After the features are obtained, the entropy based Euclidean distance is proposed to fuse the features by the score level fusion with the weight score value. Finally, the optimal matching score is computed iteratively by the newly developed fuzzy least brain storm optimization(FLBSO) algorithm. The novel algorithm is developed by the least mean square(LMS) algorithm and fuzzy brain storm optimization(FBSO). Thus, the experimental results are evaluated and the performance is compared with the existing systems using false acceptance rate(FAR), false rejection rate(FRR) and accuracy. The performance outcome of the proposed algorithm attains the higher accuracy of 89.9% which ensures the better recognition rate.
基金Supported by the National Natural Science Foundation of China(60905006)the Basic Research Fund of Beijing Institute ofTechnology(20120842006)
文摘A multimodal fusion classifier is presented based on neural networks (NNs) learned with hints for automatic spontaneous affect recognition. In case that different channels can provide com- plementary information, features are utilized from four behavioral cues: frontal-view facial expres- sion, profile-view facial expression, shoulder movement, and vocalization (audio). NNs are used in both single cue processing and multimodal fusion. Coarse categories and quadrants in the activation- evaluation dimensional space are utilized respectively as the heuristic information (hints) of NNs during training, aiming at recognition of basic emotions. With the aid of hints, the weights in NNs could learn optimal feature groupings and the subtlety and complexity of spontaneous affective states could be better modeled. The proposed method requires low computation effort and reaches high recognition accuracy, even if the training data is insufficient. Experiment results on the Semaine nat- uralistic dataset demonstrate that our method is effective and promising.
基金This work was supported,in part,by the National Nature Science Foundation of China under grant numbers 62272236in part,by the Natural Science Foundation of Jiangsu Province under grant numbers BK20201136,BK20191401in part,by the Priority Academic Program Development of Jiangsu Higher Education Institutions(PAPD)fund.
文摘Gesture recognition technology enables machines to read human gestures and has significant application prospects in the fields of human-computer interaction and sign language translation.Existing researches usually use convolutional neural networks to extract features directly from raw gesture data for gesture recognition,but the networks are affected by much interference information in the input data and thus fit to some unimportant features.In this paper,we proposed a novel method for encoding spatio-temporal information,which can enhance the key features required for gesture recognition,such as shape,structure,contour,position and hand motion of gestures,thereby improving the accuracy of gesture recognition.This encoding method can encode arbitrarily multiple frames of gesture data into a single frame of the spatio-temporal feature map and use the spatio-temporal feature map as the input to the neural network.This can guide the model to fit important features while avoiding the use of complex recurrent network structures to extract temporal features.In addition,we designed two sub-networks and trained the model using a sub-network pre-training strategy that trains the sub-networks first and then the entire network,so as to avoid the subnetworks focusing too much on the information of a single category feature and being overly influenced by each other’s features.Experimental results on two public gesture datasets show that the proposed spatio-temporal information encoding method achieves advanced accuracy.