Journal Articles
27 articles found
Multimodal fusion recognition for digital twin
Authors: Tianzhe Zhou, Xuguang Zhang, Bing Kang, Mingkai Chen. Digital Communications and Networks (SCIE, CSCD), 2024, No. 2, pp. 337-346.
The digital twin is the concept of transcending reality: reverse feedback from the real physical space to the virtual digital space. People hold great prospects for this emerging technology. To realize the upgrading of the digital twin industrial chain, it is urgent to introduce more modalities, such as vision, haptics, hearing and smell, into the virtual digital space, which helps physical entities and virtual objects create a closer connection. Perceptual understanding and object recognition have therefore become an urgent hot topic in the digital twin. Existing surface material classification schemes often achieve recognition through machine learning or deep learning in a single modality, ignoring the complementarity between multiple modalities. To overcome this limitation, we propose a multimodal fusion network that combines two modalities, visual and haptic, for surface material recognition. On the one hand, the network makes full use of the potential correlations between multiple modalities to deeply mine the modal semantics and complete the data mapping. On the other hand, the network is extensible and can be used as a universal architecture to include more modalities. Experiments show that the constructed multimodal fusion network can achieve 99.42% classification accuracy while reducing complexity.
Keywords: digital twin; multimodal fusion; object recognition; deep learning; transfer learning
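A minimal sketch of the feature-level fusion idea described above: per-modality embeddings are concatenated and fed to a shared softmax classifier. This is illustrative only, not the paper's network; every dimension, weight, and name below is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(visual_feat, haptic_feat, W, b):
    """Feature-level fusion: concatenate per-modality embeddings,
    then apply a linear softmax classifier."""
    fused = np.concatenate([visual_feat, haptic_feat], axis=-1)
    return softmax(fused @ W + b)

# Toy dimensions: 128-d visual, 64-d haptic, 10 surface-material classes.
visual = rng.standard_normal((4, 128))   # batch of 4 visual embeddings
haptic = rng.standard_normal((4, 64))    # matching haptic embeddings
W = rng.standard_normal((192, 10)) * 0.01
b = np.zeros(10)

probs = fuse_and_classify(visual, haptic, W, b)
print(probs.shape)        # (4, 10): one class distribution per sample
```

A real system would learn `W` end-to-end and likely fuse at intermediate layers, but the concatenate-then-classify skeleton is the common starting point.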
A deep multimodal fusion and multitasking trajectory prediction model for typhoon trajectory prediction to reduce flight scheduling cancellation
Authors: TANG Jun, QIN Wanting, PAN Qingtao, LAO Songyang. Journal of Systems Engineering and Electronics (SCIE, CSCD), 2024, No. 3, pp. 666-678.
Natural events have a significant impact on overall flight activity, and the aviation industry plays a vital role in helping society cope with these events. As typhoon season, one of the most impactful weather periods, arrives and continues, airlines operating in threatened areas and passengers with travel plans during this period pay close attention to the development of tropical storms. This paper proposes a deep multimodal fusion and multitasking trajectory prediction model that can improve the reliability of typhoon trajectory prediction and reduce the number of flight scheduling cancellations. The deep multimodal fusion module is formed by deep fusion of the features output by multiple submodal fusion modules, and the multitask generation module uses longitude and latitude as two related tasks for simultaneous prediction. With more dependable data accuracy, problems can be analysed rapidly and more efficiently, enabling better decision-making with a proactive rather than reactive posture. When multiple modalities coexist, features can be extracted from them simultaneously to supplement each other's information. An actual case study, Typhoon Lekima, which swept China in 2019, demonstrates that the algorithm can effectively reduce the number of unnecessary flight cancellations compared to existing flight scheduling and can assist the new generation of flight scheduling systems under extreme weather.
Keywords: flight scheduling optimization; deep multimodal fusion; multitasking trajectory prediction; typhoon weather; flight cancellation prediction; reliability
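The multitask generation module described above (longitude and latitude as two related tasks) can be sketched as a shared trunk with two regression heads. A toy numpy forward pass under assumed shapes, not the authors' model:

```python
import numpy as np

rng = np.random.default_rng(1)

def multitask_forward(x, W_shared, W_lon, W_lat):
    """Shared trunk + two task heads predicting longitude and latitude
    as related tasks from one fused feature vector."""
    h = np.tanh(x @ W_shared)   # shared representation used by both tasks
    lon = h @ W_lon             # longitude head
    lat = h @ W_lat             # latitude head
    return lon, lat

x = rng.standard_normal((8, 32))       # 8 fused typhoon feature vectors
W_shared = rng.standard_normal((32, 16)) * 0.1
W_lon = rng.standard_normal((16, 1)) * 0.1
W_lat = rng.standard_normal((16, 1)) * 0.1

lon, lat = multitask_forward(x, W_shared, W_lon, W_lat)
print(lon.shape, lat.shape)   # (8, 1) (8, 1)
```

The design point is that the two heads share a trunk, so gradients from both coordinates shape one common representation.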
Multimodal Fusion of Brain Imaging Data: Methods and Applications
Authors: Na Luo, Weiyang Shi, Zhengyi Yang, Ming Song, Tianzi Jiang. Machine Intelligence Research (EI, CSCD), 2024, No. 1, pp. 136-152.
Neuroimaging data typically include multiple modalities, such as structural or functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography, which provide multiple views for observing and analyzing the brain. To leverage the complementary representations of different modalities, multimodal fusion is consequently needed to dig out both inter-modality and intra-modality information. With the exploited rich information, it is becoming popular to combine multiple modality data to explore the structural and functional characteristics of the brain in both health and disease status. In this paper, we first review a wide spectrum of advanced machine learning methodologies for fusing multimodal brain imaging data, broadly categorized into unsupervised and supervised learning strategies. Following this, some representative applications are discussed, including how they help to understand brain arealization, how they improve the prediction of behavioral phenotypes and brain aging, and how they accelerate biomarker exploration for brain diseases. Finally, we discuss some exciting emerging trends and important future directions. Collectively, we intend to offer a comprehensive overview of brain imaging fusion methods and their successful applications, along with the challenges imposed by multi-scale and big data, which raises an urgent demand for developing new models and platforms.
Keywords: multimodal fusion; supervised learning; unsupervised learning; brain atlas; cognition; brain disorders
3D Vehicle Detection Algorithm Based on Multimodal Decision-Level Fusion
Authors: Peicheng Shi, Heng Qi, Zhiqiang Liu, Aixi Yang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 6, pp. 2007-2023.
3D vehicle detection based on LiDAR-camera fusion is becoming an emerging research topic in autonomous driving. The algorithm based on the Camera-LiDAR object candidate fusion method (CLOCs) is currently considered a more effective decision-level fusion algorithm, but it does not fully utilize the extracted 3D and 2D features. Therefore, we propose a 3D vehicle detection algorithm based on multimodal decision-level fusion. First, we project the anchor point of the 3D detection bounding box into the 2D image, calculate the distance between the 2D and 3D anchor points, and use this distance as a new fusion feature to enhance the feature redundancy of the network. Subsequently, we add an attention module, squeeze-and-excitation networks, to weight each feature channel, enhancing the important features of the network and suppressing useless ones. The experimental results show that the mean average precision of the algorithm on the KITTI dataset is 82.96%, which outperforms previous state-of-the-art multimodal fusion-based methods, and the average accuracy on the Easy, Moderate and Hard evaluation indicators reaches 88.96%, 82.60%, and 77.31%, respectively, exceeding the original CLOCs model by 1.02%, 2.29%, and 0.41%, respectively. Compared with the original CLOCs algorithm, our algorithm has higher accuracy and better performance in 3D vehicle detection.
Keywords: 3D vehicle detection; multimodal fusion; CLOCs; network structure optimization; attention module
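The projection-distance fusion feature described above can be illustrated with a pinhole-camera sketch: project the 3D anchor into the image and measure its distance to the 2D anchor. The camera intrinsics and point values below are assumptions for the sketch, not values from the paper:

```python
import numpy as np

def project_to_image(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (camera coordinates) to pixels."""
    x, y, z = p_cam
    return np.array([fx * x / z + cx, fy * y / z + cy])

def anchor_distance_feature(anchor_3d, anchor_2d, fx=700.0, fy=700.0,
                            cx=320.0, cy=240.0):
    """Distance between a projected 3D box anchor and a 2D box anchor,
    used as an extra decision-level fusion feature: small distance
    suggests the 2D and 3D candidates describe the same object."""
    proj = project_to_image(anchor_3d, fx, fy, cx, cy)
    return np.linalg.norm(proj - anchor_2d)

# A point on the optical axis projects to the principal point,
# so its distance to a 2D anchor at (320, 240) is zero.
d = anchor_distance_feature(np.array([0.0, 0.0, 10.0]),
                            np.array([320.0, 240.0]))
print(d)  # 0.0
```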
MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection
Authors: Peicheng Shi, Zhiqiang Liu, Heng Qi, Aixi Yang. Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 5615-5637.
In complex traffic environments, it is very important for autonomous vehicles to accurately perceive in advance the dynamic information of other vehicles around them. The accuracy of 3D object detection is affected by problems such as illumination changes, object occlusion, and detection distance. We face these challenges by proposing a multimodal feature fusion network for 3D object detection (MFF-Net). This paper first uses a spatial transformation projection algorithm to map image features into the feature space, so that the image features are in the same spatial dimension when fused with the point cloud features. Then, feature channel weighting is performed using an adaptive expression augmentation fusion network to enhance important network features, suppress useless features, and increase the directionality of the network toward features. Finally, this paper increases the probability of false detection and missed detection in the non-maximum suppression algorithm by increasing the one-dimensional threshold. With this, a complete 3D target detection network based on multimodal feature fusion is constructed. The experimental results show that the proposed method achieves an average accuracy of 82.60% on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, outperforming previous state-of-the-art multimodal fusion networks. On the Easy, Moderate, and Hard evaluation indicators, the accuracy reaches 90.96%, 81.46%, and 75.39%, respectively. This shows that the MFF-Net network has good performance in 3D object detection.
Keywords: 3D object detection; multimodal fusion; neural network; autonomous driving; attention mechanism
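The channel-weighting step described above follows the squeeze-and-excitation pattern: pool each channel to a scalar, pass the descriptor through a small bottleneck, and gate the channels with a sigmoid. A minimal numpy sketch with assumed shapes and random weights, not the MFF-Net implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_channel_weighting(feat, W1, W2):
    """Squeeze-and-excitation-style channel weighting:
    squeeze (global average pool), excite (bottleneck FC + sigmoid),
    then rescale each channel by its learned importance."""
    # feat: (C, H, W) feature map
    squeezed = feat.mean(axis=(1, 2))                       # (C,) descriptor
    hidden = np.maximum(squeezed @ W1, 0.0)                 # ReLU bottleneck
    weights = sigmoid(hidden @ W2)                          # (C,) gates in (0, 1)
    return feat * weights[:, None, None], weights

rng = np.random.default_rng(2)
C = 8
feat = rng.standard_normal((C, 4, 4))
W1 = rng.standard_normal((C, C // 2)) * 0.1   # reduction ratio 2 (assumed)
W2 = rng.standard_normal((C // 2, C)) * 0.1

out, w = se_channel_weighting(feat, W1, W2)
print(out.shape)  # (8, 4, 4): same shape, channels rescaled
```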
Data-driven multimodal fusion: approaches and applications in psychiatric research
Authors: Jing Sui, Dongmei Zhi, Vince D. Calhoun. Psychoradiology, 2023, No. 1, pp. 135-153.
In the era of big data, where vast amounts of information are being generated and collected at an unprecedented rate, there is a pressing demand for innovative data-driven multimodal fusion methods. These methods aim to integrate diverse neuroimaging perspectives to extract meaningful insights and attain a more comprehensive understanding of complex psychiatric disorders. However, analyzing each modality separately may only reveal partial insights or miss important correlations between different types of data. This is where data-driven multimodal fusion techniques come into play. By combining information from multiple modalities in a synergistic manner, these methods enable us to uncover hidden patterns and relationships that would otherwise remain unnoticed. In this paper, we present an extensive overview of data-driven multimodal fusion approaches with or without prior information, with specific emphasis on canonical correlation analysis and independent component analysis. The applications of such fusion methods are wide-ranging and allow us to incorporate multiple factors such as genetics, environment, cognition, and treatment outcomes across various brain disorders. After summarizing the diverse neuropsychiatric magnetic resonance imaging fusion applications, we further discuss emerging neuroimaging analysis trends in big data, such as N-way multimodal fusion, deep learning approaches, and clinical translation. Overall, multimodal fusion emerges as an imperative approach providing valuable insights into the underlying neural basis of mental disorders, which can uncover subtle abnormalities or potential biomarkers that may benefit targeted treatments and personalized medical interventions.
Keywords: multimodal fusion approach; data driven; functional magnetic resonance imaging (fMRI); structural MRI; diffusion magnetic resonance imaging; independent component analysis; canonical correlation analysis; psychiatric disorder
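Canonical correlation analysis, one of the two fusion workhorses emphasized above, can be sketched via an SVD of the whitened cross-covariance between two modality matrices. A minimal numpy version on synthetic two-modality data sharing one latent source (illustrative, not the review's reference implementation; the regularizer `reg` is an assumption for numerical stability):

```python
import numpy as np

def cca_first_correlation(X, Y, reg=1e-6):
    """First canonical correlation between two modality matrices
    (rows = subjects), via SVD of the whitened cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    s = np.linalg.svd(K, compute_uv=False)
    return min(s[0], 1.0)   # clip tiny numerical overshoot

rng = np.random.default_rng(3)
shared = rng.standard_normal((200, 1))           # shared latent source
X = shared + 0.1 * rng.standard_normal((200, 3)) # modality 1 (e.g. fMRI)
Y = -shared + 0.1 * rng.standard_normal((200, 4))# modality 2 (e.g. sMRI)

r = cca_first_correlation(X, Y)
print(r)  # close to 1.0: the modalities share a latent source
```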
Multimodal Social Media Fake News Detection Based on Similarity Inference and Adversarial Networks (Cited by 1)
Authors: Fangfang Shan, Huifang Sun, Mengyi Wang. Computers, Materials & Continua (SCIE, EI), 2024, No. 4, pp. 581-605.
As social networks become increasingly complex, contemporary fake news often includes textual descriptions of events accompanied by corresponding images or videos. Fake news in multiple modalities is more likely to create a misleading perception among users. While early research primarily focused on text-based features for fake news detection mechanisms, there has been relatively limited exploration of learning shared representations in multimodal (text and visual) contexts. To address these limitations, this paper introduces a multimodal model for detecting fake news, which relies on similarity reasoning and adversarial networks. The model employs Bidirectional Encoder Representations from Transformers (BERT) and a Text Convolutional Neural Network (Text-CNN) for extracting textual features while utilizing the pre-trained Visual Geometry Group 19-layer (VGG-19) network to extract visual features. Subsequently, the model establishes similarity representations between the textual features extracted by Text-CNN and the visual features through similarity learning and reasoning. Finally, these features are fused to enhance the accuracy of fake news detection, and adversarial networks are employed to investigate the relationship between fake news and events. This paper validates the proposed model using publicly available multimodal datasets from Weibo and Twitter. Experimental results demonstrate that the proposed approach achieves superior performance on Twitter, with an accuracy of 86%, surpassing traditional unimodal models and existing multimodal models, while its overall performance on the Weibo dataset surpasses the benchmark models across multiple metrics. The application of similarity reasoning and adversarial networks in multimodal fake news detection significantly enhances detection effectiveness. However, current research is limited to the fusion of only text and image modalities; future research should aim to further integrate features from additional modalities to comprehensively represent the multifaceted information of fake news.
Keywords: fake news detection; attention mechanism; image-text similarity; multimodal feature fusion
Fusion of color and hallucinated depth features for enhanced multimodal deep learning-based damage segmentation
Authors: Tarutal Ghosh Mondal, Mohammad Reza Jahanshahi. Earthquake Engineering and Engineering Vibration (SCIE, EI, CSCD), 2023, No. 1, pp. 55-68.
Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. However, alongside the advantages, depth sensing also presents many practical challenges. For instance, depth sensors impose an additional payload burden on robotic inspection platforms, limiting the operation time and increasing the inspection cost. Additionally, some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime. In this context, this study investigates the feasibility of abolishing depth sensing at test time without compromising the segmentation performance. An autonomous damage segmentation framework is developed based on recent advancements in vision-based multimodal sensing, such as modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At the time of deployment, depth data become expendable, as they can be simulated from the corresponding RGB frames. This makes it possible to reap the benefits of depth fusion without any depth perception per se. The study explores two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.
Keywords: multimodal data fusion; depth sensing; vision-based inspection; UAV-assisted inspection; damage segmentation; post-disaster reconnaissance; modality hallucination; monocular depth estimation
Multimodal fusion of EEG and fMRI for epilepsy detection
Authors: Xiashuang Wang, Guanghong Gong, Ni Li. International Journal of Modeling, Simulation, and Scientific Computing (EI), 2018, No. 2, pp. 23-35.
Brain-computer interface (BCI) technology provides a new way of communication and control without language or physical action. Brain signal tracking and positioning is the basis of BCI research, while brain modeling directly affects the analysis of electroencephalography (EEG) and functional magnetic resonance imaging (fMRI). This paper proposes a human ellipsoid brain modeling method. We then use a non-parametric spectral estimation method of time-frequency analysis to process simulated and real EEG of epilepsy patients, which utilizes both high spatial and high temporal resolution to improve doctors' diagnostic efficiency.
Keywords: brain model; time-frequency analysis; medical diagnosis; multimodal fusion
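A non-parametric spectral estimate of the kind mentioned above can be illustrated with a simple periodogram. The sampling rate and the 10 Hz test tone below are assumptions for the sketch, not values from the paper:

```python
import numpy as np

def periodogram(x, fs):
    """Non-parametric spectral estimate: squared magnitude of the DFT,
    returned with its one-sided frequency axis."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec

fs = 256.0                        # common EEG sampling rate (assumed)
t = np.arange(0, 2.0, 1.0 / fs)   # 2 s window -> 0.5 Hz resolution
x = np.sin(2 * np.pi * 10.0 * t)  # 10 Hz alpha-band-like oscillation

freqs, spec = periodogram(x, fs)
peak = freqs[np.argmax(spec)]
print(peak)  # 10.0
```

Real EEG pipelines would window and average segments (Welch's method) to reduce variance, but the raw periodogram is the starting point of non-parametric estimation.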
Fusion of Hash-Based Hard and Soft Biometrics for Enhancing Face Image Database Search and Retrieval
Authors: Ameerah Abdullah Alshahrani, Emad Sami Jaha, Nahed Alowidi. Computers, Materials & Continua (SCIE, EI), 2023, No. 12, pp. 3489-3509.
The utilization of digital picture search and retrieval has grown substantially in numerous fields during the last decade, owing to continuing advances in image processing and computer vision approaches. In multiple real-life applications, for example social media, content-based face picture retrieval is a well-invested technique for large-scale databases, where there is a significant need for reliable retrieval capabilities enabling quick search across a vast number of pictures. Humans widely employ faces for recognizing and identifying people. Thus, face recognition through formal or personal pictures is increasingly used in various real-life applications, such as helping crime investigators retrieve matching images from face image databases to identify victims and criminals. However, such face image retrieval becomes more challenging in large-scale databases, where traditional vision-based face analysis requires ample additional storage space beyond the raw face images to store extracted lengthy feature vectors, and takes much longer to process and match thousands of face images. This work mainly contributes to enhancing face image retrieval performance in large-scale databases using hash codes inferred by locality-sensitive hashing (LSH) for facial hard and soft biometrics, as (Hard BioHash) and (Soft BioHash) respectively, to be used as a search input for retrieving the top-k matching faces. Moreover, we propose the multi-biometric score-level fusion of both face hard and soft BioHashes (Hard-Soft BioHash Fusion) for further augmented face image retrieval. The experimental outcomes on the Labeled Faces in the Wild (LFW) dataset and the related attributes dataset (LFW-attributes) demonstrate that the suggested fusion approach (Hard-Soft BioHash Fusion) significantly improves retrieval performance compared to using Hard BioHash or Soft BioHash in isolation, providing an augmented accuracy of 87% when executed on 1000 specimens and 77% on 5743 samples. These results remarkably outperform the Hard BioHash method (by 50% on the 1000 samples and 30% on the 5743 samples) and the Soft BioHash method (by 78% on the 1000 samples and 63% on the 5743 samples).
Keywords: face image retrieval; soft biometrics; similar pictures; hashing; database search; large databases; score-level fusion; multimodal fusion
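The hashing-and-fusion pipeline described above can be sketched with random-projection LSH (one bit per hyperplane) and a weighted score-level fusion of the two hash similarities. Bit lengths, feature dimensions, and fusion weights below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def lsh_hash(feat, hyperplanes):
    """Random-projection LSH: one bit per hyperplane (sign of projection).
    Nearby feature vectors tend to agree on most bits."""
    return (feat @ hyperplanes.T >= 0).astype(np.uint8)

def hamming_similarity(h1, h2):
    """Fraction of matching bits, in [0, 1]."""
    return 1.0 - np.mean(h1 != h2)

def fused_score(hard1, hard2, soft1, soft2, w_hard=0.5, w_soft=0.5):
    """Score-level fusion: weighted sum of hard- and soft-biometric
    hash similarities (equal weights here are illustrative)."""
    return (w_hard * hamming_similarity(hard1, hard2)
            + w_soft * hamming_similarity(soft1, soft2))

rng = np.random.default_rng(4)
planes_hard = rng.standard_normal((64, 128))  # 64-bit hard-biometric hash
planes_soft = rng.standard_normal((32, 16))   # 32-bit soft-biometric hash

face = rng.standard_normal(128)   # hard-biometric feature (e.g. face embedding)
attrs = rng.standard_normal(16)   # soft-biometric feature (e.g. attributes)

h_hard = lsh_hash(face, planes_hard)
h_soft = lsh_hash(attrs, planes_soft)

# A probe identical to the gallery entry scores a perfect fused similarity.
print(fused_score(h_hard, h_hard, h_soft, h_soft))  # 1.0
```

In retrieval, the fused score would be computed against every gallery hash and the top-k highest-scoring faces returned.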
Improved Weather Radar Echo Extrapolation Through Wind Speed Data Fusion Using a New Spatiotemporal Neural Network Model
Authors: 耿焕同, 谢博洋, 葛晓燕, 闵锦忠, 庄潇然. Journal of Tropical Meteorology (SCIE), 2023, No. 4, pp. 482-492.
Weather radar echo extrapolation plays a crucial role in weather forecasting. However, traditional weather radar echo extrapolation methods are not very accurate and do not make full use of historical data. Deep learning algorithms based on recurrent neural networks also suffer from accumulating errors. Moreover, it is difficult to obtain higher accuracy by relying on a single historical radar echo observation. Therefore, in this study, we constructed the Fusion GRU module, which leverages a cascade structure to effectively combine radar echo data and mean wind data. We also designed the Top Connection so that the model can capture the global spatial relationship to construct constraints on the predictions. We compared several models on the Jiangsu Province dataset. The results show that our proposed model, Cascade Fusion Spatiotemporal Network (CFSN), improved the critical success index (CSI) by 10.7% over the baseline at the threshold of 30 dBZ. Ablation experiments further validated the effectiveness of our model: the CSI of the complete CFSN was 0.004 higher than the suboptimal solution without the cross-attention module at the threshold of 30 dBZ.
Keywords: deep learning; spatiotemporal prediction; radar echo extrapolation; recurrent neural network; multimodal fusion
Image fusion methods in high-speed railway scenes: A survey
Authors: Yuqiao Zeng, Xu Wang, Hongwei Zhao, Yi Jin, George A. Giannopoulos, Yidong Li. High-Speed Railway, 2023, No. 2, pp. 87-91.
Image fusion refers to extracting meaningful information from images of different sources or modalities and then fusing them to generate more informative images that are beneficial for subsequent applications. In recent years, growing data and computing resources have promoted the development of deep learning, and image fusion technology has continued to spawn new deep learning fusion methods building on traditional fusion methods. However, high-speed railways have unique industry characteristics in their image data, which leads different image fusion techniques to produce different fusion effects in high-speed railway scenes. This work first introduces the mainstream technology classification of image fusion, then describes the downstream tasks that image fusion techniques may be combined with in high-speed railway scenes, and introduces the evaluation metrics of image fusion, followed by a series of subjective and objective experiments to comprehensively evaluate the performance of each image fusion method in different traffic scenes. Finally, it discusses possible future directions for image fusion research in the field of rail transportation.
Keywords: computer vision; information fusion; deep learning; multimodal fusion
A novel image fusion algorithm based on 2D scale-mixing complex wavelet transform and Bayesian MAP estimation for multimodal medical images
Authors: Abdallah Bengueddoudj, Zoubeida Messali, Volodymyr Mosorov. Journal of Innovative Optical Health Sciences (SCIE, EI, CAS), 2017, No. 3, pp. 52-68.
In this paper, we propose a new image fusion algorithm based on the two-dimensional Scale-Mixing Complex Wavelet Transform (2D-SMCWT). The fusion of the detail 2D-SMCWT coefficients is performed via a Bayesian Maximum a Posteriori (MAP) approach by considering a trivariate statistical model for the local neighborhood of 2D-SMCWT coefficients. For the approximation coefficients, a new fusion rule based on Principal Component Analysis (PCA) is applied. We conduct several experiments using three different groups of multimodal medical images to evaluate the performance of the proposed method. The obtained results prove the superiority of the proposed method over state-of-the-art fusion methods in terms of visual quality and several commonly used metrics. Robustness of the proposed method is further tested against different types of noise. The plots of fusion metrics establish the accuracy of the proposed fusion method.
Keywords: medical imaging; multimodal medical image fusion; scale-mixing complex wavelet transform; MAP Bayes estimation; principal component analysis
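The PCA-based fusion rule for approximation coefficients mentioned above can be sketched as weighting each source by the leading eigenvector of their joint covariance, so the higher-variance (more informative) source dominates. An illustrative numpy version, not the authors' exact rule:

```python
import numpy as np

def pca_fusion_weights(a, b):
    """PCA fusion rule: weights from the leading eigenvector of the
    2x2 covariance of the two (flattened) approximation-coefficient planes."""
    data = np.vstack([a.ravel(), b.ravel()])
    cov = np.cov(data)                      # 2x2 covariance of the sources
    vals, vecs = np.linalg.eigh(cov)
    lead = np.abs(vecs[:, np.argmax(vals)]) # leading principal direction
    return lead / lead.sum()                # normalize to sum to 1

def fuse_approximation(a, b):
    w = pca_fusion_weights(a, b)
    return w[0] * a + w[1] * b

rng = np.random.default_rng(5)
a = rng.standard_normal((8, 8))          # high-variance coefficient plane
b = 0.2 * rng.standard_normal((8, 8))    # low-variance plane gets less weight

w = pca_fusion_weights(a, b)
fused = fuse_approximation(a, b)
print(w[0] > w[1])   # True: higher-variance source dominates
```

In the full algorithm this rule would be applied only to the wavelet approximation subband, with the detail subbands fused by the MAP estimator instead.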
Fusion of Medical Images in Wavelet Domain: A Hybrid Implementation
Authors: Satya Prakash Yadav, Sachin Yadav. Computer Modeling in Engineering & Sciences (SCIE, EI), 2020, No. 1, pp. 303-321.
This paper presents a low-complexity, highly energy-efficient MRI image fusion method intended for remote visual sensor frameworks, which leads to improved understanding and planning of treatment, especially in radiology. This is done by fusing the original images, which leads to a significant reduction in computation time and frequency. The proposed technique overcomes the computation and energy limitations of low-power devices and is examined in terms of image quality and energy consumption. Simulations are performed using MATLAB 2018a to quantify the resulting energy savings, and the results show that the proposed algorithm is very fast and consumes only around 1% of the energy of the hybrid fusion schemes. Likewise, the simplicity of our proposed method makes it more suitable for real-time applications.
Keywords: medical image fusion; wavelet transform; DWT; DCT; ICA; fusion techniques; multimodal fusion
Multimodal spontaneous affect recognition using neural networks learned with hints
Authors: 张欣, 吕坤. Journal of Beijing Institute of Technology (EI, CAS), 2014, No. 1, pp. 117-125.
A multimodal fusion classifier is presented based on neural networks (NNs) learned with hints for automatic spontaneous affect recognition. In case different channels can provide complementary information, features are utilized from four behavioral cues: frontal-view facial expression, profile-view facial expression, shoulder movement, and vocalization (audio). NNs are used in both single-cue processing and multimodal fusion. Coarse categories and quadrants in the activation-evaluation dimensional space are utilized respectively as the heuristic information (hints) of the NNs during training, aiming at recognition of basic emotions. With the aid of hints, the weights in the NNs can learn optimal feature groupings, and the subtlety and complexity of spontaneous affective states can be better modeled. The proposed method requires low computation effort and reaches high recognition accuracy, even if the training data is insufficient. Experiment results on the Semaine naturalistic dataset demonstrate that our method is effective and promising.
Keywords: affect recognition; multimodal fusion; neural network learned with hints; spontaneous affect
Deep Multi-Module Based Language Priors Mitigation Model for Visual Question Answering
Authors: 于守健, 金学勤, 吴国文, 石秀金, 张红. Journal of Donghua University (English Edition) (CAS), 2023, No. 6, pp. 684-694.
The original intention of visual question answering (VQA) models is to infer the answer based on the information in the visual image relevant to the question text, but many VQA models often yield answers that are biased by prior knowledge, especially language priors. This paper proposes a mitigation model called language priors mitigation-VQA (LPM-VQA) for the language priors problem in VQA models, which divides language priors into positive and negative language priors. Different network branches are used to capture and process the different priors to mitigate them. A dynamically changing language prior feedback objective function is designed using the intermediate results of some modules in the VQA model. The weight of the loss value for each answer is dynamically set according to the strength of its language priors to balance its proportion in the total VQA loss and further mitigate the language priors. This model does not depend on the baseline VQA architecture and can be configured like a plug-in to improve the performance over most existing VQA models. The experimental results show that the proposed model is general and effective, achieving state-of-the-art accuracy on the VQA-CP v2 dataset.
Keywords: visual question answering (VQA); language priors; natural language processing; multimodal fusion; computer vision
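The dynamic loss weighting described above (setting each answer's loss weight by the strength of its language prior) can be sketched with inverse-frequency weights: the more often an answer appears in training, the stronger its prior, and the less its loss should count. `alpha` and the counts below are illustrative assumptions, not LPM-VQA's actual objective:

```python
import numpy as np

def prior_weights(answer_counts, alpha=1.0):
    """Weight each answer's loss inversely to its training frequency,
    so high-prior (frequent) answers contribute less to the total loss."""
    freqs = answer_counts / answer_counts.sum()
    w = (1.0 / freqs) ** alpha
    return w / w.mean()          # normalize so weights average to 1

def weighted_cross_entropy(probs, target_idx, weights):
    """Per-sample cross-entropy scaled by the target answer's weight."""
    return -weights[target_idx] * np.log(probs[target_idx])

counts = np.array([900.0, 90.0, 10.0])   # e.g. "yes" dominates the train set
w = prior_weights(counts)
print(w[2] > w[0])   # True: the rare answer is up-weighted

probs = np.array([0.6, 0.3, 0.1])        # a model's answer distribution
loss = weighted_cross_entropy(probs, 2, w)
```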
TACFN: Transformer-Based Adaptive Cross-Modal Fusion Network for Multimodal Emotion Recognition
Authors: Feng Liu, Ziwang Fu, Yunlong Wang, Qijian Zheng. CAAI Artificial Intelligence Research, 2023, No. 1, pp. 75-82.
The fusion technique is the key to the multimodal emotion recognition task. Recently, cross-modal attention-based fusion methods have demonstrated high performance and strong robustness. However, cross-modal attention suffers from redundant features and does not capture complementary features well. We find that it is not necessary to use the entire information of one modality to reinforce the other during cross-modal interaction, and the features that can reinforce a modality may contain only a part of it. To this end, we design an innovative Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Specifically, for the redundant features, we make one modality perform intra-modal feature selection through a self-attention mechanism, so that the selected features can adaptively and efficiently interact with another modality. To better capture the complementary information between the modalities, we obtain the fused weight vector by splicing and use the weight vector to achieve feature reinforcement of the modalities. We apply TACFN to the RAVDESS and IEMOCAP datasets. For fair comparison, we use the same unimodal representations to validate the effectiveness of the proposed fusion method. The experimental results show that TACFN brings a significant performance improvement compared to other methods and reaches state-of-the-art performance. All code and models can be accessed at https://github.com/shuzihuaiyu/TACFN.
Keywords: multimodal emotion recognition; multimodal fusion; adaptive cross-modal blocks; Transformer; computational perception
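A minimal sketch of the two ideas in the abstract: intra-modal feature selection before cross-modal interaction, then splice-and-reweight fusion. The top-k selection and the softmax-over-concatenation weighting are simplified stand-ins for the paper's attention blocks, not its exact computation.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_features(feats, attn_scores, k):
    """Intra-modal selection: keep only the k features with the highest
    self-attention scores, so a modality interacts through a compact
    subset rather than its full (partly redundant) feature set."""
    ranked = sorted(range(len(feats)), key=lambda i: -attn_scores[i])
    kept = sorted(ranked[:k])  # preserve the original feature order
    return [feats[i] for i in kept]

def fuse(audio, visual):
    """Splice the two modality vectors, derive a fused weight vector,
    and use each modality's share of the weight mass to reinforce it."""
    weights = softmax(audio + visual)
    w_audio = sum(weights[:len(audio)])
    w_visual = sum(weights[len(audio):])
    return [w_audio * x for x in audio] + [w_visual * x for x in visual]
```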
MCSTransWnet: A new deep learning process for postoperative corneal topography prediction based on raw multimodal data from the Pentacam HR system
18
Authors: Nan Cheng, Zhe Zhang, Jing Pan, Xiao-Na Li, Wei-Yi Chen, Guang-Hua Zhang, Wei-Hua Yang. Medicine in Novel Technology and Devices, 2024, No. 1, pp. 53-63 (11 pages)
This work provides a new multimodal fusion generative adversarial network (GAN) model, Multiple Conditions Transform W-net (MCSTransWnet), which primarily uses femtosecond laser arcuate keratotomy surgical parameters and preoperative corneal topography to predict postoperative corneal topography in astigmatism-corrected patients. The MCSTransWnet model comprises a generator and a discriminator, and the generator is composed of two sub-generators. The first sub-generator extracts features using the U-net model, a vision transformer (ViT), and a multi-parameter conditional module branch; the second uses a U-net network for further image denoising. The discriminator uses the pixel discriminator from Pix2Pix. Most current GAN models are convolutional neural networks, and because their feature extraction is local, they struggle to capture relationships among global features. We therefore added a vision transformer network as a model branch to extract global features. Transformers are normally difficult to train, and image noise and geometric information loss are likely, so we adopted the standard U-net fusion scheme together with the transformer network in the generator, obtaining global features, local features, and rich image details simultaneously. Our experimental results demonstrate that MCSTransWnet successfully predicts postoperative corneal topographies (structural similarity = 0.765, peak signal-to-noise ratio = 16.012, Fréchet inception distance = 9.264). Using this technique to obtain the rough shape of the postoperative corneal topography in advance gives clinicians more references, guides changes to surgical planning, and improves the success rate of surgery.
Keywords: deep learning; generative adversarial networks; corneal topography; Transformer; W-net; U-net; medical imaging; multimodal fusion
Cross-Modal Complementary Network with Hierarchical Fusion for Multimodal Sentiment Classification (Cited 5 times)
19
Authors: Cheng Peng, Chunxia Zhang, Xiaojun Xue, Jiameng Gao, Hongjian Liang, Zhengdong Niu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, No. 4, pp. 664-679 (16 pages)
Multimodal Sentiment Classification (MSC) uses multimodal data, such as images and texts, to identify users' sentiment polarities from the information they post on the Internet. MSC has attracted considerable attention because of its wide applications in social computing and opinion mining. However, improper correlation strategies can cause erroneous fusion, as texts and images that are unrelated to each other may be integrated. Moreover, simply concatenating them modal by modal, even with true correlation, cannot fully capture the features within and between modals. To solve these problems, this paper proposes a Cross-Modal Complementary Network (CMCN) with hierarchical fusion for MSC. The CMCN is designed as a hierarchical structure with three key modules: a feature extraction module to extract features from texts and images, a feature attention module to learn both text and image attention features generated by an image-text correlation generator, and a cross-modal hierarchical fusion module to fuse features within and between modals. This design provides a hierarchical fusion framework that can fully integrate different modal features and helps reduce the risk of integrating unrelated modal features. Extensive experimental results on three public datasets show that the proposed approach significantly outperforms state-of-the-art methods.
Keywords: multimodal sentiment analysis; multimodal fusion; Cross-Modal Complementary Network (CMCN); hierarchical fusion; joint optimization
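The abstract's point about erroneous fusion of unrelated image-text pairs can be illustrated with a correlation gate: cross-modal features are merged only when an image-text correlation score clears a threshold. The cosine score and the simple averaging below are stand-ins for the paper's image-text correlation generator and hierarchical fusion module, chosen only to make the idea concrete.

```python
import math

def cosine(u, v):
    """Cosine similarity as a toy image-text correlation score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def gated_fuse(text_feat, img_feat, tau=0.5):
    """Fuse image features into text features only when the modalities
    appear correlated; otherwise fall back to text alone, reducing the
    risk of integrating unrelated modal features."""
    if cosine(text_feat, img_feat) < tau:
        return list(text_feat)
    return [(t + i) / 2.0 for t, i in zip(text_feat, img_feat)]
```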
Developing a Physiological Signal-Based, Mean Threshold and Decision-Level Fusion Algorithm (PMD) for Emotion Recognition (Cited 4 times)
20
Authors: Qiuju Zhang, Hongtao Zhang, Keming Zhou, Le Zhang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 4, pp. 673-685 (13 pages)
With the development of computers, artificial intelligence, and cognitive science, deep communication between humans and computers has become increasingly important, making affective computing a current hot research topic. This study constructs a Physiological signal-based, Mean-threshold, and Decision-level fusion algorithm (PMD) to identify human emotional states. First, we select key features from electroencephalogram and peripheral physiological signals, and use the mean-value method to obtain the classification threshold of each participant, accounting for individual differences. Then, we employ Gaussian Naive Bayes (GNB), Linear Regression (LR), Support Vector Machine (SVM), and other classification methods to perform emotion recognition. Finally, we improve the classification accuracy with an ensemble model. The experimental results reveal that physiological signals are more suitable for emotion recognition than classical facial and speech signals, that the proposed mean-threshold method alleviates the problem of individual differences to a certain extent, and that the ensemble learning model significantly outperforms single classifiers such as GNB and LR.
Keywords: electroencephalogram (EEG); peripheral physiological signals; machine learning; emotion recognition; multimodal fusion
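The per-participant mean threshold and the decision-level fusion step described above can be sketched as follows. The majority vote is one plausible reading of the ensemble in the abstract; the paper's actual combination rule may differ.

```python
from statistics import mean

def mean_threshold_labels(feature_values):
    """Binarise a participant's trials against that participant's own
    mean, so the classification threshold adapts to individual
    differences rather than using one global cut-off."""
    threshold = mean(feature_values)
    return [1 if v > threshold else 0 for v in feature_values]

def decision_level_fusion(classifier_votes):
    """Decision-level fusion: majority vote over the binary outputs of
    several classifiers (e.g. GNB, LR, SVM)."""
    return 1 if sum(classifier_votes) * 2 > len(classifier_votes) else 0
```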