Journal Articles
630 articles found
1. Artificial intelligence-driven radiomics study in cancer: the role of feature engineering and modeling (Cited: 1)
Authors: Yuan-Peng Zhang, Xin-Yun Zhang, Yu-Ting Cheng, Bing Li, Xin-Zhi Teng, Jiang Zhang, Saikit Lam, Ta Zhou, Zong-Rui Ma, Jia-Bao Sheng, Victor C. W. Tam, Shara W. Y. Lee, Hong Ge, Jing Cai. 《Military Medical Research》 (SCIE CAS CSCD), 2024, No. 1, pp. 115-147 (33 pages)
Modern medicine is reliant on various medical imaging technologies for non-invasively observing patients' anatomy. However, the interpretation of medical images can be highly subjective and dependent on the expertise of clinicians. Moreover, some potentially useful quantitative information in medical images, especially that which is not visible to the naked eye, is often ignored during clinical practice. In contrast, radiomics performs high-throughput feature extraction from medical images, which enables quantitative analysis of medical images and prediction of various clinical endpoints. Studies have reported that radiomics exhibits promising performance in diagnosis and in predicting treatment responses and prognosis, demonstrating its potential to be a non-invasive auxiliary tool for personalized medicine. However, radiomics remains in a developmental phase, as numerous technical challenges have yet to be solved, especially in feature engineering and statistical modeling. In this review, we introduce the current utility of radiomics by summarizing research on its application in the diagnosis, prognosis, and prediction of treatment responses in patients with cancer. We focus on machine learning approaches: for feature extraction and selection during feature engineering, and for handling imbalanced datasets and multi-modality fusion during statistical modeling. Furthermore, we introduce the stability, reproducibility, and interpretability of features, and the generalizability and interpretability of models. Finally, we offer possible solutions to current challenges in radiomics research.
Keywords: Artificial intelligence; Radiomics; Feature extraction; Feature selection; Modeling; Interpretability; Multimodalities; Head and neck cancer
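The feature-engineering stage this review centers on is straightforward to illustrate. Below is a minimal sketch, assuming a precomputed radiomics feature matrix (all data and dimensions here are hypothetical), of LASSO-based feature selection, one common machine learning approach for shrinking a high-throughput feature set to a compact signature:

```python
# Minimal sketch of one radiomics feature-engineering stage: selecting a
# compact, predictive subset from a high-throughput feature matrix. The
# matrix X (n_patients x n_features) is assumed to be already extracted
# (shape/texture/intensity features); all data here are synthetic stand-ins.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 400))      # 120 patients, 400 radiomics features
y = rng.integers(0, 2, size=120)     # binary clinical endpoint (hypothetical)

# LASSO drives uninformative feature weights to exactly zero, leaving a
# compact signature for the downstream statistical model.
model = make_pipeline(StandardScaler(), LassoCV(cv=5))
model.fit(X, y)
selected = np.flatnonzero(model[-1].coef_)
print(f"{selected.size} of {X.shape[1]} features retained")
```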
2. Intelligent Recognition Using Ultralight Multifunctional Nano-Layered Carbon Aerogel Sensors with Human-Like Tactile Perception (Cited: 1)
Authors: Huiqi Zhao, Yizheng Zhang, Lei Han, Weiqi Qian, Jiabin Wang, Heting Wu, Jingchen Li, Yuan Dai, Zhengyou Zhang, Chris R. Bowen, Ya Yang. 《Nano-Micro Letters》 (SCIE EI CAS CSCD), 2024, No. 1, pp. 172-186 (15 pages)
Humans can perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly. However, replicating this unique capability in robots remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition and 3D location capabilities, which is combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04–100 kPa) and temperature (21.5–66.2 ℃) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa^(−1) and triboelectric durability of over 6000 cycles. The devised algorithm has universality and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception to advance future society toward heightened sensing, recognition and intelligence.
Keywords: Multifunctional sensor; Tactile perception; Multimodal machine learning algorithms; Universal tactile system; Intelligent object recognition
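As a rough illustration of the multimodal supervised-learning step described above, the sketch below fuses pressure and temperature feature vectors at the feature level and trains a generic classifier; the signals, ranges, and object classes are synthetic stand-ins, not the article's data or algorithm:

```python
# Hypothetical sketch of multimodal supervised learning over tactile channels:
# pressure and temperature features are concatenated (early fusion) and fed
# to a generic classifier for object recognition.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300
pressure = rng.uniform(0.04, 100, size=(n, 32))      # kPa-range features
temperature = rng.uniform(21.5, 66.2, size=(n, 32))  # temperature features
X = np.hstack([pressure, temperature])               # feature-level fusion
y = rng.integers(0, 10, size=n)                      # 10 hypothetical objects

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```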
3. Multimodal Social Media Fake News Detection Based on Similarity Inference and Adversarial Networks (Cited: 1)
Authors: Fangfang Shan, Huifang Sun, Mengyi Wang. 《Computers, Materials & Continua》 (SCIE EI), 2024, No. 4, pp. 581-605 (25 pages)
As social networks become increasingly complex, contemporary fake news often includes textual descriptions of events accompanied by corresponding images or videos. Fake news in multiple modalities is more likely to create a misleading perception among users. While early research primarily focused on text-based features for fake news detection mechanisms, there has been relatively limited exploration of learning shared representations in multimodal (text and visual) contexts. To address these limitations, this paper introduces a multimodal model for detecting fake news, which relies on similarity reasoning and adversarial networks. The model employs Bidirectional Encoder Representation from Transformers (BERT) and Text Convolutional Neural Network (Text-CNN) for extracting textual features, while utilizing the pre-trained Visual Geometry Group 19-layer (VGG-19) network to extract visual features. Subsequently, the model establishes similarity representations between the textual features extracted by Text-CNN and the visual features through similarity learning and reasoning. Finally, these features are fused to enhance the accuracy of fake news detection, and adversarial networks are employed to investigate the relationship between fake news and events. This paper validates the proposed model using publicly available multimodal datasets from Weibo and Twitter. Experimental results demonstrate that our proposed approach achieves superior performance on Twitter, with an accuracy of 86%, surpassing traditional unimodal models and existing multimodal models. On the Weibo dataset, our model likewise surpasses the benchmark models across multiple metrics. The application of similarity reasoning and adversarial networks in multimodal fake news detection significantly enhances detection effectiveness. However, the current research is limited to the fusion of only text and image modalities; future research should aim to further integrate features from additional modalities to comprehensively represent the multifaceted information of fake news.
Keywords: Fake news detection; Attention mechanism; Image-text similarity; Multimodal feature fusion
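The extractor pipeline named in the abstract (BERT for text, pre-trained VGG-19 for images, followed by similarity computation in a shared space) can be sketched as follows; the projection dimension and inputs are assumptions, and the learned similarity-reasoning and adversarial components are omitted:

```python
# Sketch of the paper's feature extractors: BERT for text, pre-trained VGG-19
# for images, then learned projections into a shared space where image-text
# similarity is computed. Projection dims and inputs are assumptions.
import torch
import torch.nn.functional as F
import torchvision.models as tvm
from transformers import BertTokenizer, BertModel

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
vgg = tvm.vgg19(weights=tvm.VGG19_Weights.IMAGENET1K_V1)
# keep VGG-19 up to its penultimate (4096-d) layer as the visual extractor
vgg_features = torch.nn.Sequential(vgg.features, vgg.avgpool,
                                   torch.nn.Flatten(), *vgg.classifier[:-1])

text_proj = torch.nn.Linear(768, 256)   # shared-space projections (assumed)
img_proj = torch.nn.Linear(4096, 256)

text = "Breaking: bridge collapse caught on camera"
img = torch.randn(1, 3, 224, 224)       # stand-in for a preprocessed image
with torch.no_grad():
    t = bert(**tok(text, return_tensors="pt")).last_hidden_state[:, 0]  # [CLS]
    v = vgg_features(img)
    sim = F.cosine_similarity(text_proj(t), img_proj(v))  # consistency cue
print(sim.item())
```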
4. Evolution and Prospects of Foundation Models: From Large Language Models to Large Multimodal Models (Cited: 1)
Authors: Zheyi Chen, Liuchang Xu, Hongting Zheng, Luyao Chen, Amr Tolba, Liang Zhao, Keping Yu, Hailin Feng. 《Computers, Materials & Continua》 (SCIE EI), 2024, No. 8, pp. 1753-1808 (56 pages)
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLM) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities like context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLM) to Large Multimodal Models (LMM). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to the discussion of LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Keywords: Artificial intelligence; Large language models; Large multimodal models; Foundation models
5. Multimode Design and Analysis of an Integrated Leg-Arm Quadruped Robot with Deployable Characteristics
Authors: Fuqun Zhao, Yifan Wu, Xinhua Yang, Xilun Ding, Kun Xu, Sheng Guo, Xiaodong Jin. 《Chinese Journal of Mechanical Engineering》 (SCIE EI CAS CSCD), 2024, No. 3, pp. 41-61 (21 pages)
To improve locomotion and operation integration, this paper presents an integrated leg-arm quadruped robot (ILQR) that has a reconfigurable joint. First, the reconfigurable joint is designed and assembled at the end of the leg-arm chain. When the robot performs a task, reconfigurable configuration and mode switching can be achieved using this joint. In contrast to traditional quadruped robots, this robot can stack in a designated area to optimize the occupied volume in a nonworking state. Kinematics and dynamics models are established to evaluate the mechanical properties of the multiple modes. All working modes of the robot are classified, and can be defined as deployable mode, locomotion mode and operation mode. Based on the stability margin and mechanical modeling, switching analysis and evaluation between modes are carried out. Finally, prototype experimental results verify the function realization and switching stability of the multiple modes, and provide a design method to integrate and perform multiple modes for quadruped robots with deployable characteristics.
Keywords: Quadruped robot; Multimode design; Mode switching; Locomotion; Operation
6. Treatment patterns and survival outcomes in patients with nonmetastatic early-onset pancreatic cancer
Authors: Le-Tian Zhang, Ying Zhang, Bi-Yang Cao, Chen-Chen Wu, Jing Wang. 《World Journal of Gastroenterology》 (SCIE CAS), 2024, No. 12, pp. 1739-1750 (12 pages)
BACKGROUND: The incidence of early-onset pancreatic cancer (EOPC; age ≤ 50 years at diagnosis) is on the rise, placing a heavy burden on individuals, families, and society. The role of combination therapy including surgery, radiotherapy, and chemotherapy in non-metastatic EOPC is not well defined.
AIM: To investigate the treatment patterns and survival outcomes in patients with non-metastatic EOPC.
METHODS: A total of 277 patients with non-metastatic EOPC who were treated at our institution between 2017 and 2021 were investigated retrospectively. Overall survival (OS), disease-free survival, and progression-free survival were estimated using the Kaplan-Meier method. Univariate and multivariate analyses with the Cox proportional hazards model were used to identify prognostic factors.
RESULTS: With a median follow-up time of 34.6 months, the 1-year, 2-year, and 3-year OS rates for the entire cohort were 84.3%, 51.5%, and 27.6%, respectively. The median OS of patients with localized disease who received surgery alone and adjuvant therapy (AT) was 21.2 months and 28.8 months, respectively (P = 0.007). The median OS of patients with locally advanced disease who received radiotherapy-based combination therapy (RCT), surgery after neoadjuvant therapy (NAT), and chemotherapy was 28.5 months, 25.6 months, and 14.0 months, respectively (P = 0.002). The median OS after regional recurrence was 16.0 months, 13.4 months, and 8.9 months in the RCT, chemotherapy, and supportive therapy groups, respectively (P = 0.035). Multivariate analysis demonstrated that carbohydrate antigen 19-9 level, pathological grade, T-stage, N-stage, and resection were independent prognostic factors for non-metastatic EOPC.
CONCLUSION: AT improves postoperative survival in localized patients. Surgery after NAT and RCT are the preferred therapeutic options for patients with locally advanced EOPC.
Keywords: Pancreatic cancer; Early-onset; Non-metastatic; Multimodal treatment; Radiotherapy; Overall survival
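The survival methodology named in the abstract maps directly onto the lifelines library. The sketch below runs Kaplan-Meier estimation and a multivariate Cox proportional-hazards fit on synthetic stand-in data; the covariates mimic two of the reported prognostic factors, and none of this is the study's data:

```python
# Sketch of the abstract's methodology: Kaplan-Meier survival estimation and
# a multivariate Cox proportional-hazards model, on simulated patients.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(0)
n = 100
ca19_9_high = rng.integers(0, 2, n)        # binarized CA19-9 level (assumed)
resection = rng.integers(0, 2, n)
months = rng.exponential(24, n) * np.exp(-0.5 * ca19_9_high + 0.4 * resection)
event = (rng.random(n) < 0.7).astype(int)  # 1 = death observed

df = pd.DataFrame({"os_months": months, "event": event,
                   "ca19_9_high": ca19_9_high, "resection": resection})

kmf = KaplanMeierFitter().fit(df["os_months"], event_observed=df["event"])
print("median OS (months):", kmf.median_survival_time_)

cph = CoxPHFitter().fit(df, duration_col="os_months", event_col="event")
cph.print_summary()   # hazard ratios for the two covariates
```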
7. Effects of multimodal microstructure on fracture toughness and its anisotropy of LPSO-type extruded Mg-1Zn-2Y alloys
Authors: Soya Nishimoto, Taiga Yasuda, Koji Hagihara, Michiaki Yamasaki. 《Journal of Magnesium and Alloys》 (SCIE EI CAS CSCD), 2024, No. 7, pp. 2952-2966 (15 pages)
The fracture toughness of extruded Mg-1Zn-2Y (at.%) alloys, featuring a multimodal microstructure containing fine dynamically recrystallized (DRXed) grains with random crystallographic orientation and coarse worked grains with a strong fiber texture, was investigated. The DRXed grains comprised randomly oriented equiaxed α-Mg grains. In contrast, the worked grains included α-Mg and long-period stacking ordered (LPSO) phases that extended in the extrusion direction (ED). Both phases displayed a strong texture, aligning the ⟨10-10⟩ direction parallel to the ED. The volume fractions of the DRXed and worked grains were controlled by adjusting the extrusion temperature. In the longitudinal-transverse (L-T) orientation, where the loading direction was aligned parallel to the ED, the conditional fracture toughness, KQ, tended to increase as the volume fraction of the worked grains increased. However, the KQ values in the T-L orientation, where the loading direction was perpendicular to the ED, decreased with an increase in the volume fraction of the worked grains. This suggests strong anisotropy in the fracture toughness of the specimen with a high volume fraction of the worked grains, relative to the test direction. The worked grains, which included the LPSO phase and were elongated perpendicular to the initial crack plane, suppressed straight crack extension, causing crack deflection and generating secondary cracks. Thus, these worked grains significantly contributed to the fracture toughness of the extruded Mg-1Zn-2Y alloys in the L-T orientation.
Keywords: Magnesium alloy; Magnesium-zinc-yttrium; Long-period stacking ordered phase; Multimodal microstructure; Fracture toughness
8. Hybrid Operator and Strengthened Diversity Improving for Multimodal Multi-Objective Optimization
Authors: Guoting Zhang, Yonghao Du, Xiaobin Zhu, Xiaolu Liu. 《Tsinghua Science and Technology》 (SCIE EI CAS CSCD), 2024, No. 5, pp. 1409-1421 (13 pages)
Multimodal multi-objective optimization problems (MMOPs) contain multiple equivalent Pareto subsets (PSs) corresponding to a single Pareto front (PF), resulting in difficulty in maintaining promising diversity in both the objective and decision spaces to find these PSs. Widely used to solve MMOPs, evolutionary algorithms mainly consist of evolutionary operators that generate new solutions and fitness evaluations of the solutions. To enhance performance in solving MMOPs, this paper proposes a multimodal multi-objective optimization evolutionary algorithm based on a hybrid operator and strengthened diversity improving. Specifically, a hybrid operator mechanism is devised to ensure exploration of the decision space in the early stage and approximation to the optima in the latter stage. Moreover, an elitist-assisted differential evolution mechanism is designed for the early exploration stage. In addition, a new fitness function is proposed and used in environmental and mating selections to simultaneously evaluate diversity for the PF and PSs. Experimental studies on 11 widely used benchmark instances from a test suite verify the superiority, or at least competitiveness, of the proposed methods compared to five state-of-the-art algorithms tailored for MMOPs.
Keywords: Multimodal multi-objective optimization; Evolutionary algorithm; Hybrid operator; Strengthened diversity
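The differential-evolution machinery the hybrid operator builds on is compact enough to sketch. Below is a standard DE/rand/1 operator with binomial crossover on synthetic data; the paper's elitist-assisted variant would additionally bias parent selection toward an archive of elite solutions:

```python
# Sketch of the classic DE/rand/1/bin operator underlying differential
# evolution; the elitist-assisted mechanism in the paper would draw some
# parents from an elite archive instead of uniformly from the population.
import numpy as np

def de_rand_1_bin(pop, F=0.5, CR=0.9, rng=np.random.default_rng(0)):
    """Generate one trial vector per individual of population `pop` (n x d)."""
    n, d = pop.shape
    trials = pop.copy()
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3,
                                replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])   # differential mutation
        cross = rng.random(d) < CR                   # binomial crossover mask
        cross[rng.integers(d)] = True                # ensure >=1 gene crosses
        trials[i, cross] = mutant[cross]
    return trials

pop = np.random.default_rng(1).uniform(-1, 1, size=(20, 5))
print(de_rand_1_bin(pop)[0])
```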
9. Multimodal fusion recognition for digital twin
Authors: Tianzhe Zhou, Xuguang Zhang, Bing Kang, Mingkai Chen. 《Digital Communications and Networks》 (SCIE CSCD), 2024, No. 2, pp. 337-346 (10 pages)
The digital twin is the concept of transcending reality; it is the reverse feedback from the real physical space to the virtual digital space. People hold great prospects for this emerging technology. In order to realize the upgrading of the digital twin industrial chain, it is urgent to introduce more modalities, such as vision, haptics, hearing and smell, into the virtual digital space, which assists physical entities and virtual objects in creating a closer connection. Therefore, perceptual understanding and object recognition have become an urgent hot topic in the digital twin. Existing surface material classification schemes often achieve recognition through machine learning or deep learning in a single modality, ignoring the complementarity between multiple modalities. To overcome this dilemma, we propose a multimodal fusion network that combines two modalities, visual and haptic, for surface material recognition. On the one hand, the network makes full use of the potential correlations between multiple modalities to deeply mine the modal semantics and complete the data mapping. On the other hand, the network is extensible and can be used as a universal architecture to include more modalities. Experiments show that the constructed multimodal fusion network can achieve 99.42% classification accuracy while reducing complexity.
Keywords: Digital twin; Multimodal fusion; Object recognition; Deep learning; Transfer learning
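A minimal two-branch network in the spirit of the visual-plus-haptic fusion described above might look like the following; all layer sizes and the haptic feature dimension are assumptions, not the article's architecture:

```python
# Sketch of a two-branch multimodal fusion network: a small CNN for surface
# images, an MLP for haptic features, and a fused classification head.
import torch
import torch.nn as nn

class VisualHapticFusion(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # visual branch: small CNN over surface images
        self.visual = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # haptic branch: MLP over acceleration/friction features (assumed 64-d)
        self.haptic = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        # fusion head: concatenate modality embeddings, then classify
        self.head = nn.Sequential(nn.Linear(32 + 64, 64), nn.ReLU(),
                                  nn.Linear(64, n_classes))

    def forward(self, image, haptic):
        z = torch.cat([self.visual(image), self.haptic(haptic)], dim=1)
        return self.head(z)

net = VisualHapticFusion()
logits = net(torch.randn(4, 3, 64, 64), torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 10])
```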
10. Design of AI-Enhanced and Hardware-Supported Multimodal E-Skin for Environmental Object Recognition and Wireless Toxic Gas Alarm
Authors: Jianye Li, Hao Wang, Yibing Luo, Zijing Zhou, He Zhang, Huizhi Chen, Kai Tao, Chuan Liu, Lingxing Zeng, Fengwei Huo, Jin Wu. 《Nano-Micro Letters》 (SCIE EI CAS CSCD), 2024, No. 12, pp. 1-22 (22 pages)
Post-earthquake rescue missions are full of challenges due to the unstable structure of ruins and successive aftershocks. Most current rescue robots lack the ability to interact with environments, leading to low rescue efficiency. The multimodal electronic skin (e-skin) proposed here not only reproduces the pressure, temperature, and humidity sensing capabilities of natural skin but also develops sensing functions beyond it, perceiving object proximity and NO2 gas. Its multilayer stacked structure based on Ecoflex and organohydrogel endows the e-skin with mechanical properties similar to natural skin. Rescue robots integrated with the multimodal e-skin and artificial intelligence (AI) algorithms show strong environmental perception capabilities and can accurately distinguish objects and identify human limbs through grasping, laying the foundation for automated post-earthquake rescue. Besides, the combination of the e-skin and NO2 wireless alarm circuits allows robots to sense toxic gases in the environment in real time, thereby adopting appropriate measures to protect trapped people from the toxic environment. Multimodal e-skin powered by AI algorithms and hardware circuits exhibits powerful environmental perception and information processing capabilities, which, as an interface for interaction with the physical world, dramatically expands intelligent robots' application scenarios.
Keywords: Stretchable hydrogel sensors; Multimodal e-skin; Artificial intelligence; Post-earthquake rescue; Wireless toxic gas alarm
11. Multiobjective Differential Evolution for Higher-Dimensional Multimodal Multiobjective Optimization
Authors: Jing Liang, Hongyu Lin, Caitong Yue, Ponnuthurai Nagaratnam Suganthan, Yaonan Wang. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE EI CSCD), 2024, No. 6, pp. 1458-1475 (18 pages)
In multimodal multiobjective optimization problems (MMOPs), there are several Pareto optimal solutions corresponding to the identical objective vector. This paper proposes a new differential evolution algorithm to solve MMOPs with higher-dimensional decision variables. Due to the increase in the dimensions of decision variables in real-world MMOPs, it is difficult for current multimodal multiobjective optimization evolutionary algorithms (MMOEAs) to find multiple Pareto optimal solutions. The proposed algorithm adopts a dual-population framework and an improved environmental selection method. It utilizes a convergence archive to help the first population improve the quality of solutions. The improved environmental selection method enables the other population to search the remaining decision space and reserve more Pareto optimal solutions through the information of the first population. The combination of these two strategies helps to effectively balance and enhance convergence and diversity performance. In addition, to study the performance of the proposed algorithm, a novel set of multimodal multiobjective optimization test functions with extensible decision variables is designed. The proposed MMOEA is certified to be effective through comparison with six state-of-the-art MMOEAs on the test functions.
Keywords: Benchmark functions; Diversity measure; Evolutionary algorithms; Multimodal multiobjective optimization
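A building block shared by the environmental-selection schemes discussed here is the identification of nondominated solutions. The helper below (minimization assumed; a generic utility, not the paper's code) returns the Pareto-optimal members of a population's objective matrix:

```python
# Generic helper for multiobjective environmental selection: flag the
# nondominated (Pareto-optimal) rows of an objective matrix, minimizing
# every objective.
import numpy as np

def nondominated_mask(F):
    """F: (n, m) objective matrix. True = row is nondominated."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        # j dominates i if it is no worse in all objectives, better in one
        dominated_by = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominated_by.any()
    return mask

F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
print(nondominated_mask(F))  # [ True  True False  True]
```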
12. Conditional selection with CNN augmented transformer for multimodal affective analysis
Authors: Jianwen Wang, Shiping Wang, Shunxin Xiao, Renjie Lin, Mianxiong Dong, Wenzhong Guo. 《CAAI Transactions on Intelligence Technology》 (SCIE EI), 2024, No. 4, pp. 917-931 (15 pages)
The attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language and its nonverbal context information. One is to generate sparse attention coefficients associated with acoustic and visual modalities, which helps locate critical emotional semantics. The other is fusing complementary cross-modal representations to construct optimal salient feature combinations of multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross-modal attention. As a result, the located nonverbal features are not only salient but also complementary to sentiment words directly. Experimental results show that the authors' method achieves state-of-the-art performance on several multimodal affective analysis datasets.
Keywords: Affective computing; Data fusion; Information fusion; Multimodal approaches
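The conditional cross-modal attention idea can be sketched in a few lines: sentiment-word embeddings serve as queries over a nonverbal frame sequence, so attention mass concentrates on frames relevant to each word. All dimensions below are illustrative assumptions:

```python
# Sketch of conditional cross-modal attention: sentiment-word embeddings act
# as queries over an acoustic/visual frame sequence (keys = values), so the
# attended output is nonverbal context aligned to each sentiment word.
import torch

d = 64
word_q = torch.randn(1, 3, d)    # 3 sentiment-word query embeddings
frames = torch.randn(1, 50, d)   # 50 acoustic/visual frame embeddings

attn = torch.softmax(word_q @ frames.transpose(1, 2) / d**0.5, dim=-1)
located = attn @ frames          # nonverbal features located per word
print(located.shape)             # torch.Size([1, 3, 64])
```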
13. Multimodal Machine Learning Guides Low Carbon Aeration Strategies in Urban Wastewater Treatment
Authors: Hong-Cheng Wang, Yu-Qi Wang, Xu Wang, Wan-Xin Yin, Ting-Chao Yu, Chen-Hao Xue, Ai-Jie Wang. 《Engineering》 (SCIE EI CAS CSCD), 2024, No. 5, pp. 51-62 (12 pages)
The potential for reducing greenhouse gas (GHG) emissions and energy consumption in wastewater treatment can be realized through intelligent control, with machine learning (ML) and multimodality emerging as a promising solution. Here, we introduce an ML technique based on multimodal strategies, focusing specifically on intelligent aeration control in wastewater treatment plants (WWTPs). The generalization of the multimodal strategy is demonstrated on eight ML models. The results demonstrate that this multimodal strategy significantly enhances model indicators for ML in environmental science and the efficiency of aeration control, exhibiting exceptional performance and interpretability. Integrating random forest with visual models achieves the highest accuracy in forecasting aeration quantity among the multimodal models, with a mean absolute percentage error of 4.4% and a coefficient of determination of 0.948. Practical testing in a full-scale plant reveals that the multimodal model can reduce operation costs by 19.8% compared to traditional fuzzy control methods. The potential application of these strategies in critical water science domains is discussed. To foster accessibility and promote widespread adoption, the multimodal ML models are freely available on GitHub, thereby eliminating technical barriers and encouraging the application of artificial intelligence in urban wastewater treatment.
Keywords: Wastewater treatment; Multimodal machine learning; Deep learning; Aeration control; Interpretable machine learning
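The headline result (a random forest forecasting aeration quantity, scored by MAPE and the coefficient of determination) follows a standard regression protocol, sketched below on synthetic tabular data; the paper's multimodal variant would append visual-model features to these inputs:

```python
# Sketch of the evaluation protocol on stand-in data: random forest
# regression of aeration quantity, scored with MAPE and R^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))    # e.g., influent load, DO, NH4-N (assumed)
y = 50 + X @ rng.normal(size=8) + rng.normal(scale=0.5, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print(f"MAPE={mean_absolute_percentage_error(y_te, pred):.3f}, "
      f"R2={r2_score(y_te, pred):.3f}")
```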
14. Audio-visual keyword transformer for unconstrained sentence-level keyword spotting
Authors: Yidi Li, Jiale Ren, Yawei Wang, Guoquan Wang, Xia Li, Hong Liu. 《CAAI Transactions on Intelligence Technology》 (SCIE EI), 2024, No. 1, pp. 142-152 (11 pages)
As one of the most effective methods to improve the accuracy and robustness of speech tasks, the audio-visual fusion approach has recently been introduced into the field of Keyword Spotting (KWS). However, existing audio-visual keyword spotting models are limited to detecting isolated words, while keyword spotting for unconstrained speech is still a challenging problem. To this end, an Audio-Visual Keyword Transformer (AVKT) network is proposed to spot keywords in unconstrained video clips. The authors present a transformer classifier with learnable CLS tokens to extract distinctive keyword features from the variable-length audio and visual inputs. The outputs of the audio and visual branches are combined in a decision fusion module. As humans can easily notice whether a keyword appears in a sentence or not, the AVKT network can detect whether a video clip with a spoken sentence contains a pre-specified keyword. Moreover, the position of the keyword is localised in the attention map without additional position labels. Experimental results on the LRS2-KWS dataset and the newly collected PKU-KWS dataset show that the accuracy of AVKT exceeded 99% in clean scenes and 85% in extremely noisy conditions. The code is available at https://github.com/jialeren/AVKT.
Keywords: Artificial intelligence; Multimodal approaches; Natural language processing; Neural network; Speech processing
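The learnable-CLS-token classifier at the heart of AVKT is a standard transformer pattern, sketched below for a single modality; the embedding size, depth, and omission of the decision-fusion module are assumptions relative to the paper:

```python
# Sketch of a CLS-token transformer classifier: a learnable token is prepended
# to a variable-length feature sequence, and its encoder output decides
# keyword presence. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class KeywordTransformer(nn.Module):
    def __init__(self, d=256, heads=4, layers=2):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, d))   # learnable CLS token
        enc = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.out = nn.Linear(d, 2)                      # present / absent

    def forward(self, seq):                             # seq: (B, T, d)
        cls = self.cls.expand(seq.size(0), -1, -1)
        h = self.encoder(torch.cat([cls, seq], dim=1))
        return self.out(h[:, 0])                        # read the CLS position

model = KeywordTransformer()
print(model(torch.randn(2, 75, 256)).shape)             # torch.Size([2, 2])
```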
15. Efficient User Identity Linkage Based on Aligned Multimodal Features and Temporal Correlation
Authors: Jiaqi Gao, Kangfeng Zheng, Xiujuan Wang, Chunhua Wu, Bin Wu. 《Computers, Materials & Continua》 (SCIE EI), 2024, No. 10, pp. 251-270 (20 pages)
User identity linkage (UIL) refers to identifying user accounts belonging to the same identity across different social media platforms. Most current research is based on text analysis, which fails to fully explore the rich image resources generated by users; existing attempts touch on the multimodal domain but still face the challenge of semantic differences between text and images. Given this, we investigate the UIL task across different social media platforms based on multimodal user-generated contents (UGCs). We innovatively introduce the efficient user identity linkage via aligned multimodal features and temporal correlation (EUIL) approach. The method first generates captions for user-posted images with the BLIP model, alleviating the problem of missing textual information. Subsequently, we extract aligned text and image features with the CLIP model, which closely aligns the two modalities and significantly reduces the semantic gap. Accordingly, we construct a set of adapter modules to integrate the multimodal features. Furthermore, we design a temporal weight assignment mechanism to incorporate the temporal dimension of user behavior. We evaluate the proposed scheme on the real-world social dataset TWIN, and the results show that our method reaches 86.39% accuracy, which demonstrates its excellence in handling multimodal data and provides strong algorithmic support for UIL.
Keywords: User identity linkage; Multimodal models; Attention mechanism; Temporal correlation
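The aligned-feature step is directly reproducible with public CLIP checkpoints. The sketch below embeds an image and candidate texts in CLIP's shared space and compares them; the BLIP captioning, adapter modules, and temporal weighting of EUIL are omitted:

```python
# Sketch of CLIP-aligned multimodal features: text and image embeddings land
# in one shared space, shrinking the cross-modal semantic gap. The image and
# texts are stand-ins, not the TWIN dataset.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))   # stand-in for a user-posted image
texts = ["sunset over the harbor", "a plate of dumplings"]

with torch.no_grad():
    img_emb = model.get_image_features(
        **proc(images=image, return_tensors="pt"))
    txt_emb = model.get_text_features(
        **proc(text=texts, return_tensors="pt", padding=True))

sims = torch.cosine_similarity(img_emb, txt_emb)   # aligned-space similarity
print(sims)
```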
16. A Robust Framework for Multimodal Sentiment Analysis with Noisy Labels Generated from Distributed Data Annotation
Authors: Kai Jiang, Bin Cao, Jing Fan. 《Computer Modeling in Engineering & Sciences》 (SCIE EI), 2024, No. 6, pp. 2965-2984 (20 pages)
Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions and voice to detect people's attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
Keywords: Distributed data collection; Multimodal sentiment analysis; Meta learning; Learning with noisy labels
17. Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment
Authors: Emran Al-Buraihy, Dan Wang. 《Computers, Materials & Continua》 (SCIE EI), 2024, No. 6, pp. 3913-3938 (26 pages)
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image descriptions, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Keywords: Cross-language image description; Multimodal deep learning; Semantic matching; Reward mechanisms
18. Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. 《Computers, Materials & Continua》 (SCIE EI), 2024, No. 3, pp. 3215-3245 (31 pages)
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Keywords: Speech-text multimodal; Automatic speech recognition; Semantic alignment; Air traffic control communications; Dual-tower architecture
19. A deep multimodal fusion and multitasking trajectory prediction model for typhoon trajectory prediction to reduce flight scheduling cancellation
Authors: TANG Jun, QIN Wanting, PAN Qingtao, LAO Songyang. 《Journal of Systems Engineering and Electronics》 (SCIE CSCD), 2024, No. 3, pp. 666-678 (13 pages)
Natural events have had a significant impact on overall flight activity, and the aviation industry plays a vital role in helping society cope with the impact of these events. As the typhoon season, one of the most impactful weather periods, arrives and continues, airlines operating in threatened areas and passengers with travel plans during this period pay close attention to the development of tropical storms. This paper proposes a deep multimodal fusion and multitasking trajectory prediction model that can improve the reliability of typhoon trajectory prediction and reduce the quantity of flight scheduling cancellations. The deep multimodal fusion module is formed by deep fusion of the features output by multiple submodal fusion modules, and the multitask generation module uses longitude and latitude as two related tasks for simultaneous prediction. With more dependable data accuracy, problems can be analysed rapidly and more efficiently, enabling better decision-making with a proactive rather than reactive posture. When multiple modalities coexist, features can be extracted from them simultaneously to supplement each other's information. An actual case study, the typhoon Lekima that swept China in 2019, has demonstrated that the algorithm can effectively reduce the number of unnecessary flight cancellations compared to existing flight scheduling, and can assist the new generation of flight scheduling systems under extreme weather.
Keywords: Flight scheduling optimization; Deep multimodal fusion; Multitasking trajectory prediction; Typhoon weather; Flight cancellation prediction reliability
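The multitask generation module's core idea (one shared encoder with two related coordinate-prediction heads) can be sketched as a small PyTorch model; the input features, dimensions, and the use of an LSTM trunk are illustrative assumptions rather than the paper's exact architecture:

```python
# Sketch of multitask trajectory prediction: a shared sequence encoder over
# past track features, with sibling heads for longitude and latitude trained
# under one joint loss.
import torch
import torch.nn as nn

class MultitaskTrajectoryNet(nn.Module):
    def __init__(self, in_dim=6, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)  # shared trunk
        self.lon_head = nn.Linear(hidden, 1)   # task 1: longitude
        self.lat_head = nn.Linear(hidden, 1)   # task 2: latitude

    def forward(self, x):                      # x: (B, T, in_dim) past samples
        _, (h, _) = self.encoder(x)
        h = h[-1]
        return self.lon_head(h), self.lat_head(h)

net = MultitaskTrajectoryNet()
lon, lat = net(torch.randn(8, 12, 6))          # 12 past track points (assumed)
loss = nn.functional.mse_loss(lon, torch.zeros(8, 1)) + \
       nn.functional.mse_loss(lat, torch.zeros(8, 1))   # joint multitask loss
print(lon.shape, lat.shape)
```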
20. FusionNN: A Semantic Feature Fusion Model Based on Multimodal for Web Anomaly Detection
Authors: Li Wang, Mingshan Xia, Hao Hu, Jianfang Li, Fengyao Hou, Gang Chen. 《Computers, Materials & Continua》 (SCIE EI), 2024, No. 5, pp. 2991-3006 (16 pages)
With the rapid development of mobile communication and the Internet, previous web anomaly detection and identification models were built relying on security experts' empirical knowledge and attack features. Although this approach can achieve higher detection performance, it requires huge human labor and resources to maintain the feature library. In contrast, semantic feature engineering can dynamically discover new semantic features and optimize feature selection by automatically analyzing the semantic information contained in the data itself, thus reducing dependence on prior knowledge. However, current semantic features still have the problem of semantic expression singularity, as they are extracted from a single semantic mode such as word segmentation, character segmentation, or arbitrary semantic feature extraction. This paper extracts features of web requests at dual semantic granularity and proposes a semantic feature fusion method to solve the above problems. The method first preprocesses web requests, then extracts word-level and character-level semantic features of URLs via convolutional neural networks (CNN), respectively. Three loss functions are constructed to reduce the losses between features, labels and categories. Experiments on the HTTP CSIC 2010, Malicious URLs and HttpParams datasets verify the proposed method. Results show that, compared with machine learning, deep learning methods and the BERT model, the proposed method has better detection performance, achieving the best detection rate of 99.16% on the HttpParams dataset.
Keywords: Feature fusion; Web anomaly detection; Multimodal; Convolutional neural network (CNN); Semantic feature extraction
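The character-granularity branch described above can be sketched as a small 1-D CNN over byte-encoded URLs; the vocabulary, layer sizes, and two-class head are assumptions, and the parallel word-level branch and the triple loss are omitted:

```python
# Sketch of a character-level CNN over byte-encoded URLs, the kind of branch
# FusionNN fuses with a word-level counterpart.
import torch
import torch.nn as nn

MAX_LEN, VOCAB = 200, 128   # truncate/pad URLs; ASCII byte vocabulary

def encode(url: str) -> torch.Tensor:
    ids = [min(ord(c), VOCAB - 1) for c in url[:MAX_LEN]]
    return torch.tensor(ids + [0] * (MAX_LEN - len(ids)))

class CharCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.conv = nn.Conv1d(32, 64, kernel_size=5, padding=2)
        self.out = nn.Linear(64, 2)               # normal vs. anomalous request

    def forward(self, ids):                        # ids: (B, L)
        z = self.emb(ids).transpose(1, 2)          # (B, 32, L) for Conv1d
        z = torch.relu(self.conv(z)).amax(dim=2)   # global max pooling over L
        return self.out(z)

model = CharCNN()
batch = torch.stack([encode("/index.php?id=1%20OR%201=1"), encode("/home")])
print(model(batch).shape)                          # torch.Size([2, 2])
```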