Multimode Design and Analysis of an Integrated Leg-Arm Quadruped Robot with Deployable Characteristics
1
Authors: Fuqun Zhao, Yifan Wu, Xinhua Yang, Xilun Ding, Kun Xu, Sheng Guo, Xiaodong Jin. 《Chinese Journal of Mechanical Engineering》 (SCIE, EI, CAS, CSCD), 2024, Issue 3, pp. 41-61 (21 pages)
To improve locomotion and operation integration, this paper presents an integrated leg-arm quadruped robot (ILQR) that has a reconfigurable joint. First, the reconfigurable joint is designed and assembled at the end of the leg-arm chain. When the robot performs a task, reconfigurable configuration and mode switching can be achieved using this joint. In contrast to traditional quadruped robots, this robot can stack in a designated area to optimize the occupied volume in a nonworking state. Kinematics modeling and dynamics modeling are established to evaluate the mechanical properties for multiple modes. All working modes of the robot are classified, which can be defined as deployable mode, locomotion mode and operation mode. Based on the stability margin and mechanical modeling, switching analysis and evaluation between each mode is carried out. Finally, the prototype experimental results verify the function realization and switching stability of multimode and provide a design method to integrate and perform multimode for quadruped robots with deployable characteristics.
Keywords: quadruped robot; multimode design; mode switching; locomotion; operation
Optical scanning endoscope via a single multimode optical fiber
2
Authors: Guangxing Wu, Runze Zhu, Yanqing Lu, Minghui Hong, Fei Xu. 《Opto-Electronic Science》, 2024, Issue 3, pp. 1-32 (32 pages)
Optical endoscopy has become an essential diagnostic and therapeutic approach in modern biomedicine for directly observing organs and tissues deep inside the human body, enabling non-invasive, rapid diagnosis and treatment. Optical fiber endoscopy is highly competitive among various endoscopic imaging techniques due to its high flexibility, compact structure, excellent resolution, and resistance to electromagnetic interference. Over the past decade, endoscopes based on a single multimode optical fiber (MMF) have attracted widespread research interest due to their potential to significantly reduce the footprint of optical fiber endoscopes and enhance imaging capabilities. In comparison with other imaging principles of MMF endoscopes, the scanning imaging method based on the wavefront shaping technique is highly developed and provides benefits including excellent imaging contrast, broad applicability to complex imaging scenarios, and good compatibility with various well-established scanning imaging modalities. In this review, various technical routes to achieve light focusing through MMF and procedures to conduct the scanning imaging of MMF endoscopes are introduced. The advancements in imaging performance enhancements, integrations of various imaging modalities with MMF scanning endoscopes, and applications are summarized. Challenges specific to this endoscopic imaging technology are analyzed, and potential remedies and avenues for future developments are discussed.
Keywords: multimode optical fiber; endoscope; scanning imaging; focusing; wavefront shaping
Storage and Parallel Loading System Based on Mode Network for Multimode Medical Image Data
3
Authors: Xiao Zhai, Haiwei Pan, Xiaoqin Xie, Zhiqiang Zhang, Qilong Han. 《国际计算机前沿大会会议论文集》, 2016, Issue 2, pp. 61-62 (2 pages)
Since multimode data is composed of many modes and their complex relationships, it cannot be retrieved or mined effectively by utilizing traditional analysis and processing techniques for single-mode data. To address these challenges, we design and implement a graph-based storage and parallel loading system aimed at multimode medical image data. The system is a framework designed to flexibly store and rapidly load these multimode data. Specifically, the system utilizes the Mode Network to model the modes and their relationships in multimode medical image data and the graph database to store the data with a parallel loading technique.
Keywords: multimode medical image data; mode network; graph database; parallel loading
Electrode design for multimode suppression of aluminum nitride tuning fork resonators
4
Authors: Yi Yuan, Qingrui Yang, Haolin Li, Shuai Shi, Pengfei Niu, Chongling Sun, Bohua Liu, Menglun Zhang, Wei Pang. 《Nanotechnology and Precision Engineering》 (EI, CAS, CSCD), 2023, Issue 4, pp. 11-21 (11 pages)
This paper is focused on electrode design for piezoelectric tuning fork resonators. The relationship between the performance and electrode pattern of aluminum nitride piezoelectric tuning fork resonators vibrating in the in-plane flexural mode is investigated based on a set of resonators with different electrode lengths, widths, and ratios. Experimental and simulation results show that the electrode design greatly affects the multimode effect induced from torsional modes but has little influence on other loss mechanisms. Optimizing the electrode design suppresses the torsional mode successfully, thereby increasing the ratio of impedance at parallel and series resonant frequencies ($R_p/R_s$) by more than 80% and achieving a quality factor ($Q$) of 7753, an effective electromechanical coupling coefficient ($kt_{\mathrm{eff}}^{2}$) of 0.066%, and an impedance at series resonant frequency ($R_m$) of 23.6 kΩ. The proposed approach shows great potential for high-performance piezoelectric resonators, which are likely to be fundamental building blocks for sensors with high sensitivity and low noise and power consumption.
Keywords: tuning fork; multimode; AlN-based resonator; microelectromechanical systems
A Novel Fusion System Based on Iris and Ear Biometrics for E-exams
5
Authors: S. A. Shaban, Hosnia M. M. Ahmed, D. L. Elsheweikh. 《Intelligent Automation & Soft Computing》 (SCIE), 2023, Issue 3, pp. 3295-3315 (21 pages)
With the rapid spread of the coronavirus epidemic all over the world, educational and other institutions are heading towards digitization. In the era of digitization, identifying educational e-platform users using ear and iris based multi-modal biometric systems constitutes an urgent and interesting research topic to preserve enterprise security, particularly with wearing a face mask as a precaution against the new coronavirus epidemic. This study proposes a multimodal system based on ear and iris biometrics at the feature fusion level to identify students in electronic examinations (E-exams) during the COVID-19 pandemic. The proposed system comprises four steps. The first step is image preprocessing, which includes enhancing, segmenting, and extracting the regions of interest. The second step is feature extraction, where the Haralick texture and shape methods are used to extract the features of ear images, whereas Tamura texture and color histogram methods are used to extract the features of iris images. The third step is feature fusion, where the extracted features of the ear and iris images are combined into one sequential fused vector. The fourth step is matching, which is executed using the City Block Distance (CTB) for student identification. The findings of the study indicate that the system’s recognition accuracy is 97%, with a 2% False Acceptance Rate (FAR), a 4% False Rejection Rate (FRR), a 94% Correct Recognition Rate (CRR), and a 96% Genuine Acceptance Rate (GAR). In addition, the proposed recognition system achieved higher accuracy than other related systems.
Keywords: city block distance (CTB); COVID-19; ear biometric; e-exams; feature-level fusion; iris biometric; multimodal biometric; student’s identity
A Study of Multimodal Intelligent Adaptive Learning System and Its Pattern of Promoting Learners’ Online Learning Engagement
6
Authors: ZHANG Chao, SHI Qing, TONG Mingwen. 《Psychology Research》, 2023, Issue 5, pp. 202-206 (5 pages)
As the field of artificial intelligence continues to evolve, so too does the application of multimodal learning analysis and intelligent adaptive learning systems. This trend has the potential to promote the equalization of educational resources, the intellectualization of educational methods, and the modernization of educational reform, among other benefits. This study proposes a construction framework for an intelligent adaptive learning system that is supported by multimodal data. It provides a detailed explanation of the system’s working principles and patterns, which aim to enhance learners’ online engagement in behavior, emotion, and cognition. The study seeks to address the issue of intelligent adaptive learning systems diagnosing learners’ learning behavior based solely on learning achievement, to improve learners’ online engagement, enable them to master more required knowledge, and ultimately achieve better learning outcomes.
Keywords: multimodal; intelligent adaptive learning system; online learning engagement
Artificial intelligence-driven radiomics study in cancer: the role of feature engineering and modeling (cited by: 1)
7
Authors: Yuan-Peng Zhang, Xin-Yun Zhang, Yu-Ting Cheng, Bing Li, Xin-Zhi Teng, Jiang Zhang, Saikit Lam, Ta Zhou, Zong-Rui Ma, Jia-Bao Sheng, Victor C. W. Tam, Shara W. Y. Lee, Hong Ge, Jing Cai. 《Military Medical Research》 (SCIE, CAS, CSCD), 2024, Issue 1, pp. 115-147 (33 pages)
Modern medicine is reliant on various medical imaging technologies for non-invasively observing patients’ anatomy. However, the interpretation of medical images can be highly subjective and dependent on the expertise of clinicians. Moreover, some potentially useful quantitative information in medical images, especially that which is not visible to the naked eye, is often ignored during clinical practice. In contrast, radiomics performs high-throughput feature extraction from medical images, which enables quantitative analysis of medical images and prediction of various clinical endpoints. Studies have reported that radiomics exhibits promising performance in diagnosis and predicting treatment responses and prognosis, demonstrating its potential to be a non-invasive auxiliary tool for personalized medicine. However, radiomics remains in a developmental phase as numerous technical challenges have yet to be solved, especially in feature engineering and statistical modeling. In this review, we introduce the current utility of radiomics by summarizing research on its application in the diagnosis, prognosis, and prediction of treatment responses in patients with cancer. We focus on machine learning approaches for feature extraction and selection during feature engineering and for imbalanced datasets and multi-modality fusion during statistical modeling. Furthermore, we introduce the stability, reproducibility, and interpretability of features, and the generalizability and interpretability of models. Finally, we offer possible solutions to current challenges in radiomics research.
Keywords: artificial intelligence; radiomics; feature extraction; feature selection; modeling; interpretability; multimodalities; head and neck cancer
Treatment patterns and survival outcomes in patients with nonmetastatic early-onset pancreatic cancer
8
Authors: Le-Tian Zhang, Ying Zhang, Bi-Yang Cao, Chen-Chen Wu, Jing Wang. 《World Journal of Gastroenterology》 (SCIE, CAS), 2024, Issue 12, pp. 1739-1750 (12 pages)
BACKGROUND: The incidence of patients with early-onset pancreatic cancer (EOPC; age ≤ 50 years at diagnosis) is on the rise, placing a heavy burden on individuals, families, and society. The role of combination therapy including surgery, radiotherapy, and chemotherapy in non-metastatic EOPC is not well-defined. AIM: To investigate the treatment patterns and survival outcomes in patients with non-metastatic EOPC. METHODS: A total of 277 patients with non-metastatic EOPC who were treated at our institution between 2017 and 2021 were investigated retrospectively. Overall survival (OS), disease-free survival, and progression-free survival were estimated using the Kaplan-Meier method. Univariate and multivariate analyses with the Cox proportional hazards model were used to identify prognostic factors. RESULTS: With a median follow-up time of 34.6 months, the 1-year, 2-year, and 3-year OS rates for the entire cohort were 84.3%, 51.5%, and 27.6%, respectively. The median OS of patients with localized disease who received surgery alone and adjuvant therapy (AT) were 21.2 months and 28.8 months, respectively (P = 0.007). The median OS of patients with locally advanced disease who received radiotherapy-based combination therapy (RCT), surgery after neoadjuvant therapy (NAT), and chemotherapy were 28.5 months, 25.6 months, and 14.0 months, respectively (P = 0.002). The median OS after regional recurrence were 16.0 months, 13.4 months, and 8.9 months in the RCT, chemotherapy, and supportive therapy groups, respectively (P = 0.035). Multivariate analysis demonstrated that carbohydrate antigen 19-9 level, pathological grade, T-stage, N-stage, and resection were independent prognostic factors for non-metastatic EOPC. CONCLUSION: AT improves postoperative survival in localized patients. Surgery after NAT and RCT are the preferred therapeutic options for patients with locally advanced EOPC.
Keywords: pancreatic cancer; early-onset; non-metastatic; multimodal treatment; radiotherapy; overall survival
Multiobjective Differential Evolution for Higher-Dimensional Multimodal Multiobjective Optimization
9
Authors: Jing Liang, Hongyu Lin, Caitong Yue, Ponnuthurai Nagaratnam Suganthan, Yaonan Wang. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2024, Issue 6, pp. 1458-1475 (18 pages)
In multimodal multiobjective optimization problems (MMOPs), there are several Pareto optimal solutions corresponding to the identical objective vector. This paper proposes a new differential evolution algorithm to solve MMOPs with higher-dimensional decision variables. Due to the increase in the dimensions of decision variables in real-world MMOPs, it is difficult for current multimodal multiobjective optimization evolutionary algorithms (MMOEAs) to find multiple Pareto optimal solutions. The proposed algorithm adopts a dual-population framework and an improved environmental selection method. It utilizes a convergence archive to help the first population improve the quality of solutions. The improved environmental selection method enables the other population to search the remaining decision space and reserve more Pareto optimal solutions through the information of the first population. The combination of these two strategies helps to effectively balance and enhance convergence and diversity performance. In addition, to study the performance of the proposed algorithm, a novel set of multimodal multiobjective optimization test functions with extensible decision variables is designed. The proposed MMOEA is certified to be effective through comparison with six state-of-the-art MMOEAs on the test functions.
Keywords: benchmark functions; diversity measure; evolutionary algorithms; multimodal multiobjective optimization
Multimodal Machine Learning Guides Low Carbon Aeration Strategies in Urban Wastewater Treatment
10
Authors: Hong-Cheng Wang, Yu-Qi Wang, Xu Wang, Wan-Xin Yin, Ting-Chao Yu, Chen-Hao Xue, Ai-Jie Wang. 《Engineering》 (SCIE, EI, CAS, CSCD), 2024, Issue 5, pp. 51-62 (12 pages)
The potential for reducing greenhouse gas (GHG) emissions and energy consumption in wastewater treatment can be realized through intelligent control, with machine learning (ML) and multimodality emerging as a promising solution. Here, we introduce an ML technique based on multimodal strategies, focusing specifically on intelligent aeration control in wastewater treatment plants (WWTPs). The generalization of the multimodal strategy is demonstrated on eight ML models. The results demonstrate that this multimodal strategy significantly enhances model indicators for ML in environmental science and the efficiency of aeration control, exhibiting exceptional performance and interpretability. Integrating random forest with visual models achieves the highest accuracy in forecasting aeration quantity in multimodal models, with a mean absolute percentage error of 4.4% and a coefficient of determination of 0.948. Practical testing in a full-scale plant reveals that the multimodal model can reduce operation costs by 19.8% compared to traditional fuzzy control methods. The potential application of these strategies in critical water science domains is discussed. To foster accessibility and promote widespread adoption, the multimodal ML models are freely available on GitHub, thereby eliminating technical barriers and encouraging the application of artificial intelligence in urban wastewater treatment.
Keywords: wastewater treatment; multimodal machine learning; deep learning; aeration control; interpretable machine learning
Intelligent Recognition Using Ultralight Multifunctional Nano‑Layered Carbon Aerogel Sensors with Human‑Like Tactile Perception
11
Authors: Huiqi Zhao, Yizheng Zhang, Lei Han, Weiqi Qian, Jiabin Wang, Heting Wu, Jingchen Li, Yuan Dai, Zhengyou Zhang, Chris R. Bowen, Ya Yang. 《Nano-Micro Letters》 (SCIE, EI, CAS, CSCD), 2024, Issue 1, pp. 172-186 (15 pages)
Humans can perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly. However, replicating this unique capability in robots remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition and 3D location capabilities, which is combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04–100 kPa) and temperature (21.5–66.2 ℃) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa$^{-1}$ and triboelectric durability of over 6000 cycles. The devised algorithm has universality and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception to advance future society toward heightened sensing, recognition and intelligence.
Keywords: multifunctional sensor; tactile perception; multimodal machine learning algorithms; universal tactile system; intelligent object recognition
Multimodal Social Media Fake News Detection Based on Similarity Inference and Adversarial Networks
12
Authors: Fangfang Shan, Huifang Sun, Mengyi Wang. 《Computers, Materials & Continua》 (SCIE, EI), 2024, Issue 4, pp. 581-605 (25 pages)
As social networks become increasingly complex, contemporary fake news often includes textual descriptions of events accompanied by corresponding images or videos. Fake news in multiple modalities is more likely to create a misleading perception among users. While early research primarily focused on text-based features for fake news detection mechanisms, there has been relatively limited exploration of learning shared representations in multimodal (text and visual) contexts. To address these limitations, this paper introduces a multimodal model for detecting fake news, which relies on similarity reasoning and adversarial networks. The model employs Bidirectional Encoder Representation from Transformers (BERT) and Text Convolutional Neural Network (Text-CNN) for extracting textual features while utilizing the pre-trained Visual Geometry Group 19-layer (VGG-19) to extract visual features. Subsequently, the model establishes similarity representations between the textual features extracted by Text-CNN and visual features through similarity learning and reasoning. Finally, these features are fused to enhance the accuracy of fake news detection, and adversarial networks have been employed to investigate the relationship between fake news and events. This paper validates the proposed model using publicly available multimodal datasets from Weibo and Twitter. Experimental results demonstrate that our proposed approach achieves superior performance on Twitter, with an accuracy of 86%, surpassing traditional unimodal models and existing multimodal models. On the Weibo dataset, the overall performance of our model surpasses the benchmark models across multiple metrics. The application of similarity reasoning and adversarial networks in multimodal fake news detection significantly enhances detection effectiveness in this paper. However, current research is limited to the fusion of only text and image modalities. Future research directions should aim to further integrate features from additional modalities to comprehensively represent the multifaceted information of fake news.
Keywords: fake news detection; attention mechanism; image-text similarity; multimodal feature fusion
Audio-visual keyword transformer for unconstrained sentence-level keyword spotting
13
Authors: Yidi Li, Jiale Ren, Yawei Wang, Guoquan Wang, Xia Li, Hong Liu. 《CAAI Transactions on Intelligence Technology》 (SCIE, EI), 2024, Issue 1, pp. 142-152 (11 pages)
As one of the most effective methods to improve the accuracy and robustness of speech tasks, the audio-visual fusion approach has recently been introduced into the field of Keyword Spotting (KWS). However, existing audio-visual keyword spotting models are limited to detecting isolated words, while keyword spotting for unconstrained speech is still a challenging problem. To this end, an Audio-Visual Keyword Transformer (AVKT) network is proposed to spot keywords in unconstrained video clips. The authors present a transformer classifier with learnable CLS tokens to extract distinctive keyword features from the variable-length audio and visual inputs. The outputs of audio and visual branches are combined in a decision fusion module. As humans can easily notice whether a keyword appears in a sentence or not, our AVKT network can detect whether a video clip with a spoken sentence contains a pre-specified keyword. Moreover, the position of the keyword is localised in the attention map without additional position labels. Experimental results on the LRS2-KWS dataset and our newly collected PKU-KWS dataset show that the accuracy of AVKT exceeded 99% in clean scenes and 85% in extremely noisy conditions. The code is available at https://github.com/jialeren/AVKT.
Keywords: artificial intelligence; multimodal approaches; natural language processing; neural network; speech processing
Evolution and Prospects of Foundation Models: From Large Language Models to Large Multimodal Models
14
Authors: Zheyi Chen, Liuchang Xu, Hongting Zheng, Luyao Chen, Amr Tolba, Liang Zhao, Keping Yu, Hailin Feng. 《Computers, Materials & Continua》 (SCIE, EI), 2024, Issue 8, pp. 1753-1808 (56 pages)
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLM) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities like context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLM) to Large Multimodal Models (LMM). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to the discussion of LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content, paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Keywords: artificial intelligence; large language models; large multimodal models; foundation models
A Robust Framework for Multimodal Sentiment Analysis with Noisy Labels Generated from Distributed Data Annotation
15
Authors: Kai Jiang, Bin Cao, Jing Fan. 《Computer Modeling in Engineering & Sciences》 (SCIE, EI), 2024, Issue 6, pp. 2965-2984 (20 pages)
Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions and voice to detect people’s attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to the poor performance of the DNNs. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
Keywords: distributed data collection; multimodal sentiment analysis; meta learning; learn with noisy labels
Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
16
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. 《Computers, Materials & Continua》 (SCIE, EI), 2024, Issue 3, pp. 3215-3245 (31 pages)
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Keywords: speech-text multimodal; automatic speech recognition; semantic alignment; air traffic control communications; dual-tower architecture
Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment
17
Authors: Emran Al-Buraihy, Dan Wang. 《Computers, Materials & Continua》 (SCIE, EI), 2024, Issue 6, pp. 3913-3938 (26 pages)
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image descriptions, our approach achieves a CIDEr score for English and Arabic of 87.9% and 81.7%, respectively, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Keywords: Cross-language image description; multimodal deep learning; semantic matching; reward mechanisms
A deep multimodal fusion and multitasking trajectory prediction model for typhoon trajectory prediction to reduce flight scheduling cancellation
18
Authors: TANG Jun, QIN Wanting, PAN Qingtao, LAO Songyang. Journal of Systems Engineering and Electronics (SCIE, CSCD), 2024, Issue 3, pp. 666-678 (13 pages)
Natural events have had a significant impact on overall flight activity, and the aviation industry plays a vital role in helping society cope with the impact of these events. When the typhoon season, one of the most impactful weather events, arrives and persists, airlines operating in threatened areas and passengers with travel plans during this period pay close attention to the development of tropical storms. This paper proposes a deep multimodal fusion and multitasking trajectory prediction model that can improve the reliability of typhoon trajectory prediction and reduce the number of flight cancellations. The deep multimodal fusion module is formed by deep fusion of the features output by multiple submodal fusion modules, and the multitask generation module uses longitude and latitude as two related tasks for simultaneous prediction. With more dependable data accuracy, problems can be analysed rapidly and more efficiently, enabling better decision-making with a proactive rather than reactive posture. When multiple modalities coexist, features can be extracted from them simultaneously to supplement each other's information. A case study of typhoon Lichma, which swept China in 2019, demonstrates that the algorithm can effectively reduce the number of unnecessary flight cancellations compared to existing flight scheduling and can assist the new generation of flight scheduling systems under extreme weather.
Keywords: flight scheduling optimization; deep multimodal fusion; multitasking trajectory prediction; typhoon weather; flight cancellation; prediction reliability
FusionNN:A Semantic Feature Fusion Model Based on Multimodal for Web Anomaly Detection
19
Authors: Li Wang, Mingshan Xia, Hao Hu, Jianfang Li, Fengyao Hou, Gang Chen. Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 2991-3006 (16 pages)
With the rapid development of mobile communication and the Internet, previous web anomaly detection and identification models were built relying on security experts' empirical knowledge and attack features. Although this approach can achieve higher detection performance, it requires huge human labor and resources to maintain the feature library. In contrast, semantic feature engineering can dynamically discover new semantic features and optimize feature selection by automatically analyzing the semantic information contained in the data itself, thus reducing dependence on prior knowledge. However, current semantic features still have the problem of semantic expression singularity, as they are extracted from a single semantic mode such as word segmentation, character segmentation, or arbitrary semantic feature extraction. This paper extracts features of web requests from dual semantic granularity, and proposes a semantic feature fusion method to solve the above problems. The method first preprocesses web requests, and extracts word-level and character-level semantic features of URLs via convolutional neural network (CNN), respectively. Three loss functions are then constructed to reduce losses between features, labels and categories. Experiments on the HTTP CSIC 2010, Malicious URLs and HttpParams datasets verify the proposed method. Results show that, compared with machine learning methods, deep learning methods and the BERT model, the proposed method has better detection performance, and it achieved the best detection rate of 99.16% on the HttpParams dataset.
Keywords: Feature fusion; web anomaly detection; multimodal; convolutional neural network (CNN); semantic feature extraction
An Immune-Inspired Approach with Interval Allocation in Solving Multimodal Multi-Objective Optimization Problems with Local Pareto Sets
20
Authors: Weiwei Zhang, Jiaqiang Li, Chao Wang, Meng Li, Zhi Rao. Computers, Materials & Continua (SCIE, EI), 2024, Issue 6, pp. 4237-4257 (21 pages)
In practical engineering, multi-objective optimization often encounters situations where multiple Pareto sets (PS) in the decision space correspond to the same Pareto front (PF) in the objective space, known as Multi-Modal Multi-Objective Optimization Problems (MMOP). Locating multiple equivalent global PSs poses a significant challenge in real-world applications, especially considering the existence of local PSs. Effectively identifying and locating both global and local PSs is a major challenge. To tackle this issue, we introduce an immune-inspired reproduction strategy designed to produce more offspring in less crowded, promising regions and regulate the number of offspring in areas that have been thoroughly explored. This approach achieves a balanced trade-off between exploration and exploitation. Furthermore, we present an interval allocation strategy that adaptively assigns fitness levels to each antibody. This strategy ensures a broader survival margin for solutions in their initial stages and progressively amplifies the differences in individual fitness values as the population matures, thus fostering better population convergence. Additionally, we incorporate a multi-population mechanism that precisely manages each subpopulation through the interval allocation strategy, ensuring the preservation of both global and local PSs. Experimental results on 21 test problems, encompassing both global and local PSs, are compared with eight state-of-the-art multimodal multi-objective optimization algorithms. The results demonstrate the effectiveness of our proposed algorithm in simultaneously identifying global Pareto sets and locally high-quality PSs.
Keywords: Multimodal multi-objective optimization problem; local PSs; immune-inspired reproduction