Journal Articles
4 articles found
1. Audio-visual keyword transformer for unconstrained sentence-level keyword spotting
Authors: Yidi Li, Jiale Ren, Yawei Wang, Guoquan Wang, Xia Li, Hong Liu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2024, Issue 1, pp. 142-152 (11 pages)
As one of the most effective methods to improve the accuracy and robustness of speech tasks, the audio-visual fusion approach has recently been introduced into the field of Keyword Spotting (KWS). However, existing audio-visual keyword spotting models are limited to detecting isolated words, while keyword spotting for unconstrained speech is still a challenging problem. To this end, an Audio-Visual Keyword Transformer (AVKT) network is proposed to spot keywords in unconstrained video clips. The authors present a transformer classifier with learnable CLS tokens to extract distinctive keyword features from the variable-length audio and visual inputs. The outputs of the audio and visual branches are combined in a decision fusion module. Just as humans can easily notice whether a keyword appears in a sentence, the AVKT network can detect whether a video clip with a spoken sentence contains a pre-specified keyword. Moreover, the position of the keyword is localised in the attention map without additional position labels. Experimental results on the LRS2-KWS dataset and the newly collected PKU-KWS dataset show that the accuracy of AVKT exceeded 99% in clean scenes and 85% in extremely noisy conditions. The code is available at https://github.com/jialeren/AVKT.
Keywords: artificial intelligence; multimodal approaches; natural language processing; neural network; speech processing
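The AVKT abstract describes its architecture only at a high level (per-modality transformer branches pooled through learnable CLS tokens, followed by decision-level fusion); the authors' actual implementation is in the linked repository. The minimal PyTorch sketch below loosely illustrates those two ideas. The branch dimensions, layer counts, the keyword-classification head, and the probability-averaging fusion rule are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch of the two ideas named in the AVKT abstract:
# (1) a transformer branch that pools variable-length features through a learnable CLS token,
# (2) decision-level fusion of the audio and visual branch keyword scores.
# Dimensions, layer counts, and the mean-fusion rule are illustrative assumptions.
import torch
import torch.nn as nn


class CLSTransformerBranch(nn.Module):
    def __init__(self, feat_dim: int, d_model: int = 256, n_layers: int = 4, n_keywords: int = 10):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))  # learnable CLS token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_keywords + 1)  # extra class for "keyword absent"

    def forward(self, x: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim); pad_mask: (batch, time), True at padded steps
        h = self.proj(x)
        cls = self.cls_token.expand(h.size(0), -1, -1)
        h = torch.cat([cls, h], dim=1)
        cls_pad = torch.zeros(h.size(0), 1, dtype=torch.bool, device=x.device)
        h = self.encoder(h, src_key_padding_mask=torch.cat([cls_pad, pad_mask], dim=1))
        return self.head(h[:, 0])  # keyword logits read from the CLS position


def decision_fusion(audio_logits: torch.Tensor, visual_logits: torch.Tensor) -> torch.Tensor:
    # Late (decision-level) fusion: average the per-branch keyword probabilities.
    return (audio_logits.softmax(-1) + visual_logits.softmax(-1)) / 2
```

Averaging branch probabilities is only the simplest form of decision fusion; the paper's fusion module may weight or learn the combination differently.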
2. Multimodal Omics Approaches to Aging and Age-Related Diseases
Authors: Qianzhao Ji, Xiaoyu Jiang, Minxian Wang, Zijuan Xin, Weiqi Zhang, Jing Qu, Guang-Hui Liu. Phenomics, 2024, Issue 1, pp. 56-71 (16 pages)
Aging is associated with a progressive decline in physiological capacities and an increased risk of aging-associated disorders. An increasing body of experimental evidence shows that aging is a complex biological process coordinately regulated by multiple factors at different molecular layers. Thus, it is difficult to delineate the overall systematic aging changes based on single-layer data. Instead, multimodal omics approaches, in which data are acquired and analyzed using complementary omics technologies, such as genomics, transcriptomics, and epigenomics, are needed for gaining insights into the precise molecular regulatory mechanisms that trigger aging. In recent years, multimodal omics sequencing technologies that can reveal complex regulatory networks and specific phenotypic changes have been developed and widely applied to decode aging and age-related diseases. This review summarizes the classification and progress of multimodal omics approaches, as well as the rapidly growing number of articles reporting on their application in the field of aging research, and outlines new developments in the clinical treatment of age-related diseases based on omics technologies.
Keywords: multimodal omics approaches; aging; genome; epigenome
3. Data-driven multimodal fusion: approaches and applications in psychiatric research
Authors: Jing Sui, Dongmei Zhi, Vince D. Calhoun. Psychoradiology, 2023, Issue 1, pp. 135-153 (19 pages)
In the era of big data, where vast amounts of information are being generated and collected at an unprecedented rate, there is a pressing demand for innovative data-driven multi-modal fusion methods. These methods aim to integrate diverse neuroimaging perspectives to extract meaningful insights and attain a more comprehensive understanding of complex psychiatric disorders. However, analyzing each modality separately may only reveal partial insights or miss important correlations between different types of data. This is where data-driven multi-modal fusion techniques come into play. By combining information from multiple modalities in a synergistic manner, these methods enable us to uncover hidden patterns and relationships that would otherwise remain unnoticed. In this paper, we present an extensive overview of data-driven multimodal fusion approaches with or without prior information, with specific emphasis on canonical correlation analysis and independent component analysis. The applications of such fusion methods are wide-ranging and allow us to incorporate multiple factors such as genetics, environment, cognition, and treatment outcomes across various brain disorders. After summarizing the diverse neuropsychiatric magnetic resonance imaging fusion applications, we further discuss emerging trends in neuroimaging analysis for big data, such as N-way multimodal fusion, deep learning approaches, and clinical translation. Overall, multimodal fusion emerges as an imperative approach providing valuable insights into the underlying neural basis of mental disorders, which can uncover subtle abnormalities or potential biomarkers that may benefit targeted treatments and personalized medical interventions.
Keywords: multimodal fusion approach; data driven; functional magnetic resonance imaging (fMRI); structural MRI; diffusion magnetic resonance imaging; independent component analysis; canonical correlation analysis; psychiatric disorder
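The abstract above emphasises canonical correlation analysis as one of the core data-driven fusion tools. As a rough illustration of what CCA-based fusion computes, the scikit-learn sketch below finds paired projections of two modality feature matrices that are maximally correlated across subjects; the random matrices stand in for per-subject fMRI and structural-MRI features, and the feature counts and component number are arbitrary assumptions, not values from the paper.

```python
# Illustrative CCA-based fusion: find paired projections of two modality
# feature matrices that are maximally correlated across subjects.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_subjects = 100
X_fmri = rng.standard_normal((n_subjects, 50))   # modality 1: 50 features per subject
Y_smri = rng.standard_normal((n_subjects, 30))   # modality 2: 30 features per subject

cca = CCA(n_components=3)
X_c, Y_c = cca.fit_transform(X_fmri, Y_smri)     # canonical variates for each modality

# The correlation of each canonical-variate pair quantifies the shared cross-modal signal.
for k in range(3):
    r = np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1]
    print(f"canonical pair {k + 1}: r = {r:.3f}")
```

With real neuroimaging features, the loadings of each canonical pair indicate which voxels or regions in the two modalities co-vary, which is what makes the approach useful for biomarker discovery.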
4. Conditional selection with CNN augmented transformer for multimodal affective analysis
Authors: Jianwen Wang, Shiping Wang, Shunxin Xiao, Renjie Lin, Mianxiong Dong, Wenzhong Guo. CAAI Transactions on Intelligence Technology (SCIE, EI), 2024, Issue 4, pp. 917-931 (15 pages)
The attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language with its nonverbal context information. One is to generate sparse attention coefficients associated with the acoustic and visual modalities, which helps locate critical emotional semantics. The other is fusing complementary cross-modal representations to construct optimal salient feature combinations of multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross-modal attention. As a result, the located nonverbal features are not only salient but also directly complementary to the sentiment words. Experimental results show that the authors' method achieves state-of-the-art performance on several multimodal affective analysis datasets.
Keywords: affective computing; data fusion; information fusion; multimodal approaches
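This abstract names two mechanisms: CNN layers added to the transformer to pick up subtle local patterns in the nonverbal streams, and sentiment words used as conditions for cross-modal attention. The PyTorch fragment below is a loose illustration of that combination, not the authors' Conditional Transformer Fusion Network; the layer sizes, the single attention layer, and the use of sentiment-word embeddings as queries over CNN-encoded nonverbal features are assumptions made for the example.

```python
# Sketch: a 1-D CNN front end over a nonverbal (acoustic or visual) sequence,
# followed by cross-modal attention in which sentiment-word embeddings from the
# language stream act as queries (the "condition"). Dimensions are illustrative.
import torch
import torch.nn as nn


class ConditionalCrossModalAttention(nn.Module):
    def __init__(self, text_dim: int = 300, nonverbal_dim: int = 64, d_model: int = 128):
        super().__init__()
        # CNN layers help expose subtle local patterns in the nonverbal stream.
        self.cnn = nn.Sequential(
            nn.Conv1d(nonverbal_dim, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.text_proj = nn.Linear(text_dim, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, sentiment_words: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # sentiment_words: (batch, n_words, text_dim) embeddings of sentiment words
        # nonverbal:       (batch, time, nonverbal_dim) acoustic or visual features
        keys = self.cnn(nonverbal.transpose(1, 2)).transpose(1, 2)  # (batch, time, d_model)
        queries = self.text_proj(sentiment_words)                   # (batch, n_words, d_model)
        fused, _ = self.cross_attn(queries, keys, keys)             # text-conditioned nonverbal features
        return fused  # nonverbal evidence aligned to each sentiment word


# Usage with random tensors standing in for real features:
module = ConditionalCrossModalAttention()
out = module(torch.randn(2, 5, 300), torch.randn(2, 120, 64))
print(out.shape)  # torch.Size([2, 5, 128])
```

Conditioning the attention on sentiment words means each selected slice of the nonverbal stream is tied to a specific emotional cue in the text, which is the intuition the abstract describes.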