Multi-label classification network for underwater video based on multimodal semantic enhancement


Abstract: In underwater environments, light is scattered and absorbed by water molecules as it propagates, so objects appear blurred and distorted and detection based on underwater vision has low accuracy. To improve underwater detection accuracy, this study exploits both the visual and audio modalities contained in underwater video and proposes a multi-label classification network for underwater video based on multimodal semantic enhancement (MCNEMS). The network uses attention-enhanced multimodal complementary encoding and decoding to generate enhanced feature representations of the labels, which guide the learning of multi-label semantic correlations. Specifically, a modal semantic enhancement module performs common-independent encoding and decoding across modalities to strengthen both the information shared between modalities and the information specific to each one, and a multi-head attention (MHA) mechanism generates a multimodal complementary feature matrix, yielding an enhanced representation of the underwater video content. To mine the implicit correlations among labels, a graph correlation learning module based on dynamic graph convolution adaptively learns label semantic embeddings. Experiments on the proposed underwater video multi-label classification dataset (UVMCD) show that the model performs well on all reported evaluation metrics.
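The abstract names its building blocks but gives no equations or layer sizes, so the NumPy sketch below only illustrates the two generic mechanisms it mentions: multi-head cross-modal attention between visual and audio tokens, and a graph convolution over label embeddings whose adjacency is recomputed for each input. It is not the MCNEMS implementation. The function names, tensor shapes, random stand-in weights, pooling, and scoring rule are all assumptions made for illustration, and the common-independent encoder-decoder is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query, context, num_heads, rng):
    """Tokens of one modality (query) attend to tokens of the other (context)."""
    n_q, d = query.shape
    n_c, _ = context.shape
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_h = d // num_heads
    # Random projections stand in for learned weights (illustration only).
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    q = (query @ w_q).reshape(n_q, num_heads, d_h).transpose(1, 0, 2)    # (H, n_q, d_h)
    k = (context @ w_k).reshape(n_c, num_heads, d_h).transpose(1, 0, 2)  # (H, n_c, d_h)
    v = (context @ w_v).reshape(n_c, num_heads, d_h).transpose(1, 0, 2)  # (H, n_c, d_h)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_h), axis=-1)     # (H, n_q, n_c)
    heads = attn @ v                                                     # (H, n_q, d_h)
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)                    # concatenate heads
    return merged @ w_o

def dynamic_graph_conv(label_emb, video_feat, rng):
    """One graph-convolution step whose adjacency depends on the input video."""
    num_labels, d = label_emb.shape
    nodes = label_emb + video_feat                          # condition every label node on the video
    adj = softmax(nodes @ nodes.T / np.sqrt(d), axis=-1)    # (C, C), rows sum to 1
    w = rng.standard_normal((d, d)) / np.sqrt(d)            # stand-in for a learned weight
    return np.maximum(adj @ nodes @ w, 0.0), adj            # ReLU(A X W)

def classify(visual, audio, label_emb, num_heads=4, seed=0):
    """Return one independent probability per label (multi-label output)."""
    rng = np.random.default_rng(seed)
    v_from_a = multi_head_cross_attention(visual, audio, num_heads, rng)
    a_from_v = multi_head_cross_attention(audio, visual, num_heads, rng)
    # Residual connection, then mean-pool both streams into a single video vector.
    video_feat = np.concatenate([visual + v_from_a, audio + a_from_v], axis=0).mean(axis=0)
    label_feat, adj = dynamic_graph_conv(label_emb, video_feat, rng)
    logits = label_feat @ video_feat / np.sqrt(video_feat.size)
    return 1.0 / (1.0 + np.exp(-logits)), adj               # sigmoid, not softmax

# Hypothetical sizes: 6 visual tokens, 4 audio tokens, 5 labels, 16 features.
demo_rng = np.random.default_rng(1)
probs, adj = classify(demo_rng.standard_normal((6, 16)),
                      demo_rng.standard_normal((4, 16)),
                      demo_rng.standard_normal((5, 16)))
```

The sigmoid on the last line of `classify` scores each label independently, which is what distinguishes multi-label output from single-label softmax classification. With trained weights in place of the random ones, `probs` would be thresholded per label to obtain the predicted label set.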

Authors: Lu Zhenkun; Wang Su; Li Yun (College of Electronic Information, Guangxi Minzu University, Nanning 530006, China; School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning 530003, China)

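Affiliations: College of Electronic Information, Guangxi Minzu University; School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics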

Source: Modern Computer (现代计算机), 2024, No. 14, pp. 1-8, 17 (9 pages in total)

Funding: National Natural Science Foundation of China (61861014, 62361002); Doctoral Start-up Fund (BS2021025).

Keywords: multi-label classification; multimodal; graph convolution; underwater video

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

