Abstract
Light propagating through an underwater environment is scattered and absorbed by water molecules, so objects appear blurred and distorted, which lowers the accuracy of vision-based underwater detection. To improve detection accuracy, this study exploits both the visual and audio modalities contained in underwater video and proposes a multi-label classification network for underwater video based on multimodal semantic enhancement (MCNEMS). The network generates label-enhanced feature representations through attention-enhanced multimodal complementary encoding and decoding, and these representations guide the learning of multi-label semantic correlations. Specifically, a modal semantic enhancement module performs common-independent encoding and decoding across modalities, strengthening both the information shared between modalities and the information specific to each. A multi-head attention (MHA) mechanism then generates a multimodal complementary feature matrix, yielding an enhanced representation of the underwater video content. To mine implicit correlations among labels, a graph correlation learning module based on dynamic graph convolution adaptively learns label semantic embeddings. Experiments on the proposed underwater video multi-label classification dataset (UVMCD) show that the proposed model performs well across the evaluation metrics.
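The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the MCNEMS implementation: it assumes a single attention head without learned projections (the paper uses multi-head attention), toy 2-dimensional features, and hand-picked label weights. The names `cross_attention`, `predict_labels`, `visual`, `audio`, and `label_weights` are hypothetical. It shows visual tokens attending over audio tokens, followed by an independent sigmoid per label so that several labels can be active at once.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: every query row attends over all key rows
    # and returns a weighted average of the value rows.
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(feature, label_weights, threshold=0.5):
    # Multi-label decision: one independent sigmoid per label, then a threshold.
    probs = [sigmoid(sum(f * w for f, w in zip(feature, ws))) for ws in label_weights]
    return [int(p >= threshold) for p in probs], probs

# Toy inputs (assumed values, for illustration only).
visual = [[1.0, 0.0], [0.0, 1.0]]              # 2 visual tokens
audio = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]   # 3 audio tokens

# Visual tokens query the audio tokens to obtain audio-informed features.
fused = cross_attention(visual, audio, audio)

# Mean-pool the fused tokens into one video-level feature vector.
pooled = [sum(col) / len(fused) for col in zip(*fused)]

# Hypothetical per-label weight vectors (3 labels).
label_weights = [[2.0, 1.0], [0.5, 0.5], [-2.0, -2.0]]
labels, probs = predict_labels(pooled, label_weights)
print(labels)
```

Each fused row is a convex combination of the audio rows, which is why attention output stays within the range of the attended features. A trained model would learn the query, key, value, and label weights rather than fixing them by hand.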
Authors
卢振坤
王粟
李云
Lu Zhenkun; Wang Su; Li Yun (College of Electronic Information, Guangxi Minzu University, Nanning 530006, China; School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning 530003, China)
Source
《现代计算机》
2024, No. 14, pp. 1-8, 17 (9 pages in total)
Modern Computer
Funding
National Natural Science Foundation of China (61861014, 62361002)
Doctoral Research Start-up Fund (BS2021025).
Keywords
multi-label classification
multimodal
graph convolution
underwater video