Abstract
Multimodal fusion integrates information from multiple modalities to produce a consistent, shared model output, and is a fundamental problem in the multimodal field. Fusing multimodal information yields more comprehensive features and improves model robustness, and multimodal fusion has become one of the core research topics in the field. Based on ImageNet, HowNet and CCD, this paper constructs a new multimodal knowledge base through manual annotation. Manual calibration of the mapping has been completed for 21,455 ImageNet concepts, linking concepts in HowNet and CCD to ImageNet. The dataset can be applied to both natural language processing and computer vision tasks, where the added image and concept information improves task performance. In image classification, adding HowNet and ImageNet concepts allows more image features to be fused to assist classification; in semantic understanding, the image information added through the mapping helps models understand semantics better.
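The abstract does not describe how the knowledge base is stored. The following is a minimal sketch, assuming each ImageNet synset (identified by its WordNet ID, e.g. `n02084071`) is linked to lists of HowNet and CCD concepts. The class names `ConceptEntry` and `MultimodalKB` and the concept labels such as `"hownet:dog"` are hypothetical placeholders, not the authors' format. The sketch shows the two lookup directions the abstract mentions: from a synset to extra textual concepts (image classification) and from a textual concept to synsets whose images can be retrieved (semantic understanding).

```python
from dataclasses import dataclass, field


@dataclass
class ConceptEntry:
    """One row of a hypothetical mapping table: an ImageNet synset linked to
    the HowNet and CCD concepts that annotators judged to correspond to it."""
    imagenet_wnid: str                                # ImageNet (WordNet) synset ID
    hownet: list[str] = field(default_factory=list)  # placeholder HowNet concept labels
    ccd: list[str] = field(default_factory=list)     # placeholder CCD concept labels


class MultimodalKB:
    """Bidirectional index over hypothetical mapping entries."""

    def __init__(self, entries: list[ConceptEntry]) -> None:
        self.by_wnid: dict[str, ConceptEntry] = {}
        self.by_text_concept: dict[str, set[str]] = {}
        for e in entries:
            self.by_wnid[e.imagenet_wnid] = e
            # A textual concept may map to several synsets, so collect a set.
            for concept in e.hownet + e.ccd:
                self.by_text_concept.setdefault(concept, set()).add(e.imagenet_wnid)

    def synsets_for(self, concept: str) -> list[str]:
        """Text concept -> ImageNet synsets (to attach image information)."""
        return sorted(self.by_text_concept.get(concept, set()))

    def concepts_for(self, wnid: str) -> list[str]:
        """ImageNet synset -> HowNet/CCD concepts (extra cues for classification)."""
        e = self.by_wnid.get(wnid)
        return e.hownet + e.ccd if e else []


kb = MultimodalKB([ConceptEntry("n02084071", hownet=["hownet:dog"], ccd=["ccd:dog-1"])])
print(kb.synsets_for("hownet:dog"))   # ['n02084071']
print(kb.concepts_for("n02084071"))   # ['hownet:dog', 'ccd:dog-1']
```

Keeping a reverse index (`by_text_concept`) makes both directions a constant-time dictionary lookup, at the cost of storing each concept label twice.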
Authors
CHAO Rui (晁睿)
ZHANG Kunli (张坤丽)
WANG Jiajia (王佳佳)
HU Bin (胡斌)
ZHANG Weicong (张维聪)
HAN Yingjie (韩英杰)
ZAN Hongying (昝红英)
Affiliation: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
2022, No. 3, pp. 31-39 (9 pages)
Indexed in: CAS; Peking University Core Journals (北大核心)
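Funding
National Key Research and Development Program of China (2017YFB1002101)
Major Project of the National Social Science Fund of China (17ZDA138)
National Natural Science Foundation of China (62006211)
Key Science and Technology Research Project of Henan Province (192102210260)
Key Scientific Research Project of Higher Education Institutions of Henan Province (19A520003, 20A520038)
Humanities and Social Sciences Planning Project of the Ministry of Education of China (20YJA740033)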