Abstract
A semantic network built only from associations between single-modal entities struggles to capture real-world multimodal semantics accurately. To enhance the completion capability of multi-source knowledge graphs and address missing semantics in knowledge graphs, a multimodal embedding tensor decomposition method, ME-TD (multimodal embedding tensor decomposition), was proposed. Triples composed of images, description texts and knowledge were used as the input of the tensor decomposition model, and features were extracted from the images and the texts separately. Three fusion methods were studied: additive fusion, multiplicative fusion and concatenation-mapping fusion. Through high-dimensional mapping, a multimodal third-order tensor was formed. A three-mode decomposition then expressed this tensor as the product of a core tensor and a factor matrix for each mode, and link prediction was used to compute the probability that a triple is correct. Experimental results show that, in knowledge completion, ME-TD achieves a clear improvement on multimodal metrics over the compared methods.
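The fusion and scoring steps described in the abstract can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: all function names, the toy dimension `d`, and the projection matrix `proj` are hypothetical. It assumes image and text features have already been extracted and brought to a common dimension. It also assumes the fused vector serves as the entity embedding and that a triple is scored TuckER-style by contracting a core tensor with the subject, relation and object vectors along its three modes, followed by a sigmoid to obtain a probability.

```python
import math
import random


def _check_same_length(img, txt):
    if len(img) != len(txt):
        raise ValueError("image and text features must share one dimension")


def add_fusion(img, txt):
    """Additive fusion: element-wise sum of image and text features."""
    _check_same_length(img, txt)
    return [a + b for a, b in zip(img, txt)]


def mul_fusion(img, txt):
    """Multiplicative fusion: element-wise (Hadamard) product."""
    _check_same_length(img, txt)
    return [a * b for a, b in zip(img, txt)]


def concat_map_fusion(img, txt, proj):
    """Concatenation + mapping: concatenate, then apply a linear map.

    `proj` is a hypothetical weight matrix with len(img) + len(txt) columns,
    given as a list of rows; its row count sets the output dimension.
    """
    cat = img + txt
    return [sum(w * x for w, x in zip(row, cat)) for row in proj]


def tucker_score(core, e_s, w_r, e_o):
    """Contract a 3rd-order core tensor W along its three modes:
    phi = sum_{i,j,k} W[i][j][k] * e_s[i] * w_r[j] * e_o[k]
    """
    total = 0.0
    for i, a in enumerate(e_s):
        for j, b in enumerate(w_r):
            for k, c in enumerate(e_o):
                total += core[i][j][k] * a * b * c
    return total


def sigmoid(x):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def random_vec(rng, n):
    """Hypothetical helper producing toy feature values for the demo."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4  # toy dimension, chosen only for illustration
    # Stand-ins for already-extracted image/text features (random toy values)
    img_s, txt_s = random_vec(rng, d), random_vec(rng, d)
    img_o, txt_o = random_vec(rng, d), random_vec(rng, d)
    proj = [random_vec(rng, 2 * d) for _ in range(d)]
    e_s = concat_map_fusion(img_s, txt_s, proj)  # subject entity embedding
    e_o = concat_map_fusion(img_o, txt_o, proj)  # object entity embedding
    w_r = random_vec(rng, d)                      # relation embedding
    core = [[random_vec(rng, d) for _ in range(d)] for _ in range(d)]
    p = sigmoid(tucker_score(core, e_s, w_r, e_o))
    print(f"probability that the triple is correct: {p:.4f}")
```

Swapping `concat_map_fusion` for `add_fusion` or `mul_fusion` changes only how the entity vectors are built; the scoring step stays the same. The triple loop costs O(d^3) per triple and is written for readability; a practical implementation would normally use a tensor library and batch many triples at once.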
Authors
CHEN Chong; MENG Zu-qiang (School of Computer and Electronic Information, Guangxi University, Nanning 530004, China)
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University core journal
2023, No. 10, pp. 2956-2964 (9 pages)
Funding
National Natural Science Foundation of China (61862005).
Keywords
knowledge graph completion
feature extraction
multimodal embedding
fusion
core tensor
three-mode tensor decomposition
link prediction