Abstract
Chinese-Vietnamese cross-lingual event retrieval aims to retrieve relevant Vietnamese news event documents given a Chinese event query phrase. The task is difficult because the news documents being queried are long, the Chinese query phrase and the Vietnamese documents differ greatly in length and in expression, and the documents often contain a large amount of noise text unrelated to the core event they describe. As a result, existing models fail to capture event matching features well, and matching performance suffers. To address this, a Chinese-Vietnamese cross-lingual event retrieval method based on an arguments association graph is proposed. First, a Chinese-Vietnamese bilingual word embedding is pre-trained to bridge the two languages. Then, key information (keywords and entities) is extracted from each document to construct an arguments association graph. Finally, a graph encoder encodes the constructed graph to generate structured event information that enhances traditional event retrieval models. Experimental results show that the proposed method outperforms traditional baseline methods.
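The graph construction and encoding steps named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `build_element_graph` and `encode_graph` are hypothetical, elements are linked when they co-occur in a sentence, the encoder is a single round of mean aggregation rather than the authors' graph encoder, and the toy vectors stand in for the pre-trained bilingual word embedding.

```python
from collections import defaultdict
from itertools import combinations


def build_element_graph(sentences):
    """Link two elements whenever they co-occur in the same sentence.

    `sentences` is a list of element lists (keywords and entities that
    were already extracted, one list per sentence). Sentence-level
    co-occurrence is an assumed linking rule, not the paper's.
    """
    graph = defaultdict(set)
    for elements in sentences:
        unique = sorted(set(elements))
        for element in unique:
            graph[element]  # create the node even if it ends up isolated
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def encode_graph(graph, embeddings):
    """Average each node with its neighbours, then mean-pool all nodes.

    This is a simple stand-in for a graph encoder: one round of mean
    aggregation followed by pooling into a single document-level vector.
    """
    dim = len(next(iter(embeddings.values())))
    updated = {}
    for node, neighbours in graph.items():
        group = [embeddings[n] for n in [node, *neighbours]]
        updated[node] = [sum(v[i] for v in group) / len(group) for i in range(dim)]
    return [sum(v[i] for v in updated.values()) / len(updated) for i in range(dim)]


# Toy input: placeholder elements and 2-dimensional placeholder embeddings.
sentences = [["flood", "Hanoi"], ["Hanoi", "rescue"]]
embeddings = {"flood": [1.0, 0.0], "Hanoi": [0.0, 1.0], "rescue": [1.0, 1.0]}

graph = build_element_graph(sentences)
event_vector = encode_graph(graph, embeddings)
```

In a retrieval setting, a vector such as `event_vector` would be combined with the output of a conventional text matching model; how the paper performs that combination is not stated in the abstract.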
Authors
ZHAO Zhouying (赵周颖)
YU Zhengtao (余正涛)
HUANG Yuxin (黄于欣)
CHEN Ruiqing (陈瑞清)
ZHU Enchang (朱恩昌)
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal list (北大核心)
2024, Issue 7, pp. 127-132 (6 pages)
Keywords
cross-lingual event retrieval
cross-lingual word embedding
arguments association graph
graph neural network
text matching
event retrieval