Abstract
Cross-language knowledge linking (CLKL) is the task of establishing links between online encyclopedia articles in different languages that describe the same content. CLKL can be divided into two stages: candidate selection and candidate ranking. For candidate selection, this paper formulates the task as a cross-language information retrieval problem and proposes a method that generates queries by combining the article title with keywords, which raises the recall of candidate selection to 93.8%. For candidate ranking, the paper proposes a ranking model that combines a bilingual topic model with bilingual word embeddings, and uses it to link military-domain articles between English Wikipedia and Chinese Baidu Baike. Experimental results show that the model achieves an accuracy of 75%, significantly improving CLKL performance. Because the proposed method does not depend on language-specific or domain-specific characteristics, it can be easily extended to CLKL for other languages and domains.
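The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_keywords` cutoff, the `topic` and `embed` vector fields, and the fixed interpolation weight `alpha` are all assumptions made for illustration, and a simple weighted sum of cosine similarities stands in for the trained ranking model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_query(title: str, keywords: Sequence[str], max_keywords: int = 5) -> str:
    """Combine an article title with its top keywords into one query string.

    Hypothetical helper: the abstract only says that title and keywords are
    combined; how keywords are extracted or translated is not shown here.
    """
    terms = [title.strip()]
    for kw in keywords[:max_keywords]:
        kw = kw.strip()
        # Skip empty keywords and case-insensitive duplicates of earlier terms.
        if kw and kw.lower() not in (t.lower() for t in terms):
            terms.append(kw)
    return " ".join(terms)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    source: Dict[str, Sequence[float]],
    candidates: Dict[str, Dict[str, Sequence[float]]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank candidates by a weighted mix of two similarity signals.

    Assumption: each article carries a "topic" vector (from some bilingual
    topic model) and an "embed" vector (from some bilingual embedding), both
    already in a shared cross-language space.
    """
    scored = []
    for name, cand in candidates.items():
        topic_sim = cosine(source["topic"], cand["topic"])
        embed_sim = cosine(source["embed"], cand["embed"])
        scored.append((name, alpha * topic_sim + (1.0 - alpha) * embed_sim))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(build_query("Aircraft carrier", ["warship", "navy", "Aircraft carrier"]))
    src = {"topic": [1.0, 0.0], "embed": [1.0, 0.0]}
    cands = {
        "A": {"topic": [1.0, 0.0], "embed": [1.0, 0.0]},
        "B": {"topic": [0.0, 1.0], "embed": [1.0, 0.0]},
    }
    print(rank_candidates(src, cands))
```

In this sketch the query from `build_query` would be sent to any retrieval engine over the target-language encyclopedia to obtain the candidate set, and `rank_candidates` then orders that set; in a trained model the weight between the two signals would be learned rather than fixed.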
Authors
余圆圆
巢文涵
何跃鹰
李舟军
YU Yuan-yuan; CHAO Wen-han; HE Yue-ying; LI Zhou-jun (School of Computer Science and Engineering, Beihang University, Beijing 100191, China; National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China)
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2019, Issue 1, pp. 238-244 (7 pages)
Computer Science
Keywords
Cross-language knowledge linking
Cross-language information retrieval
Bilingual topic model
Bilingual embedding