Abstract
Bilingual lexicon induction is an important task in natural language processing. This paper retrains word embeddings with a substitution-based method so that they acquire cross-lingual properties. It focuses on how to obtain the training dictionary and on the co-training model for word embeddings, and reports experiments on Chinese and English Wikipedia corpora. The results show that, when the training dictionary is selected by confidence, the substitution-based method yields word embeddings with good cross-lingual properties, and the lexicon finally extracted has high accuracy.
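The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: selecting a training dictionary by confidence, and substituting dictionary words in the corpus before embeddings are retrained. The function names, the `(source, target, confidence)` tuple layout, the `min_confidence` threshold, and the `replace_prob` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import random


def select_seed_dictionary(candidates, min_confidence=0.8):
    """Keep, for each source word, the highest-confidence translation
    whose confidence reaches the (assumed) threshold.

    candidates: iterable of (source_word, target_word, confidence).
    """
    seed = {}
    best = {}
    for src, tgt, conf in candidates:
        if conf >= min_confidence and conf > best.get(src, 0.0):
            seed[src] = tgt
            best[src] = conf
    return seed


def substitute(sentences, seed, replace_prob=0.5, rng=None):
    """Return a copy of the tokenized corpus in which each word found in
    the seed dictionary is swapped for its translation with probability
    replace_prob (a hypothetical knob, not from the paper)."""
    rng = rng or random.Random(0)
    mixed = []
    for sent in sentences:
        mixed.append([
            seed[w] if w in seed and rng.random() < replace_prob else w
            for w in sent
        ])
    return mixed


# Toy data: candidate pairs with made-up confidence scores.
candidates = [("猫", "cat", 0.95), ("狗", "dog", 0.40), ("猫", "kitty", 0.85)]
seed = select_seed_dictionary(candidates, min_confidence=0.8)
corpus = [["猫", "喜欢", "鱼"], ["狗", "喜欢", "骨头"]]
mixed_corpus = substitute(corpus, seed, replace_prob=1.0)
```

Under these assumptions, the mixed corpus places translation pairs in shared contexts, so an ordinary monolingual embedding trainer run on it would tend to put them close together in one vector space; lexicon entries could then be read off as nearest neighbours across languages.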
Authors
郭晋鹏
曹海龙
GUO Jinpeng; CAO Hailong (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source
《智能计算机与应用》
2021, Issue 3, pp. 217-219 (3 pages)
Intelligent Computer and Applications
Keywords
bilingual lexicon induction
unsupervised learning
substitution method