
Textual Feature Based Bilingual Sentence Similarity Measure Between Chinese and Lao (融合文本特征的汉老双语句子相似度计算方法)

Cited by: 1
Abstract: Bilingual sentence similarity measures how semantically similar two sentences in different languages are, and it plays an important role in information retrieval, parallel corpus construction, and machine translation. Chinese-Lao parallel corpora are scarce, and Lao differs markedly from Chinese in semantic expression and sentence structure, which makes Chinese-Lao sentence similarity difficult to study. This paper proposes a Chinese-Lao bilingual sentence similarity method that fuses textual features, and builds a sentence similarity model around it. First, textual features of Chinese and Lao, such as part of speech and numeral co-occurrence, are fused with GloVe pretrained word vectors to enrich the sentence representation and improve the model's accuracy. Second, a multi-layer siamese network composed of self-attention-based bidirectional long short-term memory (BiLSTM) networks extracts long-distance context features and deep semantic information, with the self-attention mechanism ensuring that the semantic information is used effectively. Finally, transfer learning is used to initialize the model with the parameters of a general model, and different fine-tuning strategies are applied to improve the model's generalization. Experiments show that the proposed method reaches a recall of 82.5%, a precision of 85.78%, and an F1 score of 84.00%.
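The feature-fusion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the tag set `POS_TAGS`, the toy two-dimensional vectors, the placeholder tokens (`zh_buy`, `lo_buy`, and so on), and the helper names are all hypothetical, and mean pooling stands in for the siamese BiLSTM with self-attention that the paper actually uses. The sketch also assumes both sentences write numerals with the same digits and that the word vectors already live in a shared space.

```python
import math

# Hypothetical tag set; the paper's actual POS inventory is not given in the abstract.
POS_TAGS = ["NOUN", "VERB", "NUM", "OTHER"]


def shared_numerals(tokens_a, tokens_b):
    """Return the numeral tokens that occur in both sentences."""
    nums_a = {t for t in tokens_a if t.isdigit()}
    nums_b = {t for t in tokens_b if t.isdigit()}
    return nums_a & nums_b


def fuse_token(embedding, pos_tag, numeral_shared):
    """Concatenate a word vector, a one-hot POS vector, and a numeral co-occurrence flag."""
    pos_one_hot = [1.0 if pos_tag == tag else 0.0 for tag in POS_TAGS]
    return list(embedding) + pos_one_hot + [1.0 if numeral_shared else 0.0]


def encode(tokens, tags, embeddings, shared):
    """Stand-in sentence encoder: mean-pool the fused token vectors."""
    fused = [fuse_token(embeddings[t], tag, t in shared) for t, tag in zip(tokens, tags)]
    dim = len(fused[0])
    return [sum(vec[i] for vec in fused) / len(fused) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy vectors standing in for pretrained GloVe embeddings (values are made up).
embeddings = {
    "zh_buy": [0.9, 0.1], "zh_apple": [0.2, 0.8], "3": [0.5, 0.5],
    "lo_buy": [0.8, 0.2], "lo_apple": [0.1, 0.9],
}
zh_tokens, zh_tags = ["zh_buy", "3", "zh_apple"], ["VERB", "NUM", "NOUN"]
lo_tokens, lo_tags = ["lo_buy", "lo_apple", "3"], ["VERB", "NOUN", "NUM"]

shared = shared_numerals(zh_tokens, lo_tokens)
zh_vec = encode(zh_tokens, zh_tags, embeddings, shared)
lo_vec = encode(lo_tokens, lo_tags, embeddings, shared)
similarity = cosine(zh_vec, lo_vec)
print(round(similarity, 3))
```

Concatenation keeps the pretrained vector intact and lets the downstream encoder learn how much weight the extra features deserve; whether the paper fuses by concatenation or by another operation is not stated in the abstract.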
Authors: TAN Qihui (谭琪辉); ZHOU Lanjiang (周兰江); LIU Chang (刘畅)
Affiliations: The Key Laboratory of Intelligent Information Processing, School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal (北大核心), 2021, No. 10, pp. 64-72 (9 pages)
Funding: National Natural Science Foundation of China (61662040)
Keywords: bilingual sentence similarity; Lao; transfer learning; text features