Computation of Chinese and Lao Short Text Similarity Based on Multiple Features of Words
Abstract Words are the most semantically expressive units of a text, so enriching word representations with additional features such as morphology, part of speech, and part-of-speech weight should improve the accuracy of text similarity measurement. This paper proposes a similarity calculation method for Chinese and Lao short texts that fuses multiple word features. First, a bidirectional long short-term memory network (BiLSTM) and a convolutional neural network (CNN) are used to extract the morphological features of Chinese and Lao words, respectively, and each word vector is concatenated with its morphological feature vector, part-of-speech vector, and part-of-speech weight vector. BiLSTM and CNN are then used to extract the contextual features and local semantic features of the Chinese and Lao short texts, after which the ESIM interactive attention mechanism lets the Chinese and Lao semantic information interact. Finally, the relative difference and relative product of the Chinese and Lao semantic feature vectors are computed, and the results are concatenated and fed into a fully connected layer to obtain the similarity score of the Chinese-Lao bilingual short text pair. Experimental results show that the proposed method achieves better results with a limited corpus, reaching an F1 score of 78.67%.
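The feature concatenation and the final matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "relative difference" means the element-wise absolute difference and "relative product" the element-wise product, as is common in ESIM-style matching layers, and it stands in for the fully connected layer with a single sigmoid unit. The function names, vector sizes, and weights are hypothetical and chosen only for illustration.

```python
import math


def concat_word_features(word_vec, morph_vec, pos_vec, pos_weight_vec):
    """Concatenate the four per-word feature vectors into one input vector."""
    return list(word_vec) + list(morph_vec) + list(pos_vec) + list(pos_weight_vec)


def matching_vector(u, v):
    """Concatenate element-wise |u - v| and u * v.

    Assumption: this is how 'relative difference' and 'relative product'
    are read here; the abstract does not give the exact formulas.
    """
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same length")
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return diff + prod


def similarity_score(u, v, weights, bias):
    """Hypothetical one-unit fully connected layer with a sigmoid output."""
    m = matching_vector(u, v)
    if len(weights) != len(m):
        raise ValueError("expected one weight per matching feature")
    z = sum(w * x for w, x in zip(weights, m)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy sentence vectors standing in for the encoded Chinese and Lao texts.
zh_vec = [0.5, -1.0]
lo_vec = [0.25, 1.0]
features = matching_vector(zh_vec, lo_vec)
score = similarity_score(zh_vec, lo_vec, [0.0, 0.0, 0.0, 0.0], 0.0)
```

In a trained model the weights and bias would be learned, and the two input vectors would come from the BiLSTM, CNN, and attention layers rather than being written by hand.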
Authors GUO Lei; ZHOU Lan-jiang; ZHOU Lei-yue (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650550, China; Oxbridge College, Kunming University of Science and Technology, Kunming 650160, China)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 4, pp. 759-765 (7 pages)
Funding Supported by the National Natural Science Foundation of China (61662040).
Keywords Chinese-Lao; morphology; bidirectional long short-term memory network; ESIM interactive attention mechanism