
BI-LSTM-CRF Chinese Word Segmentation Model with Attention Mechanism (cited by: 8)
Abstract Unlike English, where spaces act as natural delimiters between words, Chinese text has no explicit word boundaries, so it must be segmented before a machine can recognize its words. Deep learning models and methods that work well for English natural language processing therefore cannot be applied directly. Research on deep learning for Chinese word segmentation has already produced some breakthrough results. Building on that work, this paper proposes a method that combines the Bi-LSTM-CRF model with an attention mechanism and introduces a denoising mechanism to filter the character vector representations. In addition, to address the unidirectional LSTM's insufficient dependence on the following context, a contribution rate ? is introduced to adjust the output weight matrix of the BI-LSTM and so improve segmentation. The improved model was evaluated on several public datasets. The results show that the improved attention-BI-LSTM-CRF model and its training method can effectively handle word segmentation, part-of-speech tagging, and related problems in Chinese natural language processing, and that it outperforms earlier models.
Authors HUANG Dan-dan (黄丹丹); GUO Yu-cui (郭玉翠) (School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source Software (《软件》), 2018, No. 10, pp. 260-266 (7 pages)
Keywords Chinese word segmentation; BI-LSTM; CRF; attention mechanism; contribution factor; denoising mechanism; dropout
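
The abstract treats segmentation as a task a Bi-LSTM-CRF tagger can learn. One common way to set that up is to label each character with its position in a word and let the tagger predict those labels. The sketch below assumes the widely used B/M/E/S scheme (Begin, Middle, End, Single); the abstract does not say which tag set the authors used, and the function names `words_to_tags` and `tags_to_words` are illustrative only. It shows the data side of the problem, not the paper's model.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to per-character B/M/E/S tags.

    Assumption: B/M/E/S is used here for illustration; the paper's
    actual tag set is not stated in the abstract.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # Begin, zero or more Middle, End
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their (predicted) tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on End or Single
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


segmented = ["中文", "分词", "是", "基础", "任务"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Under this framing, a sequence model's output is a tag per character, and a CRF layer is a natural fit because it can score whole tag sequences and so penalize invalid transitions such as B followed by S.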
