Joint Method for Chinese Word Segmentation and Part-of-Speech Tagging Based on BERT-BiLSTM-CRF
Abstract: For sequence labeling tasks such as Chinese word segmentation and part-of-speech (POS) tagging, this paper proposes a joint method for Chinese word segmentation and POS tagging that combines the BERT language model, BiLSTM (bidirectional long short-term memory), and CRF (conditional random field) with either the Markov family model (MFM) or tree-like probability (TLP). POS tagging based on the hidden Markov model (HMM) ignores the emission probability from the word itself to its part of speech. In POS tagging based on MFM or TLP, by contrast, the part of speech of a word depends not only on the part of speech of the preceding word but also on the word itself. The joint method lets POS information inform segmentation, and tightly coupling the two tasks helps resolve ambiguity and improves both segmentation and POS tagging performance. Experimental results show that, compared with an ordinary BiLSTM-CRF segmentation model, the joint method clearly improves segmentation performance, and that, compared with conventional HMM-based POS tagging, it substantially improves POS tagging accuracy.
Author: YUAN Li-chi (袁里驰) (School of Software and Internet of Things Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2023, Issue 9, pp. 1906-1911 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (61962025, 61562034).
Keywords: BERT (bidirectional encoder representation from transformers); bidirectional long short-term memory model; Chinese word segmentation; part-of-speech tagging; Markov family model; tree-like probability
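The abstract's contrast between HMM tagging and word-aware tagging can be illustrated with a small Viterbi decoder. The sketch below is not the paper's implementation and does not involve BERT, BiLSTM, or CRF. It only shows how the local score changes when a tag is conditioned on both the previous tag and the current word. The factorization P(t | t_prev, w) ∝ P(t | t_prev) · P(t | w) / P(t) is an assumption made for illustration, the function names (`viterbi`, `hmm_score`, `word_aware_score`) are hypothetical, and all probabilities are invented toy values.

```python
import math

TAGS = ["N", "V"]
START = "<s>"

# Toy probability tables (invented for illustration only).
TRANS = {  # P(tag | previous tag)
    (START, "N"): 0.6, (START, "V"): 0.4,
    ("N", "N"): 0.3, ("N", "V"): 0.7,
    ("V", "N"): 0.8, ("V", "V"): 0.2,
}
EMIT = {  # P(word | tag), the usual HMM emission
    "N": {"我": 0.3, "爱": 0.05, "学习": 0.1},
    "V": {"我": 0.01, "爱": 0.3, "学习": 0.2},
}
TAG_GIVEN_WORD = {  # P(tag | word), the word-to-tag probability
    "我": {"N": 0.95, "V": 0.05},
    "爱": {"N": 0.2, "V": 0.8},
    "学习": {"N": 0.4, "V": 0.6},
}
PRIOR = {"N": 0.5, "V": 0.5}  # P(tag)


def hmm_score(prev_tag, tag, word):
    """HMM local log-score: log P(tag | prev_tag) + log P(word | tag)."""
    return math.log(TRANS[(prev_tag, tag)]) + math.log(EMIT[tag][word])


def word_aware_score(prev_tag, tag, word):
    """Assumed word-aware local log-score:
    log P(tag | prev_tag) + log P(tag | word) - log P(tag)."""
    return (math.log(TRANS[(prev_tag, tag)])
            + math.log(TAG_GIVEN_WORD[word][tag])
            - math.log(PRIOR[tag]))


def viterbi(words, local_score, tags=TAGS, start=START):
    """Return the highest-scoring tag sequence under `local_score`."""
    if not words:
        return []
    # best[i][t] = (best log-score of a path ending in tag t at i, previous tag)
    best = [{t: (local_score(start, t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            column[t] = max(
                (best[i - 1][p][0] + local_score(p, t, words[i]), p)
                for p in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    sentence = ["我", "爱", "学习"]
    print(viterbi(sentence, hmm_score))
    print(viterbi(sentence, word_aware_score))
```

Only the local scoring function differs between the two runs, so the same decoder serves both models. In a joint setup, the word sequence passed to the decoder would come from candidate segmentations rather than being fixed in advance.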