Abstract
Korean part-of-speech (POS) tagging is a foundation of Korean information processing, and its results directly affect the performance of downstream Korean natural language processing tasks. First, to address the mismatch between the surface form of a morpheme as it is written and its original form, this paper proposes a method for recovering the original forms of Korean morphemes that integrates Korean Jamo (alphabet letter) information into a seq2seq model. Second, on the basis of the recovered original forms, an LSTM-CRF model is used to perform Korean word spacing and POS tagging. Experimental results show that the proposed method achieves a POS tagging F1-score of 94.75%, which is better than the other methods.
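The Jamo information mentioned above refers to the individual letters that make up each Hangul syllable block. As a minimal sketch of how such letters can be obtained, the snippet below decomposes precomposed syllables with the standard Unicode arithmetic. The function name `to_jamo` is hypothetical, and nothing here is taken from the paper's implementation; how the authors actually feed Jamo into the seq2seq model is not stated in this abstract.

```python
# Illustrative sketch only: `to_jamo` is a hypothetical helper, not the paper's code.
# It uses the Unicode Hangul syllable decomposition arithmetic.

S_BASE = 0xAC00                   # first precomposed syllable, U+AC00
L_BASE = 0x1100                   # first leading consonant Jamo
V_BASE = 0x1161                   # first vowel Jamo
T_BASE = 0x11A7                   # one before the first trailing consonant Jamo
V_COUNT = 21                      # number of vowels
T_COUNT = 28                      # number of trailing slots (including "none")
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 precomposed syllables


def to_jamo(text):
    """Split each precomposed Hangul syllable into its Jamo letters.

    Characters outside the Hangul syllable block are passed through unchanged.
    """
    out = []
    for ch in text:
        idx = ord(ch) - S_BASE
        if 0 <= idx < S_COUNT:
            lead, rem = divmod(idx, V_COUNT * T_COUNT)
            vowel, tail = divmod(rem, T_COUNT)
            out.append(chr(L_BASE + lead))
            out.append(chr(V_BASE + vowel))
            if tail:  # tail index 0 means the syllable has no final consonant
                out.append(chr(T_BASE + tail))
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    # Print the code points of the Jamo that make up one syllable.
    print([hex(ord(j)) for j in to_jamo("한")])
```

A Jamo sequence of this kind gives a model access to sub-syllable changes, which is where a written morpheme and its original form typically differ.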
Authors
金国哲
崔荣一
JIN Guozhe; CUI Rongyi (Department of Computer Science and Technology, Yanbian University, Yanji, Jilin 133002, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal (北大核心)
2018, No. 10, pp. 53-58, 68 (7 pages in total)
Journal of Chinese Information Processing
Funding
Key Project of the Education Department of Jilin Province (吉教科合字 [2016] No. 250)