Abstract
This paper studies the application of deep learning to Tibetan word segmentation. Several deep neural network models are implemented: the recurrent neural network (RNN), bidirectional recurrent neural network (Bi RNN), stacked recurrent neural network (Stacked RNN), long short-term memory network (LSTM), and encoder-labeler long short-term memory network (Encoder-Labeler LSTM). The models are evaluated on a word segmentation corpus of written-style text consisting mainly of legal texts, government documents, and news. Experimental results show that the Encoder-Labeler LSTM achieves the best segmentation results, with a precision of 92.96%, a recall of 93.30%, and an F value of 93.13%.
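The three reported figures are related by the usual definition of the F value as the harmonic mean of precision and recall. The sketch below is an illustration only, not the authors' evaluation script: it assumes word-level scoring by matching character spans, which the abstract does not confirm, and the function names `spans`, `prf`, and `f_value` and the Latin placeholder strings are hypothetical.

```python
def spans(words):
    """Map a segmented sentence (list of words) to a set of (start, end) spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def f_value(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(gold, pred):
    """Precision, recall and F for one sentence.

    Assumption: a predicted word counts as correct only if both of its
    boundaries match a gold word exactly.
    """
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall, f_value(precision, recall)


# Toy example with placeholder strings (not Tibetan data).
gold = ["ab", "c", "de"]
pred = ["ab", "cd", "e"]
print(prf(gold, pred))

# Consistency check of the figures reported in the abstract.
print(round(f_value(92.96, 93.30), 2))
```

Under these assumptions, the reported precision and recall reproduce the reported F value of 93.13% to two decimal places.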
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2018, No. 1, pp. 194-198 (5 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (61303165, 61540057, 61132009)
Natural Science Foundation of Qinghai Province (2016-ZJ-Y04, 2016-ZJ-740)
Key Fund Project of the State Language Commission (ZDI135-17)
Keywords
deep learning (深度学习)
Tibetan word segmentation (藏文分词)
recurrent neural network (循环神经网络)
long short-term memory (长短期记忆)
encoder-labeler (编码器-标注器)