Abstract
Currently, the mainstream approach to Chinese word segmentation is traditional machine learning based on character-level sequence labeling, which suffers from drawbacks such as manual feature engineering and feature sparsity. With the rise of deep learning research and applications, researchers have proposed applying long short-term memory (LSTM) neural networks to the Chinese word segmentation task. This approach learns features automatically and effectively models long-distance dependencies, but the model is relatively complex, and its training and prediction are slow. To address this problem, we propose a Chinese word segmentation method based on gated recurrent unit (GRU) neural networks. The method inherits the LSTM model's advantages of automatic feature learning and effective modeling of long-distance dependencies, performs comparably to LSTM-based Chinese word segmentation, and is significantly faster in training and prediction.
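As a concrete illustration of the architecture summarized above, the following is a minimal, hypothetical PyTorch sketch of a character-level bidirectional GRU tagger using the commonly adopted B/M/E/S segmentation labels; the class name, hyperparameters, and tag set are illustrative assumptions, not the authors' implementation. Compared with an LSTM cell, a GRU merges the input and forget gates into a single update gate and has no separate memory cell, which is the usual explanation for its lower computational cost.

# Hypothetical sketch (not the authors' released code): a character-level
# bidirectional GRU tagger for Chinese word segmentation with B/M/E/S tags.
import torch
import torch.nn as nn

class GRUSegmenter(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_tags=4):
        super().__init__()
        # Character embeddings replace hand-crafted sparse features.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # A bidirectional GRU models long-distance context in both directions.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        # Project each position's hidden state onto the B/M/E/S tag scores.
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, char_ids):              # char_ids: (batch, seq_len)
        hidden, _ = self.gru(self.embed(char_ids))
        return self.out(hidden)               # (batch, seq_len, num_tags)

# Toy usage: score 2 sequences of 5 characters over a 5000-character vocabulary.
model = GRUSegmenter(vocab_size=5000)
scores = model(torch.randint(0, 5000, (2, 5)))
print(scores.shape)                           # torch.Size([2, 5, 4])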
Authors
LI Xuelian (李雪莲), DUAN Hong (段鸿), XU Mu (许牧)
Software School of Xiamen University, Xiamen 361005, China
Source
《厦门大学学报(自然科学版)》 / Journal of Xiamen University (Natural Science)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2017, No. 2, pp. 237-243 (7 pages)
Funding
Natural Science Foundation of Fujian Province (2013J01250)
Keywords
natural language processing
Chinese word segmentation
gated recurrent unit (GRU)
character embedding
recurrent neural networks