
A Gated Recurrent Unit Neural Network for Chinese Word Segmentation
Cited by: 22
Abstract  Currently, the mainstream approach to Chinese word segmentation is traditional machine learning based on character-level sequence labeling. This approach requires manually defined features and suffers from feature sparsity. With the rise of deep learning research and applications, researchers have applied long short-term memory (LSTM) neural networks to the Chinese word segmentation task. An LSTM model learns features automatically and effectively captures long-distance dependencies, but it is relatively complex and slow in both training and prediction. To address this, we propose a Chinese word segmentation method based on a gated recurrent unit (GRU) neural network. It inherits the LSTM model's advantages of automatic feature learning and effective modeling of long-distance dependencies, achieves performance comparable to the LSTM-based segmenter, and is significantly faster in both training and prediction.
Authors  LI Xuelian, DUAN Hong, XU Mu (Software School of Xiamen University, Xiamen 361005, China)
Source  Journal of Xiamen University (Natural Science), 2017, No. 2, pp. 237-243 (7 pages); indexed in CAS, CSCD, and the Peking University Core Journals list
Funding  Natural Science Foundation of Fujian Province (2013J01250)
Keywords  natural language processing; Chinese word segmentation; gated recurrent unit (GRU); character embedding; recurrent neural networks
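The abstract frames segmentation as tagging each character with one of the standard {B, M, E, S} labels (begin, middle, end of a word, or single-character word) using a GRU over character embeddings. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of the GRU recurrence (update gate, reset gate, candidate state) feeding a per-character softmax over the four tags. All names, dimensions, and the random toy inputs are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

class GRUCell:
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_size = hidden_size
        def w(rows, cols):
            return rng.normal(0.0, 0.1, (rows, cols))
        self.W_z, self.U_z, self.b_z = w(hidden_size, input_size), w(hidden_size, hidden_size), np.zeros(hidden_size)
        self.W_r, self.U_r, self.b_r = w(hidden_size, input_size), w(hidden_size, hidden_size), np.zeros(hidden_size)
        self.W_h, self.U_h, self.b_h = w(hidden_size, input_size), w(hidden_size, hidden_size), np.zeros(hidden_size)

    def step(self, x, h):
        z = sigmoid(self.W_z @ x + self.U_z @ h + self.b_z)              # update gate
        r = sigmoid(self.W_r @ x + self.U_r @ h + self.b_r)              # reset gate
        h_tilde = np.tanh(self.W_h @ x + self.U_h @ (r * h) + self.b_h)  # candidate state
        return (1.0 - z) * h + z * h_tilde                               # interpolate old/new state

def segment_scores(char_embeddings, cell, W_out, b_out):
    """Run the GRU over a character sequence and emit, for each character,
    a probability distribution over the {B, M, E, S} tag set."""
    h = np.zeros(cell.hidden_size)
    scores = []
    for x in char_embeddings:
        h = cell.step(x, h)
        scores.append(softmax(W_out @ h + b_out))  # linear layer + softmax per character
    return np.stack(scores)

# Toy usage: 5 "characters" with 8-dim embeddings, 16 hidden units, 4 tags (B/M/E/S).
rng = np.random.default_rng(1)
cell = GRUCell(input_size=8, hidden_size=16)
W_out, b_out = rng.normal(0.0, 0.1, (4, 16)), np.zeros(4)
probs = segment_scores(rng.normal(0.0, 0.1, (5, 8)), cell, W_out, b_out)
# probs has shape (5, 4): one B/M/E/S distribution per input character
```

With only two gates instead of LSTM's three (and no separate memory cell), each GRU step does fewer matrix multiplications, which is the source of the speedup the abstract reports.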