Abstract
In traditional feature-based Chinese word segmentation models, there are far more parameters than the limited training data can support, so the feature weights are hard to estimate accurately. To address this problem, this paper proposes a neural network approach to Chinese word segmentation based on feature embedding. The embedding maps each feature to a low-dimensional real-valued vector, which effectively reduces the dimensionality of the features. To further improve the model's performance, a linear learning-rate decay method is introduced, and regularization is studied as a way to strengthen the model's generalization ability. Experimental results show that the proposed model improves the efficiency of solving the Chinese word segmentation problem.
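The abstract names three techniques but does not give their exact form: feature embedding, linear learning-rate decay, and regularization. The Python sketch below illustrates each under assumptions that are NOT taken from the paper: five character unigram/bigram feature templates in a ±1 window, the embedding size, uniform random initialization, a schedule that decays linearly from `lr0` to `lr_min`, and an L2 penalty as the regularizer. All names (`FeatureEmbedding`, `window_features`, `linear_decay_lr`, `l2_penalty`) are hypothetical. The point it shows is the dimensionality argument. With one-hot features, the input dimension equals the number of distinct feature strings, which can run to hundreds of thousands. With embeddings, it is only (number of templates) × `dim`.

```python
import random


class FeatureEmbedding:
    """Maps discrete feature strings to dense low-dimensional vectors (hypothetical sketch)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}  # feature string -> list of `dim` floats

    def lookup(self, feature):
        # Assumption: lazily create a small uniform random vector the first time a feature is seen.
        if feature not in self.table:
            self.table[feature] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.table[feature]

    def embed(self, features):
        # Concatenate the vectors of all features extracted for one character.
        out = []
        for f in features:
            out.extend(self.lookup(f))
        return out


def window_features(sentence, i):
    """Assumed templates: unigrams and bigrams in a +/-1 window around sentence[i]."""
    pad = "^" + sentence + "$"  # boundary markers for the first and last characters
    j = i + 1  # index of sentence[i] inside the padded string
    return [
        "U-1=" + pad[j - 1],
        "U0=" + pad[j],
        "U+1=" + pad[j + 1],
        "B-1,0=" + pad[j - 1:j + 1],
        "B0,+1=" + pad[j:j + 2],
    ]


def linear_decay_lr(lr0, lr_min, step, total_steps):
    """Assumed schedule: learning rate falls linearly from lr0 to lr_min, then stays there."""
    frac = min(step, total_steps) / total_steps
    return lr0 - (lr0 - lr_min) * frac


def l2_penalty(vectors, lam):
    """Assumed regularizer: lam / 2 times the sum of squared parameters."""
    return 0.5 * lam * sum(x * x for v in vectors for x in v)
```

For example, `FeatureEmbedding(dim=50).embed(window_features("中文分词", 1))` returns a 250-dimensional vector (5 features × 50 dimensions). This size does not change with the number of distinct features in the training corpus.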
Authors
Wang Wentao
Mu Xiaofeng
Wang Lingxia
College of Computer Science, South-Central University for Nationalities, Wuhan 430074, China
Source
Journal of South-Central University for Nationalities (Natural Science Edition)
CAS
Peking University Core Journal
2017, No. 1, pp. 102-106 (5 pages)
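Funding
Teaching Reform Project of the State Ethnic Affairs Commission (15013)
Graduate Innovation Fund of South-Central University for Nationalities (2016sycxjj199)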
Keywords
Chinese word segmentation
neural network
feature embedding