Abstract
As the first step of natural language processing (NLP), word segmentation plays an indispensable role. Because of the complexity of the language, Chinese word segmentation (CWS) has become a research hotspot. The literature shows that word segmentation methods fall into three main categories: dictionary-based, statistical, and neural network-based segmentation. With the development of machine learning, neural networks have become the mainstream approach in the field. Long short-term memory (LSTM) networks, a neural network model, and conditional random fields (CRF), a statistical model, have contributed greatly to segmentation accuracy, which has reached 97%. Since then, the handling of segmentation ambiguity and the recognition of out-of-vocabulary (unregistered) words have improved substantially. However, research has found that further changes to the model structure bring only small changes in accuracy while increasing model complexity and reducing computation speed. Convolutional neural networks (CNN) can better capture semantic information and use sparse connections to shorten computation time and improve efficiency, which makes them the focus of future work.
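To make the first two method families concrete, here is a minimal sketch under stated assumptions. It uses forward maximum matching as the dictionary method (one common choice, not necessarily the variant the survey covers) and the B/M/E/S character-tagging scheme that CRF and LSTM segmenters commonly predict. The function names `forward_max_match` and `to_bmes` and the toy dictionary are hypothetical, chosen only for illustration.

```python
# Minimal sketch (hypothetical helpers): dictionary-based forward maximum
# matching (FMM) plus conversion of a segmentation into B/M/E/S tags, the
# character-level labels that sequence models such as CRF or LSTM learn.

def forward_max_match(text, dictionary, max_len=None):
    """Greedily take the longest dictionary word starting at each position.

    Characters not covered by any dictionary word (out-of-vocabulary text)
    fall back to single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches;
        # a single character always matches as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_bmes(words):
    """Tag each character as B(egin), M(iddle), E(nd) of a word, or S(ingle)."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


if __name__ == "__main__":
    # Toy dictionary (hypothetical); "第一步" is deliberately missing.
    dictionary = {"中文", "分词", "中文分词", "自然语言", "处理", "自然语言处理"}
    words = forward_max_match("中文分词是自然语言处理的第一步", dictionary)
    print("/".join(words))        # 中文分词/是/自然语言处理/的/第/一/步
    print("".join(to_bmes(words)))  # BMMESBMMMMESSSS
```

Because "第一步" is absent from the toy dictionary, FMM splits it into single characters. This is the out-of-vocabulary weakness of purely dictionary-based methods that motivates the statistical and neural tagging approaches.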
Authors
WANG Jia-nan; LIANG Yong-quan
(College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China)
Source
Software Guide (《软件导刊》)
2021, No. 4, pp. 247-252 (6 pages)
Keywords
Chinese word segmentation
natural language processing
neural network
convolutional neural network
cross-domain word segmentation
word segmentation speed