Abstract
Chinese word segmentation, a fundamental task in the machine processing of Chinese text, has been a research hotspot in recent years. Because its results have a far-reaching impact on downstream processing tasks, it is well worth studying. A comprehensive analysis of the research literature on word segmentation technology published over the past five years makes clear that future work will be dominated by fusion methods based on neural network models, in pursuit of more accurate and efficient segmentation performance. At the same time, the development and wide application of segmentation technology face several bottlenecks that limit its performance. Beyond the traditional problems of ambiguity and out-of-vocabulary words, segmentation now faces new challenges such as dependence on the scale and quality of corpora and multi-domain segmentation; breakthrough research on these new problems will become one of the focuses of follow-up research.
Authors
ZHONG Xin-yu (钟昕妤); LI Yan (李燕), School of Information Engineering, Gansu University of Traditional Chinese Medicine, Lanzhou 730101, China
Source
Software Guide (《软件导刊》), 2023, No. 2, pp. 225-230 (6 pages)
Funding
Graduate Innovation Fund Project of Gansu University of Traditional Chinese Medicine (2022CX137).
Keywords
Chinese word segmentation
deep learning
corpus dependence
multi-domain word segmentation