Abstract
Automatic Chinese word segmentation is an important foundational task in natural language processing. The exponential growth of information places higher demands on current segmentation methods. Starting from the current state of research, this paper first lists several representative segmentation systems. It then compares the three mainstream approaches: segmentation based on string matching, based on understanding, and based on statistics. It focuses in particular on the two main difficulties in segmentation, namely ambiguity resolution and the recognition of unknown (out-of-vocabulary) words. Finally, it summarizes research progress in Chinese word segmentation and outlines directions for future development.
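The string-matching approach and the ambiguity problem mentioned in the abstract can be illustrated with forward and backward maximum matching, a common dictionary-based baseline. The sketch below is not taken from the paper. The toy lexicon, the function names, and the heuristic of flagging a string as ambiguous whenever the two scan directions disagree are illustrative assumptions.

```python
# Minimal sketch of dictionary-based (string-matching) segmentation.
# Assumptions: a toy lexicon; characters not covered by any lexicon
# word are emitted as single-character tokens.

LEXICON = {"结合", "合成", "成分", "分子"}  # hypothetical toy lexicon


def forward_max_match(text, lexicon, max_len):
    """Scan left to right, always taking the longest lexicon word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # fall back to one character
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len):
    """Scan right to left, always taking the longest lexicon word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the string
    return tokens


def looks_ambiguous(text, lexicon, max_len):
    """Heuristic: disagreement between the two scans signals ambiguity."""
    return forward_max_match(text, lexicon, max_len) != backward_max_match(
        text, lexicon, max_len
    )


if __name__ == "__main__":
    sentence = "结合成分子"
    width = max(len(w) for w in LEXICON)
    print(forward_max_match(sentence, LEXICON, width))   # ['结合', '成分', '子']
    print(backward_max_match(sentence, LEXICON, width))  # ['结', '合成', '分子']
    print(looks_ambiguous(sentence, LEXICON, width))     # True
```

On the string 结合成分子 the two scans produce different segmentations, which is how a purely dictionary-driven segmenter can detect, though not resolve, a crossing ambiguity. Choosing between the candidates is where the understanding-based and statistical methods come in.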
Source
《科技广场》
2009, No. 11, pp. 222-225 (4 pages)
Science Mosaic
Funding
National Social Science Fund of China project (No. 07BTQ025)
Key Science and Technology Project of the Jiangxi Provincial Department of Education (赣教技字[2006]320号)
National Undergraduate Innovative Experiment Program project (0810042102)
Keywords
Chinese Word Segmentation
Word Segmentation Technology
Natural Language Processing