Abstract
This paper applies the N-shortest-paths method to build a model that integrates Chinese word segmentation with part-of-speech (POS) tagging. In the segmentation stage, the top N segmentation results are retained as a candidate set; after unknown-word recognition and POS tagging, the final result is selected from these N most promising candidates. A Chinese lexical analyzer that performs segmentation and POS tagging as one integrated process was implemented on the basis of this model. In a preliminary open test, the analyzer reached an accuracy of 98.1% for segmentation and 95.07% for POS tagging.
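The candidate-generation step described in the abstract can be illustrated with a minimal sketch. It assumes a toy lexicon, a unit cost per word (so the "shortest" paths are the segmentations with the fewest words), and a placeholder scoring function standing in for unknown-word recognition and POS tagging. The names `n_shortest_segmentations`, `select_best`, and `toy_score` are hypothetical and are not taken from the paper.

```python
import heapq


def n_shortest_segmentations(sentence, lexicon, n=3, max_word_len=4):
    """Return up to n segmentations with the fewest words (unit edge cost)."""
    # best[i] holds up to n (cost, words) pairs that cover sentence[:i].
    best = [[] for _ in range(len(sentence) + 1)]
    best[0] = [(0, [])]
    for end in range(1, len(sentence) + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            # Single characters are always allowed so the lattice stays connected.
            if len(word) == 1 or word in lexicon:
                for cost, words in best[start]:
                    candidates.append((cost + 1, words + [word]))
        # Keep only the n cheapest partial paths ending at this position.
        best[end] = heapq.nsmallest(n, candidates)
    return [words for _, words in best[-1]]


def select_best(candidates, score):
    """Pick the final result from the N candidates using a later-stage score."""
    return max(candidates, key=score)


# Toy lexicon and scorer (assumptions for illustration only).
LEXICON = {"研究", "研究生", "生命", "起源"}


def toy_score(words):
    # Placeholder for the tagging-stage score: count multi-character lexicon words.
    return sum(1 for w in words if len(w) > 1 and w in LEXICON)


candidates = n_shortest_segmentations("研究生命起源", LEXICON, n=2)
final = select_best(candidates, toy_score)
```

Deferring the choice to `select_best` reflects the idea stated in the abstract: the segmenter does not commit to a single path, so information from the later stages can still decide between ambiguous segmentations such as 研究/生命/起源 and 研究生/命/起源.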
Source
《计算机科学》
CSCD
Peking University Core Journal
2007, Issue 9, pp. 174-175, 212 (3 pages)
Computer Science
Funding
Supported by the 2002 Shandong Province Science and Technology Development Program (Project No. 2002-276-022090104)
Keywords
Chinese word segmentation
Part-of-speech tagging
N-shortest paths method