Abstract
This paper presents a strategy for segmenting overlap-type ambiguous phrases in Chinese text. It combines a dynamically built frequency database of independent word-forming ability (the "independent wording ability frequency") with a rule base built on collocations of words and parts of speech, which greatly improves segmentation accuracy. In an open test on a Chinese corpus of 40,000 characters, the segmentation accuracy for overlap-type ambiguous phrases exceeded 98%.
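To make the idea of an overlap-type ambiguity concrete, the following is a minimal sketch and not the paper's implementation. It assumes the simplest case, a three-character string ABC in which both AB and BC are dictionary words, and it assumes that the independent wording ability frequency can be approximated by a count of how often a single character occurs as a standalone word. The lexicon, the counts, and the function names `find_overlaps` and `resolve_by_frequency` are all hypothetical, and the collocation rule base is not modeled.

```python
# Toy data, invented for illustration only (not taken from the paper).
LEXICON = {"结合", "合成", "结", "成"}

# Assumed counts of how often each character occurs as a standalone word.
INDEPENDENT_FREQ = {"结": 12, "成": 340}


def find_overlaps(text, lexicon):
    """Find positions where two 2-character words overlap by one character.

    Returns a list of (start, left_word, right_word) tuples, where
    left_word = text[start:start+2] and right_word = text[start+1:start+3].
    """
    hits = []
    for i in range(len(text) - 2):
        left, right = text[i:i + 2], text[i + 1:i + 3]
        if left in lexicon and right in lexicon:
            hits.append((i, left, right))
    return hits


def resolve_by_frequency(text, start, freq):
    """Choose between AB/C and A/BC for the overlap beginning at `start`.

    Assumed heuristic: the split that leaves the character with the higher
    standalone-word count on its own is preferred. Ties favor AB/C.
    """
    a, c = text[start], text[start + 2]
    if freq.get(c, 0) >= freq.get(a, 0):
        return [text[start:start + 2], c]
    return [a, text[start + 1:start + 3]]


overlaps = find_overlaps("结合成", LEXICON)
segmentation = resolve_by_frequency("结合成", overlaps[0][0], INDEPENDENT_FREQ)
```

A frequency comparison of this kind only covers the cases where the counts are decisive. According to the abstract, the method also relies on rules over word and part-of-speech collocations, which this sketch leaves out.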
Source
《情报学报》
CSSCI
PKU Core Journal (北大核心)
2000, Issue 6, pp. 637-643 (7 pages)
Journal of the China Society for Scientific and Technical Information
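Keywords
Chinese language
automatic word segmentation
Chinese character segmentation
overlap type
ambiguity
independent word-forming ability frequency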
ambiguity segmentation
ambiguous phrases of overlap type
independent wording ability frequency
Chinese word segmentation