Abstract
Resolving crossing (overlapping) ambiguities in word segmentation is of great importance for building large-scale corpora and for Chinese information processing in general. We use a word-based bigram model to disambiguate three-character crossing-ambiguity strings in two corpora of 2 million characters each. Accuracy exceeds 99% in the closed test and 90% in the open test, a clear improvement over the best previously reported results.
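To make the idea concrete, here is a minimal sketch of how a word-based bigram model might choose between the two readings of a three-character crossing-ambiguity string ABC (AB/C versus A/BC). The abstract does not give the authors' implementation, so everything here is an assumption for illustration: the function names (`train`, `bigram_logprob`, `disambiguate`), the use of one neighboring word on each side as context, add-one smoothing, and the toy training sentences.

```python
import math
from collections import Counter


def train(segmented_sentences):
    """Count word unigrams and bigrams from pre-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in segmented_sentences:
        seq = ["<s>"] + words + ["</s>"]  # sentence boundary markers
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams


def bigram_logprob(prev, word, unigrams, bigrams, vocab_size):
    """log P(word | prev) with add-one smoothing (an assumption of this sketch)."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def disambiguate(left, abc, right, unigrams, bigrams):
    """Pick AB/C or A/BC for a 3-character string, given one context word per side."""
    if len(abc) != 3:
        raise ValueError("expected a three-character string")
    vocab_size = len(unigrams)
    candidates = [[abc[:2], abc[2]], [abc[0], abc[1:]]]

    def score(words):
        # Sum of bigram log-probabilities over: left, w1, w2, right
        seq = [left] + words + [right]
        return sum(
            bigram_logprob(p, w, unigrams, bigrams, vocab_size)
            for p, w in zip(seq, seq[1:])
        )

    return max(candidates, key=score)


# Toy training data (hypothetical, not from the paper's corpora).
corpus = [
    ["他", "从", "小学", "毕业"],
    ["他", "从小", "学", "钢琴"],
]
unigrams, bigrams = train(corpus)
print(disambiguate("他", "从小学", "毕业", unigrams, bigrams))
print(disambiguate("他", "从小学", "钢琴", unigrams, bigrams))
```

With this toy data the same string 从小学 is split as 从/小学 before 毕业 and as 从小/学 before 钢琴, because the bigram linking the last word to the right-hand context differs. A real system would need far larger training counts and a more careful smoothing scheme than the add-one estimate used here.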
Source
《南京师大学报(社会科学版)》
CSSCI
PKU Core Journal (北大核心)
2004, No. 6, pp. 109-113 (5 pages)
Journal of Nanjing Normal University(Social Science Edition)
Keywords
Chinese information processing
Word-based bigram
Crossing ambiguities in Chinese word segmentation