Abstract
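This paper presents a self-learning word segmentation system that needs no manually compiled dictionary. First, statistical methods are used to build a segmentation dictionary annotated with coincidence degrees. Next, the segmentation problem is recast as finding the maximum-weight path in a directed graph, and a new algorithm is proposed that segments text using the coincidence information in the dictionary. Finally, we systematically analyze the quality of both the dictionary and the segmentation, and compare performance with other methods.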
This paper describes a Chinese word segmentation system that is based on automatic machine learning and requires no manually compiled dictionary. Using the traditional contingency table method, we first create a dynamic dictionary annotated with coincidence information. We then argue that word segmentation can be transformed into a path-searching problem in a weighted graph, and we provide a fast algorithm that segments the texts in the corpus using the coincidence information. Finally, we systematically analyze the accuracy and efficiency of this dictionary and compare our method with others.
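The reduction of segmentation to a maximum-weight path search can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how coincidence degrees are turned into edge weights, so the `weights` dictionary, the `max_len` limit, and the `unknown_weight` fallback are all hypothetical. Positions between characters act as graph nodes, each dictionary word spanning two positions is a weighted edge, and dynamic programming finds the heaviest path from the start of the text to its end.

```python
from typing import Dict, List


def segment(text: str, weights: Dict[str, float], max_len: int = 4,
            unknown_weight: float = 0.0) -> List[str]:
    """Return the segmentation of `text` with the greatest total edge weight.

    `weights` maps candidate words to scores (a stand-in for the paper's
    coincidence degrees; the real weighting scheme is an assumption here).
    """
    n = len(text)
    # best[i] = greatest total weight of any segmentation of text[:i]
    best = [float("-inf")] * (n + 1)
    # back[i] = start position of the last word on the best path to i
    back = [0] * (n + 1)
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in weights:
                w = weights[piece]
            elif end - start == 1:
                # Single characters are always allowed, so a path always exists.
                w = unknown_weight
            else:
                continue
            if best[start] + w > best[end]:
                best[end] = best[start] + w
                back[end] = start

    # Walk the back-pointers from the end of the text to recover the words.
    words: List[str] = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    words.reverse()
    return words


# Hypothetical weights, chosen only to illustrate the path search.
demo_weights = {"机器": 2.0, "学习": 2.0, "机器学习": 5.0, "系统": 1.5}
print(segment("机器学习系统", demo_weights))
```

Because every edge points forward in the text, the graph is acyclic and a single left-to-right pass suffices; the cost is proportional to the text length times `max_len`.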
Source
《模式识别与人工智能》
EI
CSCD
PKU Core Journal
1996, No. 4, pp. 297-303 (7 pages)
Pattern Recognition and Artificial Intelligence
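Funding
National High-Tech 863 Program, Intelligent Robot theme
National Natural Science Foundation of China
Keywords
Machine learning
Chinese character information processing
Dictionary
Word segmentation system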
Binomial Distribution, Coincidence Degree, Generalized Likelihood Ratio, Recall, Precision.