Abstract
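This article briefly surveys the main methods of automatic Chinese word segmentation currently used in Chinese information processing and proposes a structural model, ASM(d,a,m), for automatic segmentation methods. It calculates the time complexity of each method and analyzes both how time complexity affects segmentation speed and how the choice of method affects segmentation accuracy. It also points out, and argues, that setting up "segmentation markers" (切分标志) in automatic segmentation is pointless.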
Automatic segmentation of character strings into words is now regarded as another bottleneck problem, after Chinese character encoding, in the field of Chinese information processing. On the basis of reviewing and analyzing previous methods of automatic Chinese segmentation, this article establishes a structural model, ASM(d,a,m), to represent all basic methods systematically. With this model, two new kinds of basic methods are put forth. Furthermore, the time complexity of each basic method is calculated, and the influence of time complexity on segmentation speed, and that of each basic method on segmentation accuracy and intelligent processing, are analyzed in detail. Some mistaken views on the methods of automatic Chinese segmentation are criticized as well.
Source
《中文信息学报》
CSCD
1989, Issue 1, pp. 1-9 (9 pages)
Journal of Chinese Information Processing