Abstract
This paper studies repetitive word segmentation, which occurs pervasively in omni-word segmentation of written Chinese. It first defines repetitive word segmentation. It then argues that word segmentation ambiguity necessarily causes it, so repetitive segmentation is both inevitable and pervasive in omni-word segmentation. Next, it discusses two alternative approaches to overcoming repetitive segmentation. Finally, experiments measure how often repetitive segmentation occurs in omni-word segmentation and how it affects segmentation time. The results show that repetitive segmentation accounts for about 87% of omni-word segmentation, and that eliminating it cuts omni-word segmentation time by about 84%.
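The abstract does not state the paper's exact definition of repetitive segmentation or its two remedies. The sketch below makes two assumptions. First, omni-word segmentation means enumerating every way to split a sentence into dictionary words. Second, a repetitive segmentation happens when a path re-segments a suffix (a start position) that another path has already segmented. Under those assumptions, ambiguity creates repeats because different word choices reach the same position. The function `omni_segment`, the `memoize` flag, and the toy lexicon are hypothetical. Memoizing per start position is one plausible way to eliminate repeats, but it is not necessarily either of the paper's methods.

```python
from collections import Counter


def omni_segment(text, lexicon, memoize=False):
    """Return (all segmentations of text, Counter of per-position visits).

    Hypothetical sketch: a "visit" is one segmentation of the suffix
    text[i:]; a position visited k times was segmented k - 1 times too often.
    """
    max_len = max(len(w) for w in lexicon)
    visits = Counter()  # start position -> times its suffix was segmented
    cache = {}          # start position -> segmentations of text[start:]

    def segment_from(i):
        if memoize and i in cache:
            return cache[i]  # reuse earlier result: no repeated segmentation
        visits[i] += 1
        if i == len(text):
            result = [[]]  # exactly one way to segment the empty suffix
        else:
            result = []
            # Try every dictionary word that starts at position i.
            for j in range(i + 1, min(len(text), i + max_len) + 1):
                word = text[i:j]
                if word in lexicon:
                    for rest in segment_from(j):
                        result.append([word] + rest)
        if memoize:
            cache[i] = result
        return result

    return segment_from(0), visits


def repetitive_count(visits):
    # A suffix segmented k times contributes k - 1 repetitive segmentations.
    return sum(k - 1 for k in visits.values())


if __name__ == "__main__":
    # Toy, ambiguous example (assumed, not from the paper).
    lexicon = {"结", "合", "成", "分", "子", "结合", "合成", "成分", "分子"}
    text = "结合成分子"
    for memo in (False, True):
        segs, visits = omni_segment(text, lexicon, memoize=memo)
        print(f"memoize={memo}: {len(segs)} segmentations, "
              f"{sum(visits.values())} suffix segmentations, "
              f"{repetitive_count(visits)} repetitive")
```

On this toy input, both runs produce the same 8 segmentations. Without memoization the code performs 20 suffix segmentations, 14 of them repetitive. With memoization it performs 6, none repetitive. These toy figures only illustrate the mechanism and are not comparable to the paper's measured 87% and 84%.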
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2006, No. 3, pp. 520-523 (4 pages)
Keywords
omni-word segmentation
repetitive word segmentation
natural language processing (NLP)