Abstract
A variable-length automatic word segmentation method for Chinese corpora is presented. It is based on the concept of limiting entropy from information theory and uses the maximum likelihood between strings of Chinese characters to segment Chinese text automatically. A method for building an unsupervised dynamic word segmentation dictionary is studied in particular. The limitations of these methods are discussed, and some experimental results are reported.
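For reference, the limiting entropy named in the abstract is usually defined in information theory as follows for a stationary source emitting symbols X_1, X_2, ... (here, Chinese characters). The notation is an assumption taken from standard textbooks, not from the paper, and the abstract does not state how the quantity is applied to segmentation.

```latex
H_\infty
  = \lim_{n \to \infty} \frac{1}{n}\, H(X_1, X_2, \dots, X_n)
  = \lim_{n \to \infty} H(X_n \mid X_1, X_2, \dots, X_{n-1})
```

Both limits exist and coincide when the source is stationary.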
Source
《北京邮电大学学报》
EI
CAS
CSCD
PKU Core
1997, No. 4, pp. 66-69 (4 pages)
Journal of Beijing University of Posts and Telecommunications
Keywords
information processing
Chinese text corpus
automatic word segmentation