Abstract
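Zipf's law is a general statistical regularity describing the frequency distribution of English words. Our experiments show that, for language units of modern Chinese such as characters, words, and bigrams, the relationship between frequency and rank also approximately follows Zipf's law, which indicates that the law holds across different levels of Chinese language units. This paper experimentally confirms the frequency-rank relationship of Chinese language units described by Zipf's law, and then discusses in depth its effect on the various techniques of Chinese natural language processing.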
Zipf's law has been widely studied by linguists and statisticians. The frequency of English words is its most famous example. In this paper, we show experimentally that Zipf's law also holds for many language units of Chinese (characters, words, word bigrams, etc.), and that it has a significant effect on many techniques of Chinese language processing, especially the construction of Chinese computational language models.
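The frequency-rank relationship mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors measured or fitted the relationship, so the following is only one common way to check it: count units, sort the counts by rank, and estimate the slope of log(frequency) against log(rank), which is close to -1 when Zipf's law holds. The function names (`rank_frequency`, `zipf_slope`, `bigrams`) and the toy counts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def rank_frequency(units):
    """Return unit frequencies in descending order; index i holds rank i + 1."""
    return sorted(Counter(units).values(), reverse=True)


def bigrams(tokens):
    """Return adjacent pairs of tokens (characters or words)."""
    return list(zip(tokens, tokens[1:]))


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what Zipf's law predicts. This is an illustrative
    estimator, not necessarily the procedure used in the paper.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct units to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data (assumption): unit k occurs 120 // k times, an exact 1/rank law.
toy_units = [k for k in range(1, 7) for _ in range(120 // k)]
toy_freqs = rank_frequency(toy_units)
toy_slope = zipf_slope(toy_freqs)
```

For real text, the same functions could be applied to `list(text)` for characters, to a segmented word list for words, or to `bigrams(words)` for word bigrams; word segmentation itself is outside the scope of this sketch.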
Source
《中文信息学报》
CSCD
Peking University Core Journal
1999, No. 2, pp. 8-15 (8 pages)
Journal of Chinese Information Processing
Funding
Supported by the National 863 Program
Keywords
Language unit
Chinese
Computational language model
Frequency-rank relationship
Zipf's law; Chinese character frequency; Chinese word frequency; Chinese bigram frequency