Abstract
Booth's law describes the distribution of same-frequency words (words that occur the same number of times) in English text, but few scholars have examined in depth whether it also holds for Chinese text. To test the applicability of Booth's law to Chinese and to reveal the statistical regularities of same-frequency words in Chinese text, this paper carries out a statistical study of same-frequency words over a large number of Chinese texts, paying particular attention to the scale of the experimental data and to the range of text lengths covered. The experiments show that, as text length increases, the ratio of the number of same-frequency words at a low frequency to the number of distinct words is not constant but gradually decreases, and that the number of same-frequency low-frequency words grows as a power function of the number of distinct words. In addition, as text length increases, the ratio of the number of same-frequency words at a low frequency to the number of words occurring once is also not constant but gradually increases. These results are inconsistent with Booth's experiments on English text, so the paper concludes that Booth's law does not apply to Chinese text.
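The two ratios discussed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `max_n` cutoff, and the assumption that the text has already been segmented into a list of word tokens (Chinese text needs a word segmenter first) are all introduced here for illustration. The `booth` column uses the commonly cited form of Booth's law, I_n / I_1 = 2 / (n(n+1)), as the reference value a text would show if the law held.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency n to I_n, the number of distinct words occurring exactly n times."""
    word_counts = Counter(tokens)          # word -> number of occurrences
    return Counter(word_counts.values())   # frequency n -> I_n


def same_frequency_ratios(tokens, max_n=5):
    """Return I_n, I_n/D and I_n/I_1 for n = 1..max_n (max_n is a hypothetical cutoff)."""
    spectrum = frequency_spectrum(tokens)
    distinct = sum(spectrum.values())      # D: number of distinct words
    i1 = spectrum.get(1, 0)                # I_1: words occurring exactly once
    rows = []
    for n in range(1, max_n + 1):
        i_n = spectrum.get(n, 0)
        rows.append({
            "n": n,
            "I_n": i_n,
            "I_n/D": i_n / distinct if distinct else None,
            "I_n/I_1": i_n / i1 if i1 else None,
            # Reference value under the commonly cited form of Booth's law.
            "booth": 2 / (n * (n + 1)),
        })
    return rows


if __name__ == "__main__":
    # Toy, pre-segmented input; a real study would use long segmented texts.
    sample = "a a a b b c d e".split()
    for row in same_frequency_ratios(sample, max_n=3):
        print(row)
```

Running the function on prefixes of increasing length from the same corpus and comparing the `I_n/D` and `I_n/I_1` columns across lengths is one way to check whether the ratios stay constant, which is the property the abstract reports does not hold for Chinese text.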
Source
Journal of Intelligence (《情报杂志》)
CSSCI
Peking University Core Journal
2015, No. 6, pp. 62-67 (6 pages)
Keywords
same-frequency words
Zipf's law
Booth's law
low-frequency words