An Empirical Study on the Applicability of Booth's Law to Same-Frequency Word Regularities in Chinese Text
Abstract: Booth's law describes the distribution of same-frequency words (words that occur the same number of times) in English text, but few scholars have studied in depth whether it also applies to Chinese text. To explore the applicability of Booth's law to Chinese and to reveal the statistical regularities of same-frequency words in Chinese text, this study carries out a statistical analysis of same-frequency words over a large number of Chinese texts, paying particular attention to the choice of data scale and to the range of text lengths covered. The experiments show that, as text length increases, the ratio of the number of low-frequency words sharing a given frequency to the number of distinct words is not constant but gradually decreases, and that the number of such same-frequency words grows as a power function of the number of distinct words. In addition, as text length increases, the ratio of the number of words occurring n times to the number of words occurring once is also not constant but gradually increases. These results are inconsistent with Booth's experiments on English text, so the study concludes that Booth's law does not apply to Chinese text.
Source: Journal of Intelligence (《情报杂志》), CSSCI, Peking University Core Journal, 2015, No. 6, pp. 62-67 (6 pages)
Keywords: same-frequency words; Zipf's law; Booth's law; low-frequency words
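
The two ratios discussed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names are hypothetical, the input is assumed to be a list of tokens that has already been word-segmented (Chinese text needs a segmentation step that is not shown here), and the reference value 2 / (n(n + 1)) is the commonly cited form of Booth's law under a Zipf exponent of 1, included only for comparison.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency n to I_n, the number of distinct words occurring exactly n times."""
    word_counts = Counter(tokens)          # word -> number of occurrences
    return Counter(word_counts.values())   # frequency n -> I_n


def booth_ratios(tokens, max_n=5):
    """Return I_n, I_n/D and I_n/I_1 for n = 1..max_n, next to the assumed Booth prediction."""
    spectrum = frequency_spectrum(tokens)
    distinct = sum(spectrum.values())      # D: number of distinct words
    i1 = spectrum.get(1, 0)                # I_1: words occurring exactly once
    rows = []
    for n in range(1, max_n + 1):
        i_n = spectrum.get(n, 0)
        rows.append({
            "n": n,
            "I_n": i_n,
            "I_n/D": i_n / distinct if distinct else None,
            "I_n/I_1": i_n / i1 if i1 else None,
            # Assumed reference form of Booth's law (Zipf exponent 1).
            "booth_I_n/I_1": 2 / (n * (n + 1)),
        })
    return rows


if __name__ == "__main__":
    # Toy, pre-segmented input; real experiments would use long texts.
    sample = "a b c d a b a".split()
    for row in booth_ratios(sample, max_n=3):
        print(row)
```

Running `booth_ratios` on prefixes of increasing length from the same corpus would show whether the two ratios stay roughly constant or drift with text length, which is the comparison the abstract reports.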