Validation of Zipf's Law Based on Maximum Likelihood Estimation

Abstract: This paper fits the Zipf distribution curve by maximum likelihood estimation. For mathematical reasons the method works on the frequency-spectrum form of Zipf's law, and it uses the Kolmogorov-Smirnov (KS) test to find the best-fitting straight line in log-log coordinates. Compared with the traditional least-squares method, maximum likelihood estimation gives more accurate fits. To validate the method, experiments were run on three groups of Chinese and English corpora. They show that English words conform to Zipf's law fairly well, whereas Chinese words conform to it poorly.
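The fitting procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the frequency spectrum follows a discrete power law p(m) = m^(-alpha) / zeta(alpha, xmin), where m is a word frequency and V(m) is the number of word types occurring m times. The function names (`hurwitz_zeta`, `log_likelihood`, `fit_alpha`, `ks_distance`), the ternary-search optimizer, and the synthetic spectrum are all assumptions made for illustration.

```python
import math

def hurwitz_zeta(alpha, xmin=1, terms=1000):
    """Approximate sum_{k >= xmin} k**-alpha (requires alpha > 1)."""
    n = xmin + terms
    head = sum(k ** -alpha for k in range(xmin, n))
    # Euler-Maclaurin tail: integral from n to infinity plus half the first term.
    tail = n ** (1.0 - alpha) / (alpha - 1.0) + 0.5 * n ** -alpha
    return head + tail

def log_likelihood(alpha, spectrum, xmin=1):
    """Log-likelihood of p(m) = m**-alpha / zeta(alpha, xmin).

    spectrum maps a frequency m to V(m), the number of word types seen m times.
    """
    n_types = sum(v for m, v in spectrum.items() if m >= xmin)
    sum_log = sum(v * math.log(m) for m, v in spectrum.items() if m >= xmin)
    return -n_types * math.log(hurwitz_zeta(alpha, xmin)) - alpha * sum_log

def fit_alpha(spectrum, xmin=1, lo=1.01, hi=6.0, iters=60):
    """Maximize the log-likelihood (concave in alpha) by ternary search."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if log_likelihood(a, spectrum, xmin) < log_likelihood(b, spectrum, xmin):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)

def ks_distance(alpha, spectrum, xmin=1):
    """Largest gap between the empirical CDF and the fitted model CDF."""
    z = hurwitz_zeta(alpha, xmin)
    n_types = sum(v for m, v in spectrum.items() if m >= xmin)
    model_cdf = empirical_cdf = worst = 0.0
    for m in range(xmin, max(spectrum) + 1):
        model_cdf += m ** -alpha / z
        empirical_cdf += spectrum.get(m, 0) / n_types
        worst = max(worst, abs(model_cdf - empirical_cdf))
    return worst

# Synthetic spectrum with V(m) roughly proportional to m**-2 (illustrative data only).
spectrum = {m: round(100000 * m ** -2.0) for m in range(1, 448)}
alpha_hat = fit_alpha(spectrum)
ks = ks_distance(alpha_hat, spectrum)
print(alpha_hat, ks)
```

On a log-log plot the fitted spectrum is a straight line with slope -alpha. One common way to combine the two steps, assumed here rather than taken from the paper, is to repeat the fit for several values of `xmin` and keep the one with the smallest KS distance.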

Authors: Han Pu (韩普), Lu Gaofei (路高飞), Wang Dongbo (王东波)

Affiliation: School of Information Management, Nanjing University

Source: Information Studies: Theory & Application (《情报理论与实践》), indexed in CSSCI and the Peking University Core Journals list, 2012, No. 11, pp. 6-11 (6 pages)

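Funding: One of the outcomes of the "863" Program project "Development of a Search Engine Oriented Mainly to Scientific and Technical Literature Services" (Project No. 2011AA01A206) and the 2011 Nanjing University Graduate Research and Innovation Fund project "Research on Chinese-English Bilingual Text Clustering Technology and Its Applications" (Project No. 2011CW12)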

Keywords: Zipf's law; maximum likelihood estimation; word frequency distribution

Classification number: G353.1 [Cultural Science: Information Science]


