Abstract
Studying the character frequency distributions of Chinese literary works from different historical periods can deepen our understanding of how the Chinese language has evolved, yet this kind of analysis has been missing from previous statistical work on the language. This paper groups Chinese literature written since the Tang Dynasty into corpora by period and counts the character frequencies of five classical and modern literary corpora. The results show that Chinese character usage has been changing continuously since the Tang Dynasty, and that the closer two periods are, the more similar their character usage is. In every period, the character frequency distribution is well fitted by a power law with exponential truncation. Over time, the power-law behavior weakens while the exponential behavior strengthens.
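The abstract states that each period's character frequencies fit a power law with exponential truncation, but it does not give the exact parameterization or fitting procedure. The sketch below assumes a rank-frequency form f(r) = C · r^(-alpha) · exp(-lam · r). Taking logarithms gives log f = log C - alpha·log r - lam·r, which is linear in the unknowns, so ordinary least squares on log frequency can estimate them. The function names, the parameter names `alpha` and `lam`, and the choice of log-space least squares are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Character counts sorted in descending order (index 0 is rank 1).

    Assumption: only whitespace is skipped; a real corpus would likely also
    strip punctuation and other non-Han characters.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    return sorted(counts.values(), reverse=True)


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_truncated_power_law(freqs):
    """Fit f(r) = C * r**(-alpha) * exp(-lam * r) to rank-ordered positive frequencies.

    Uses least squares on log f with design columns [1, -log r, -r], so the
    solution vector is (log C, alpha, lam). Needs at least three ranks.
    """
    rows = [(1.0, -math.log(r), -float(r)) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    # Normal equations: (A^T A) x = A^T y, a 3x3 system.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    log_c, alpha, lam = solve_linear(ata, aty)
    return math.exp(log_c), alpha, lam
```

Under this assumed form, one way to read "the power-law behavior weakens while the exponential behavior strengthens" is a smaller fitted `alpha` and a larger `lam` for later periods. Note that least squares in log space gives the many low-frequency characters at high ranks a large share of the weight; a maximum-likelihood fit is a common alternative.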
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in: CSCD; Peking University Core Journals
2011, No. 3, pp. 93-97 (5 pages)
Funding
Supported by the Beijing Normal University Research Fund for Young Teachers
Keywords
Chinese literature
character frequency distribution
exponentially truncated power law