Abstract
The character (字) is a reliable unit for measuring sentence length in Chinese. This paper segments and counts every sentence in a 1.2-million-character corpus of native-speaker Chinese and analyzes the mean sentence length and its distribution. Measured in characters, sentence length ranges from 1 to 63 characters, with a mean of 10.91 characters. The highest-frequency length interval is 6-8 characters, and the most commonly used interval is 2-15 characters. Across all lengths, sentence length follows a "long tail" distribution; within the high-frequency interval it follows a normal distribution. The character-word matching of sentences 1-30 characters long and the frequency distribution of sentences across intervals both show internal regularities.
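As an illustration of character-based sentence-length measurement, a minimal Python sketch follows. It is not the authors' procedure: the sentence-ending marks, the rule that only Han ideographs count as characters, and the function names `sentence_lengths` and `summarize` are all assumptions made for this example, and the sample text is invented.

```python
import re
from collections import Counter

# Assumption: a sentence ends at a full stop, exclamation mark, or question mark.
# The segmentation rules used in the paper are not stated in the abstract.
SENTENCE_END = re.compile(r"[。!?!?]+")

# Assumption: only CJK Unified Ideographs count toward length;
# punctuation, digits, and Latin letters are ignored.
HAN = re.compile(r"[\u4e00-\u9fff]")


def sentence_lengths(text):
    """Return the length in characters of each non-empty sentence in text."""
    lengths = []
    for piece in SENTENCE_END.split(text):
        n = len(HAN.findall(piece))
        if n > 0:
            lengths.append(n)
    return lengths


def summarize(lengths):
    """Return the range, mean, and per-length frequency of the given lengths."""
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "counts": Counter(lengths),  # frequency of each sentence length
    }


# Invented sample text, for illustration only.
sample = "你好。今天天气很好!我们去公园散步,好吗?"
lengths = sentence_lengths(sample)
stats = summarize(lengths)
print(lengths, stats["min"], stats["max"], round(stats["mean"], 2))
```

On a real corpus, `stats["counts"]` would give the frequency of each sentence length, from which the high-frequency interval and the long tail could be read off.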
Source
《齐齐哈尔大学学报(哲学社会科学版)》
2018, No. 1, pp. 133-138 (6 pages)
Journal of Qiqihar University (Philosophy & Social Science Edition)
Funding
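Huangshan University Start-up Project for Introduced Talent: A Study of Korean Students' Acquisition of Chinese Attributive-Head Phrases Based on a Grammatical-Information Corpus (2017xskq003)
Huangshan University Demonstration Course Reform Project: Introduction to Teaching Chinese as a Foreign Language (2016XBJC02)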
Keywords
character (字)
Chinese sentences
mean sentence length
sentence length distribution