Abstract
This paper reports a corpus-based quantitative linguistic study of the relationship between word length and word frequency in Chinese, conducted within the theoretical framework of Synergetic Linguistics. The results show that word frequency depends strongly on word length: the longer a word is, the less frequently it is used in discourse, so the two properties are inversely related. The power model y = ax^b is shown to fit the data best and to capture this regularity. The results further indicate that the model parameter a discriminates well between texts of different styles. The study complements current theories of the relationship between word length and word frequency, provides new evidence that this relationship is a linguistic universal, and offers a new approach to style identification and text classification.
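To make the power model concrete, the following minimal sketch fits y = a * x^b by ordinary least squares on log-transformed values. It rests on several assumptions not stated in the abstract: that x is word length and y is frequency, that a log-log linear fit is an acceptable estimation method (the paper may have used a different procedure, such as nonlinear least squares), and that the function name `fit_power_law` and the numbers below are purely illustrative. The data are synthetic and are not the study's results.

```python
import numpy as np


def fit_power_law(lengths, freqs):
    """Estimate a and b in y = a * x**b via linear regression in log-log space."""
    x = np.asarray(lengths, dtype=float)
    y = np.asarray(freqs, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("lengths and frequencies must be positive")
    # log y = log a + b * log x, so the slope is b and the intercept is log a.
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(log_a)), float(b)


# Synthetic, made-up values generated from an exact power law (not corpus data).
lengths = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
freqs = 1000.0 * lengths ** -1.5

a, b = fit_power_law(lengths, freqs)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative estimate of b corresponds to the inverse relation described in the abstract: frequency falls as length grows. Under the same assumptions, comparing the fitted a across texts would be one way to explore the style-discrimination claim.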
Source
Journal of Foreign Languages (《外国语》)
CSSCI
PKU Core Journal (北大核心)
2013, No. 3, pp. 29-39 (11 pages)
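Funding
National Social Science Fund of China project "Automatic Term Extraction and Bilingual Terminology Dictionary Compilation Based on Parallel Corpora" (11BYY053)
China Scholarship Council state-sponsored visiting scholar (including postdoctoral research) program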
Keywords
word length
word frequency
quantitative linguistics
linguistic universals