Abstract
Distinguishing words from phrases is an important and difficult problem in Chinese linguistics. The traditional criteria of "structural stability, semantic cohesion, and moderate syllable length" do not resolve it well. In recent years there has been more discussion of the problem from the perspective of "frequency", but it has been largely confined to "word sense" (the intuitive feeling that a string is a word) and still lacks support from large-scale statistics. Whether "frequency" is really suitable as a criterion for distinguishing words from phrases remains unsettled. Based on an examination of the 1,000 most frequent strings in the 2-gram statistics of a modern Chinese corpus of nearly 500 million characters, this paper concludes that "frequency" cannot yet serve directly as a criterion for defining words, and that whether it can play a significant role in resolving the fuzzy zone between words and phrases needs further study.
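The kind of 2-gram frequency count the abstract refers to can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the function names `is_han` and `top_bigrams`, the restriction to the basic CJK Unified Ideographs block, and the three-line toy sample are all assumptions made for the example.

```python
from collections import Counter


def is_han(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block counts as a character.
    return "\u4e00" <= ch <= "\u9fff"


def top_bigrams(lines, n=1000):
    """Count adjacent character pairs (2-grams) and return the n most frequent."""
    counts = Counter()
    for line in lines:
        # Slide a window of width 2 over each line of text.
        for a, b in zip(line, line[1:]):
            if is_han(a) and is_han(b):
                counts[a + b] += 1
    return counts.most_common(n)


# Hypothetical toy sample standing in for a corpus.
sample = ["我们的研究", "我们的语料", "他们的研究"]
top = top_bigrams(sample, n=3)
for bigram, freq in top:
    print(bigram, freq)
```

In this toy sample the most frequent string is 们的, which straddles a word boundary rather than forming a word. That is only an illustration of why a raw frequency ranking needs further inspection before being read as a list of words; it says nothing about the paper's actual data.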
Source
《现代语文》 (Modern Chinese)
2018, No. 7, pp. 129-134 (6 pages)
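Funding
Supported by the Guangdong Provincial Department of Education "Innovation and Strong School Project" (创新强校工程), Young Innovative Talents Program (Humanities and Social Sciences), project no. 2017WQNCX046, "A Quantitative Study of Linguistic Complexity Based on a Chinese Second-Language Corpus"
Supported by the Guangdong Province Philosophy and Social Sciences "Twelfth Five-Year Plan" 2013 project, project no. GD113YWW02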