Abstract
Literary grace (文采) is an important attribute of texts and plays an important role in both writing practice and the reading experience, yet quantitative research on the literary grace of Chinese texts remains insufficient. Building on existing research, this paper constructs a linguistic feature system for evaluating the literary grace of Chinese texts and, using machine learning models, examines how well the system supports automatic literary-grace evaluation on corpora that differ in source, granularity, and degree of literary-grace mixing. The results show that: (1) the proposed feature system generalizes across different types of texts and can effectively evaluate and judge the literary grace of Chinese texts, with weighted F1 scores reaching 89.94% and 78.25% on different corpora; (2) linguistic features at the form (orthographic) and semantic levels significantly affect automatic literary-grace evaluation, whereas phonetic-level features have relatively little effect. The most influential feature dimensions are linguistic diversity, linguistic familiarity, linguistic complexity, semantic concreteness, and discourse-level rhetorical figures. The key features are eight linguistic indicators: average log character frequency, average word acquisition level, average number of rhetorical figures used, average word-sense concreteness value, proportion of out-of-vocabulary words, proportion of personal pronouns, semantic distribution, and proportion of sensory adjectives.
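Several of the key indicators named above are simple corpus statistics, and the reported results are weighted F1 scores. The sketch below shows one plausible way to compute three such indicators and the weighted F1 metric; it is not the authors' implementation. It assumes already word-segmented input, and the toy resources `CHAR_FREQ`, `LEXICON` and `PERSONAL_PRONOUNS`, as well as all function names, are hypothetical. The paper's exact feature definitions, frequency lists, lexicons and classifiers are not given in this abstract.

```python
import math
from collections import Counter

# Hypothetical toy resources (assumption): a real system would use a large
# corpus-derived character frequency list and a general-purpose lexicon.
CHAR_FREQ = {"我": 5000, "们": 4000, "月": 800, "光": 900, "洒": 60,
             "满": 300, "了": 6000, "小": 1200, "院": 200}
LEXICON = {"我们", "月光", "洒", "满", "了", "小院"}
PERSONAL_PRONOUNS = {"我", "你", "他", "她", "我们", "你们", "他们", "她们"}


def avg_log_char_freq(tokens, char_freq):
    """Mean of log10(freq + 1) over all characters; unseen characters count as freq 0."""
    chars = [c for tok in tokens for c in tok]
    if not chars:
        return 0.0
    return sum(math.log10(char_freq.get(c, 0) + 1) for c in chars) / len(chars)


def oov_ratio(tokens, lexicon):
    """Share of tokens that are out-of-vocabulary (absent from the lexicon)."""
    return sum(tok not in lexicon for tok in tokens) / len(tokens) if tokens else 0.0


def pronoun_ratio(tokens, pronouns):
    """Share of tokens that are personal pronouns."""
    return sum(tok in pronouns for tok in tokens) / len(tokens) if tokens else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's true support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * n
    return total / len(y_true)


# Usage: one pre-segmented sentence turned into a small feature vector.
tokens = ["我们", "的", "小院", "洒", "满", "了", "月光"]
features = [
    avg_log_char_freq(tokens, CHAR_FREQ),
    oov_ratio(tokens, LEXICON),
    pronoun_ratio(tokens, PERSONAL_PRONOUNS),
]
```

In a full pipeline, vectors like `features` would be computed per text and passed to a classifier, whose predictions are then scored with a metric such as `weighted_f1`. Weighting by support keeps the score from being dominated by a small class when literary-grace levels are unevenly distributed.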
Authors
LI Yi (李怡)
WANG Shike (王诗可)
YU Dong (于东)
LIU Pengyuan (刘鹏远)
Source
Applied Linguistics (《语言文字应用》), 2023, No. 1, pp. 130-144 (15 pages)
Listed in the Peking University core journals catalogue (北大核心)
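Funding
Youth Fund for Humanities and Social Sciences Research, Ministry of Education (19YJCZH230)
Wutong Innovation Platform, Beijing Language and Culture University (21PT04)
Graduate Innovation Fund (22YCX134)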
Keywords
literary grace of Chinese text
linguistic features
machine learning
automatic evaluation of literary grace