Investigating Chinese Text Readability: Linguistic Features, Modeling, and Validation (中文文本可读性探讨：指标选取、模型建立与效度验证)


Abstract: This study aims to (a) develop readability indicators based on the textual factors that influence reading comprehension; (b) construct readability models for Chinese text; and (c) validate the proposed models. The models are built with stepwise regression and a support vector machine (SVM), using the 24 readability indicators as predictor variables and the grade level of 386 textbook articles as the criterion variable. The models are then validated on 96 new articles held out as test data. The results show that in the stepwise regression model, the critical predictors are the number of complex words, the proportion of simple sentences, the average logarithm of content word frequency, and the number of personal pronouns. In the SVM model, the critical predictors selected with the F-score method include the number of complex words, the number of two-character words, the number of characters, and the number of intermediate-stroke characters. On the new articles, the prediction accuracy of the stepwise regression model and the SVM model is 55.21% and 72.92%, respectively. Both models predict lower-grade texts more accurately than higher-grade texts.
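The two accuracy figures can be related back to the 96 test articles with a little arithmetic. The sketch below is only a consistency check: the abstract reports percentages, not raw counts, so the counts it prints are inferred, and the helper names `accuracy_percent` and `implied_correct` are hypothetical, introduced here for illustration.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 2)


def implied_correct(percent: float, total: int) -> int:
    """Return the integer number of correct predictions whose rounded
    accuracy equals `percent` for a test set of size `total`."""
    for correct in range(total + 1):
        if accuracy_percent(correct, total) == percent:
            return correct
    raise ValueError("no integer count reproduces the reported percentage")


# Figures taken from the abstract; the raw counts are NOT reported there
# and are inferred here purely from the percentages.
TEST_ARTICLES = 96
stepwise_correct = implied_correct(55.21, TEST_ARTICLES)
svm_correct = implied_correct(72.92, TEST_ARTICLES)

print(f"stepwise regression: {stepwise_correct}/{TEST_ARTICLES} correct")
print(f"SVM:                 {svm_correct}/{TEST_ARTICLES} correct")
```

Under this reading, the reported percentages correspond to 53 and 70 correctly graded articles out of 96, a difference of 17 articles between the two models.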

Authors: 宋曜廷 (Yao-Ting Sung), 陳茹玲 (Ju-Ling Chen), 李宜憲 (Yi-Shian Lee), 查日龢 (Jih-Ho Cha), 曾厚強 (Hou-Chiang Tseng), 林維駿 (Wei-Chun Lin), 張道行 (Tao-Hsing Chang), 張國恩 (Kuo-En Chang)

Source: Chinese Journal of Psychology (《中華心理學刊》)

Keywords: readability, accuracy, stepwise regression, support vector machine (SVM), mathematical model

Chinese Journal of Psychology, 2013, Issue 1
