Abstract
This paper takes a computational-linguistics approach and uses an automatic text classification model to investigate how discourse structure features affect text readability. We first designed an annotation specification and tagset for discourse information and used them to annotate the Chinese Textbook Corpus (the corpus of the unified-edition (统编版) Chinese language textbooks). We then extracted discourse features from the annotated texts and examined their correlation with text readability. Finally, we ran automatic readability grading experiments with support vector machine (SVM) models to test how well discourse features predict text difficulty. The results show that adding discourse structure features markedly raises the F-score of automatic text grading. Comparison experiments with lexical and syntactic features further indicate that discourse features have a positive influence on readability assessment. This work extends text readability research to the discourse level and provides a reference for related research and applications.
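The correlation step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: readability is represented by the textbook grade level, the discourse feature is a hypothetical "discourse relations per sentence" density (`relation_density`), and Pearson's r is the correlation statistic. The texts and counts in `toy_texts` are invented toy values, not data from the corpus or results from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


def relation_density(num_relations, num_sentences):
    """Hypothetical discourse feature: annotated discourse relations per sentence."""
    return num_relations / num_sentences


# Invented toy texts: (grade level, annotated discourse relations, sentences).
toy_texts = [
    (1, 3, 10),
    (2, 6, 12),
    (3, 9, 15),
    (4, 14, 18),
    (5, 18, 20),
    (6, 25, 22),
]

grades = [g for g, _, _ in toy_texts]
densities = [relation_density(rel, sents) for _, rel, sents in toy_texts]
r = pearson_r(densities, grades)
print(f"r = {r:.3f}")
```

Because grade levels are ordinal, a rank-based statistic such as Spearman's rho would be an equally reasonable choice; which statistic the authors actually used is not stated in the abstract.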
Authors
BAI Xiaopeng (柏晓鹏)
JI Lingli (吉伶俐)
Source
Applied Linguistics (《语言文字应用》)
CSSCI
Peking University Core Journals list (北大核心)
2022, No. 3, pp. 62-72 (11 pages)
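Funding
Shanghai Philosophy and Social Sciences general project "Computing the Distinguishability of Senses of Polysemous Words in Modern Chinese" (2020BYY015)
East China Normal University New Liberal Arts Innovation Platform project "Interdisciplinary Collaborative Research and Innovation Platform for Language Cognition and Evolution" (2022ECNU-XWK-XK005)
East China Normal University special research project on cultural heritage and innovation "Research on Data Reliability Issues for International Chinese Language Teaching" (2022ECNU-WHCCYJ-25)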
Keywords
text readability
discourse features
graded reading
automatic text classification
textbook corpus