Abstract
Language errors are an important feature in automated essay scoring (AES). Identifying and extracting this feature accurately both supports essay scoring and gives students detailed feedback on their language use. This study counted how often each word is immediately followed by another in a large corpus of native-speaker English, yielding frequency data for binary adjacent word pairs (BAWPs). English compositions by Chinese college students were then tagged with these native-corpus frequencies. Low-frequency BAWPs that were nonetheless correct were analyzed to uncover regular patterns, from which filter rules for suppressing false alarms were constructed. When the filter rules were combined with word frequency distribution data, the highest precision of adjacency error identification was close to 69%, providing further support for automated essay scoring and feedback.
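The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the threshold value, the toy reference sentences, and the `has_number` filter rule are all assumptions made for illustration, since the abstract does not specify the actual rules or cut-offs.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Simplified tokenizer (assumption): lowercase, split on whitespace,
    # strip surrounding punctuation, drop empty tokens.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def count_bawps(sentences):
    # Count binary adjacent word pairs (BAWPs) in a reference corpus.
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        counts.update(zip(tokens, tokens[1:]))
    return counts

def has_number(pair):
    # Hypothetical filter rule: pairs containing digits are rare in any
    # corpus but are usually not errors, so they are not reported.
    return any(ch.isdigit() for word in pair for ch in word)

def flag_low_frequency(essay, reference_counts, threshold=1, filters=()):
    # Report pairs whose reference frequency is below the threshold,
    # unless a filter rule marks them as likely correct.
    tokens = tokenize(essay)
    flagged = []
    for pair in zip(tokens, tokens[1:]):
        if reference_counts[pair] >= threshold:
            continue
        if any(rule(pair) for rule in filters):
            continue
        flagged.append(pair)
    return flagged

# Toy stand-in for a large native-speaker corpus (assumption).
reference = count_bawps([
    "She depends on her parents.",
    "He depends on his parents.",
    "I go to his house.",
])
essay = "In 2010 he depends to his parents."
suspects = flag_low_frequency(essay, reference, filters=(has_number,))
```

With this toy data, the only pair left in `suspects` is `("depends", "to")`; the pairs involving `2010` are also unseen in the reference data but are suppressed by the filter rule, which mirrors the role the abstract assigns to filter rules in reducing false alarms.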
Source
《外语电化教学》
CSSCI
2010, No. 4, pp. 15-20 (6 pages)
Technology Enhanced Foreign Language Education
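Funding
Guangdong University of Foreign Studies 2009 university-level youth project (Research on Automated Essay Scoring and Feedback for College English Teaching)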
Keywords
Binary Adjacent Words
Error Detection
College English Writing
Automated Essay Scoring