Abstract
This study explicates the working mechanism of an AI-based automatic scoring engine for the Chinese-English translation task of the SJTU-EPT (Shanghai Jiao Tong University English Proficiency Test) and evaluates the validity of automatic scoring using large-scale test data. The results show that the correlation coefficient between automatic scores and human scores reached 0.76, and that there was no significant difference between the mean scores generated by the two scoring methods; human scoring, however, slightly outperformed automatic scoring in grading translations of high and low quality. The study also discusses the practicality of employing automatic scoring in large-scale tests and the current problems associated with the performance of AI-based automatic scoring.
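As a minimal illustrative sketch (not the authors' actual procedure), the agreement statistics reported in the abstract, i.e. the Pearson correlation between automatic and human scores and the test of the difference between their mean scores, could be computed from paired score data roughly as follows; the score arrays below are simulated purely for demonstration and do not reproduce the study's data.

```python
# Sketch: agreement statistics between automatic and human scores.
# All data here are simulated; only the statistical procedure is illustrated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical human scores on a translation item (0-15 scale) and
# simulated automatic scores that partially track them.
human = rng.normal(loc=10.0, scale=2.0, size=500).clip(0, 15)
auto = (0.8 * human + rng.normal(loc=2.0, scale=1.5, size=500)).clip(0, 15)

# Pearson correlation between automatic and human scores.
r, r_p = stats.pearsonr(auto, human)
print(f"Pearson r = {r:.2f} (p = {r_p:.3g})")

# Paired t-test for the difference between the two mean scores.
t, t_p = stats.ttest_rel(auto, human)
print(f"mean(auto) = {auto.mean():.2f}, mean(human) = {human.mean():.2f}")
print(f"paired t = {t:.2f}, p = {t_p:.3g}")
```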
Authors
ZHANG Lidong (张利东)
ZHU Yiqing (朱一清)
Source
《外语界》(Foreign Language World)
CSSCI; Peking University Core Journals (北大核心)
2022, No. 2, pp. 41-48, 55 (9 pages in total)
Funding
Supported by the Shanghai Pujiang Program (Grant No. 15PJC072).
Keywords
deep learning
automatic scoring
validation
Chinese-English translation