Abstract
Automatic detection of near-identical answers is an important but often overlooked part of research on automatic test scoring. Near-identical translations in translation exams differ in character from ordinary duplicate documents. After comparing several document similarity algorithms, we chose a feature code method to detect near-identical Chinese-to-English translations. Building on the characteristics of translation exams, we propose a random feature code method. It resolves the difficulty of deciding where in the text to select feature codes, reduces the sensitivity of feature codes to editing differences, and improves both the precision and the recall of detection. The algorithm has linear complexity and is suited to rapid checking of large-scale translation exam papers.
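The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a random feature code check could work, not the authors' implementation. It assumes that a "feature code" is a short substring sampled at a random offset of one translation, and that the similarity score is the fraction of sampled codes that also occur in the other translation. The names `normalize`, `random_feature_codes`, and `similarity`, and the parameters `n_codes`, `code_len`, and `seed`, are all hypothetical.

```python
import random


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(text.lower().split())


def random_feature_codes(text: str, n_codes: int = 20,
                         code_len: int = 12, seed: int = 0) -> list[str]:
    """Sample substrings ("feature codes") at random offsets of the text.

    Sampling positions at random avoids having to choose fixed positions
    in advance. This is an illustrative assumption about the method.
    """
    norm = normalize(text)
    if not norm:
        return []
    if len(norm) <= code_len:
        return [norm]
    rng = random.Random(seed)  # seeded so repeated runs sample the same codes
    max_start = len(norm) - code_len
    starts = [rng.randint(0, max_start) for _ in range(n_codes)]
    return [norm[s:s + code_len] for s in starts]


def similarity(a: str, b: str, **kwargs) -> float:
    """Fraction of feature codes sampled from `a` that also occur in `b`."""
    codes = random_feature_codes(a, **kwargs)
    if not codes:
        return 0.0
    target = normalize(b)
    hits = sum(1 for code in codes if code in target)
    return hits / len(codes)


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog near the river bank."
    edited = "The quick brown fox leaps over the lazy dog near the river bank."
    print(similarity(original, original))
    print(similarity(original, edited))
```

Because several short codes are sampled, a single edited word invalidates only the codes that overlap it, which is one plausible reading of the reduced sensitivity to editing differences. With a fixed number of codes of fixed length, the work per pair of answers grows roughly linearly with text length.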
Source
《外语电化教学》
CSSCI
2009, Issue 6, pp. 14-17 (4 pages)
Technology Enhanced Foreign Language Education
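Funding
Ministry of Education Research Base Project, 2007: Development of an Automatic Scoring System for Subjective Questions (Translation between English and Chinese) in Large-Scale Examinations (No. 07JJD740070)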
Keywords
Auto-scoring
Similarity Detection
Document Similarity
Feature Code