Abstract
Research on plagiarism detection for scientific papers written in a single language has gained relevance, and a number of practical systems have been developed. By comparison, studies and results on cross-lingual plagiarism detection remain relatively few. Focusing on scientific academic papers, this paper studies the detection of Chinese-English cross-lingual plagiarism as a way to explore methods for cross-lingual plagiarism detection. By mining the inherent regularities of Chinese translated text, it identifies a set of translation features that indicate translation style, and it uses these features together with a decision tree algorithm to identify scientific papers suspected of plagiarism. In an open test, the experimental system reached a precision of 88.68% and a recall of 79.17%.
Source
《上海交通大学学报》
EI
CAS
CSCD
Peking University Core Journals
2012, No. 6, pp. 989-993, 998 (6 pages)
Journal of Shanghai Jiaotong University
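Funding
Ministry of Education Special Research Project on Rapid Sharing of Scientific Papers (2010121)
National High Technology Research and Development Program of China (863 Program) (2010AA012505)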
Keywords
paper plagiarism
translation feature
cross-language
decision tree