Abstract
[Purpose] To assess whether the similarity percentages in plagiarism-check reports are a reliable criterion for judging whether a manuscript is duplicated, and to identify the causes of misjudgments. [Methods] The CrossCheck/iThenticate similarity reports for 642 manuscripts were checked manually. The reliability of the similarity percentages was evaluated with the metrics used to assess classification algorithms, and the causes of misjudgments were analyzed. [Findings] Judgments based on the overall similarity percentage [including the total similarity (TS) and the main-body similarity (MS)] and those based on the single-source similarity (SS) all had an accuracy below 75%. The SS method balanced recall (85%) and precision (47%) best, with F1 = 0.61. Ranked by the reliability of their similarity percentages, the three methods ordered from most to least reliable as SS, MS, and TS. Nevertheless, a large number of manuscripts were still misjudged. [Conclusions] With appropriately set thresholds, MS and SS can serve as references for judging whether a manuscript is duplicated, but error-prone items still require manual checking, and editors should not rely excessively on the results of plagiarism-check systems.
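The abstract treats a similarity-percentage cut-off as a binary classifier and scores it against manual checking with accuracy, precision, recall, and F1. The sketch below illustrates how such metrics can be computed. It assumes that "duplicate" is the positive class, that a manuscript is flagged when its similarity percentage is at or above a threshold, and that every manuscript has a manually verified label; the function names, threshold, and toy data are hypothetical and are not taken from the study.

```python
"""Score a similarity-percentage threshold as a binary classifier.

Assumptions (hypothetical, not from the study): a manuscript is flagged as a
duplicate when its similarity percentage is >= threshold, and each manuscript
carries a manually verified ground-truth label (True = duplicate).
"""
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_threshold(
    similarities: Sequence[float],
    manual_labels: Sequence[bool],
    threshold: float,
) -> Metrics:
    """Compare threshold-based flags with manual verdicts."""
    if len(similarities) != len(manual_labels):
        raise ValueError("each similarity needs exactly one manual label")
    tp = fp = tn = fn = 0
    for sim, is_duplicate in zip(similarities, manual_labels):
        flagged = sim >= threshold  # judgement based on the report alone
        if flagged and is_duplicate:
            tp += 1  # correctly flagged duplicate
        elif flagged:
            fp += 1  # flagged, but manual check found no duplication
        elif is_duplicate:
            fn += 1  # duplicate missed by the threshold
        else:
            tn += 1  # correctly passed
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Metrics(accuracy, precision, recall, f1_score(precision, recall))


if __name__ == "__main__":
    # Toy data (hypothetical): similarity percentages and manual verdicts.
    sims = [12, 35, 48, 8, 60, 22]
    labels = [False, True, False, False, True, True]
    print(evaluate_threshold(sims, labels, threshold=30))
    # Cross-check the reported SS figures: precision 47%, recall 85%.
    print(round(f1_score(0.47, 0.85), 2))  # 0.61
```

Plugging the reported SS figures into F1 = 2PR/(P + R) gives 2 × 0.47 × 0.85 / 1.32 ≈ 0.61, consistent with the abstract. Re-running `evaluate_threshold` over a range of thresholds is one way to explore the "appropriate threshold" the conclusions mention; the thresholds actually used in the study are not stated in this abstract.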
Author
ZHANG Jiao (张姣)
Editorial Office of Frontiers of Environmental Science & Engineering, School of Environment, Tsinghua University, 1 Qinghuayuan, Haidian District, Beijing 100084, China
Source
Chinese Journal of Scientific and Technical Periodicals (《中国科技期刊研究》), 2021, Issue 11, pp. 1355-1361 (7 pages)
CSSCI
Peking University Core Journal (北大核心)
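Funding
Key Journal Project of the China Science and Technology Journal Excellence Action Plan (2019-B10)
Strategic Research and Consulting Project of the Chinese Academy of Engineering, Environmental and Light & Textile Engineering Field Group (2021-ZD-01-10)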
Keywords
Scientific journal
Manuscript
Plagiarism
Similarity index
Index
Evaluation