Detection and Comparative Study of Differences Between AI-Generated and Scholar-Written Chinese Abstracts
Cited by: 9
Abstract [Research purpose] This study empirically examines methods for detecting whether a Chinese paper abstract was generated by AI or written by a scholar, and analyzes how the two kinds of text differ in content features, providing a reference for the automatic detection of AI-generated text and related research. [Research method] First, taking 100 highly cited papers in library science as an example, GPT-4 was used to generate an abstract from each paper title, and an analysis dataset was constructed. Next, supervised machine learning models and deep pre-trained models were used to classify abstracts as GPT-4-generated or scholar-written, and plagiarism-checking software was used to measure their duplication rates. Finally, the similarities and differences between AI-generated and scholar-written Chinese abstracts were examined in terms of abstract length, number of sentences, lexical features, and common collocations. [Research conclusion] Classifiers built on the training corpus can effectively identify whether a Chinese paper abstract was generated by AI: logistic regression, the ensemble learning models (RF and LightGBM), and BERT all achieve F1-scores above 90%. AI-generated abstracts are highly homogeneous, show strong logical structure in their writing, and habitually use academic discourse patterns such as induction and summarization, whereas scholar-written abstracts show marked individual differences, use more collocations that convey concrete meaning, and frequently use words closely related to national policy.
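Two of the quantities named in the abstract, the surface features (abstract length, sentence count) and the F1-score used to compare classifiers, can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' pipeline: the function names `abstract_features` and `f1_score`, the choice of sentence-ending punctuation, and the convention of labeling AI-generated abstracts as `1` are all assumptions, since the abstract does not give these details.

```python
import re

# Assumed sentence terminators for Chinese text; the paper's actual
# segmentation rules are not stated in the abstract.
_SENTENCE_END = re.compile(r"[。！？!?]+")


def abstract_features(text: str) -> dict:
    """Return two simple surface features of one abstract."""
    stripped = "".join(text.split())  # remove all whitespace
    sentences = [s for s in _SENTENCE_END.split(stripped) if s]
    return {"length": len(stripped), "sentences": len(sentences)}


def f1_score(y_true, y_pred, positive=1) -> float:
    """F1 = harmonic mean of precision and recall for the positive class.

    Here the positive class (1) is assumed to mean "AI-generated".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])` has two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is 2/3. A reported F1-score above 90% therefore means that both precision and recall on the AI-generated class are high at the same time.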
Authors Wang Yibo (王一博); Guo Xin (郭鑫); Liu Zhifeng (刘智锋); Wang Jimin (王继民) (Department of Information Management, Peking University, Beijing 100871; Peking University Library, Beijing 100871)
Source Journal of Intelligence (《情报杂志》), Peking University Core Journal (北大核心), 2023, No. 9, pp. 127-134, 8 pages
Funding A research output of the National Social Science Fund of China Key Project "Research on Key Issues and Platform Construction for Unified Discovery of Open Scientific Datasets" (No. 20ATQ007).
Keywords library science; AIGC; GPT-4; paper abstract; abstract detection; text classification
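Related literature references 15; second-level references 76; co-citing documents 795; co-cited documents 88; citing documents 9; second-level citing documents 2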