Can ChatGPT evaluate research quality?

Abstract:

Purpose: Assess whether ChatGPT 4.0 is accurate enough to perform research evaluations on journal articles to automate this time-consuming task.

Design/methodology/approach: Test the extent to which ChatGPT-4 can assess the quality of journal articles using a case study of the published scoring guidelines of the UK Research Excellence Framework (REF) 2021 to create a research evaluation ChatGPT. This was applied to 51 of my own articles and compared against my own quality judgements.

Findings: ChatGPT-4 can produce plausible document summaries and quality evaluation rationales that match the REF criteria. Its overall scores have weak correlations with my self-evaluation scores of the same documents (averaging r=0.281 over 15 iterations, with 8 being statistically significantly different from 0). In contrast, the average scores from the 15 iterations produced a statistically significant positive correlation of 0.509. Thus, averaging scores from multiple ChatGPT-4 rounds seems more effective than individual scores. The positive correlation may be due to ChatGPT being able to extract the author’s significance, rigour, and originality claims from inside each paper. If my weakest articles are removed, then the correlation with average scores (r=0.200) falls below statistical significance, suggesting that ChatGPT struggles to make fine-grained evaluations.

Research limitations: The data is self-evaluations of a convenience sample of articles from one academic in one field.

Practical implications: Overall, ChatGPT does not yet seem to be accurate enough to be trusted for any formal or informal research quality evaluation tasks. Research evaluators, including journal editors, should therefore take steps to control its use.

Originality/value: This is the first published attempt at post-publication expert review accuracy testing for ChatGPT.
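
The findings contrast the correlation of each individual scoring round with the correlation of the per-article average over all rounds. A minimal sketch of that comparison follows. It uses synthetic data, not the study's scores; the helper names `pearson` and `mean_per_article`, the 1 to 4 score scale, and the noise level are illustrative assumptions and are not taken from the article.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_per_article(rounds):
    """Average each article's score across all scoring rounds."""
    return [sum(scores) / len(scores) for scores in zip(*rounds)]


# Synthetic, illustrative data only (NOT the study's data):
# 51 articles with an assumed 1-4 reference score, and 15 noisy scoring rounds.
random.seed(0)
human = [random.choice([1, 2, 3, 4]) for _ in range(51)]
rounds = [[h + random.gauss(0, 2.0) for h in human] for _ in range(15)]

# Correlation of each single round with the reference scores.
per_round = [pearson(r, human) for r in rounds]

# Correlation of the per-article average over all rounds.
averaged = pearson(mean_per_article(rounds), human)

print(f"mean of per-round r:  {sum(per_round) / len(per_round):.3f}")
print(f"r of averaged scores: {averaged:.3f}")
```

Averaging reduces the independent noise in each round, which is one plausible reason an averaged score can track a reference score more closely than any single round does. The sketch does not model the abstract's other explanation, that ChatGPT extracts the author's own claims from each paper.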

Author: Mike Thelwall

Affiliation: Information School

Source: Journal of Data and Information Science (数据与情报科学学报（英文版）), CSCD, 2024, Issue 2, pp. 1-21 (21 pages)

Keywords: ChatGPT; Large Language Models; LLM; Research Excellence Framework; REF 2021; Research quality; Research assessment

Classification: G05 [Cultural Sciences]

