
Research on intelligent scoring of subjective questions in Pharmacology exams based on Large Language Models
Abstract: This article explores the application of large language models (LLMs) in the intelligent scoring of subjective questions in Pharmacology exams. Five LLMs, namely ChatGPT 4.0, Claude 2, iFLYTEK Spark Large Cognitive Model 3.0, ChatGLM 3.0, and ERNIE Bot 3.5, were selected to score short-text subjective Pharmacology questions using a variety of scoring standards and prompt engineering techniques. The results showed that ChatGPT 4.0 performed best, with a mean absolute error rate (MAER) of 0.0517, a root mean square error (RMSE) of 1.0339, and an intraclass correlation coefficient (ICC) of 0.936, indicating highly consistent and accurate scoring. Claude 2 followed closely, with an MAER of 0.0724, an RMSE of 1.2999, and an ICC of 0.893, also demonstrating good scoring performance. The other models performed worse in terms of scoring consistency and bias; in particular, iFLYTEK Spark Large Cognitive Model 3.0 had an MAER of 0.2828, an RMSE of 3.0286, and an ICC of only 0.217. Overall, LLMs can effectively apply their language comprehension and logical reasoning abilities to score subjective questions intelligently and to provide detailed scoring analyses, which helps improve students' learning efficiency and self-evaluation ability. Compared with traditional manual scoring, LLMs offer higher efficiency and cost-effectiveness for the intelligent scoring of subjective questions. This study provides a new perspective and method for applying advanced models such as ChatGPT in education, and offers a reference for the future development and application of artificial intelligence in education.
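The abstract evaluates scoring quality with MAER, RMSE, and ICC. As a hedged sketch of what these metrics conventionally measure (the paper's exact definitions are not reproduced here, and the normalization of MAER by the full mark is an assumption), let x_i be a model's score and y_i the human reference score for answer i of n, with S_max the full mark of the question:

\[ \mathrm{MAER} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert x_i - y_i\rvert}{S_{\max}}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^{2}} \]

Lower MAER and RMSE indicate smaller deviation from the human scores, while the intraclass correlation coefficient (ICC) approaches 1 as rating consistency between the model and human raters improves, which is why ChatGPT 4.0's combination of low MAER and RMSE with an ICC of 0.936 is read as both accurate and consistent.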
Authors: XIANGBA Zhuoma; WANG Zhenzhen; CHANG Hongsheng; ZHAO Yansong; LIAO Guolong; MA Xingguang (Beijing University of Chinese Medicine, School of Management, Beijing 102488, China; Beijing University of Chinese Medicine, School of Chinese Materia Medica, Beijing 102488, China; Beijing University of Chinese Medicine, School of Traditional Chinese Medicine, Beijing 102488, China)
Source: China Medical Education Technology (《中国医学教育技术》), 2024, No. 5, pp. 572-579 (8 pages)
Funding: Beijing University of Chinese Medicine Philosophy and Social Sciences Cultivation Fund project "Research on Intelligent Scoring of Medical Subjective Questions Based on Agile Data Management Methodology and Large Language Models" (2024-JYB-PY-006); Beijing University of Chinese Medicine Educational Science Research project "Low-Code-Based Online Examination System and Visualized Analysis" (XJY22048)
Keywords: artificial intelligence; large language models; intelligent scoring of subjective questions; Pharmacology; prompt engineering