Abstract
Objective To evaluate the performance of ChatGPT-4.0 and ERNIE Bot-4.0 in laboratory medicine, and to explore their application potential and the challenges they face in this professional domain.
Methods Past questions from the national Clinical Medical Laboratory Technology (intermediate level) examination were used as a benchmark to compare the two models' mastery of laboratory medicine knowledge and the consistency of their answers. Thirty laboratory medicine cases were then used to assess the models' ability to interpret test results and assist diagnosis.
Results In the Clinical Medical Laboratory Technology test, both models exceeded the 60% pass mark. ChatGPT-4.0 outperformed ERNIE Bot-4.0 in answering speed and consistency, but its accuracy was markedly lower (73.25% vs 80.75%). ERNIE Bot-4.0's accuracy also exceeded the average accuracy of clinical laboratory personnel on this examination (78.03%). By question type, both models performed worst on experimental technique questions (ERNIE Bot-4.0: 66.32%; ChatGPT-4.0: 60.53%) and best on basic medical knowledge questions (86.00% for both). In the case analysis test, ERNIE Bot-4.0 scored higher than ChatGPT-4.0 on every rating item. Both models performed well on routine cases but made errors on complex cases.
Conclusion Both large language models show some application potential in laboratory medicine. In a Chinese-language setting in particular, ERNIE Bot-4.0 significantly outperforms ChatGPT-4.0 in answering accuracy and case analysis ability, which indicates a relative advantage for domestic use. However, both models still need improvement in experimental technique knowledge, in the analysis of complex cases, and in the accuracy and consistency of their output. At the current stage, directly applying such general-purpose large language models to the interpretation of clinical test results and to assisted diagnosis still carries certain risks, and this points to a new research direction for the interpretation of laboratory reports.
Authors
陆小琴
佳薇
武宇翔
武永康
LU Xiaoqin; JIA Wei; WU Yuxiang; WU Yongkang (Department of Laboratory Medicine, West China Hospital of Sichuan University, Chengdu 610041, Sichuan; Jintang First People's Hospital, Chengdu 610400, Sichuan; School of Pharmacy and Laboratory Medicine, Ya'an Vocational and Technical College, Ya'an 625000, Sichuan; Hainan Medical University, Haikou 571199, Hainan, China)
Source
《临床检验杂志》
CAS
2024, Issue 8, pp. 619-623 (5 pages)
Chinese Journal of Clinical Laboratory Science
Funding
2023 Sichuan Provincial Science and Technology Activity Program for Returned Overseas Scholars (川人社-202303-5).
Keywords
large language model
medical laboratory
artificial intelligence
result interpretation
case analysis