Evaluation of the Application Potential and Challenges of Large Language Models in Laboratory Medicine

Abstract

Objective: To evaluate the performance of ChatGPT-4.0 and ERNIE Bot-4.0 in laboratory medicine and to explore their application potential and the challenges they face in this professional domain.

Methods: Using past questions from the national Clinical Medical Laboratory Technology (intermediate level) examination as a benchmark, we compared the two models on their mastery of laboratory medicine knowledge and the consistency of their answers. We also assessed the models' ability to interpret test results and assist diagnosis using 30 laboratory medicine cases.

Results: In the Clinical Medical Laboratory Technology test, both models exceeded the 60% pass line. ChatGPT-4.0 was superior to ERNIE Bot-4.0 in answering speed and consistency, but its accuracy was markedly lower (73.25% vs 80.75%). ERNIE Bot-4.0's accuracy was also higher than the average accuracy of clinical laboratory personnel on this examination (78.03%). By question type, both models performed worst on experimental technology questions (ERNIE Bot-4.0: 66.32%; ChatGPT-4.0: 60.53%) and best on basic medical knowledge questions (both 86.00%). In the case analysis test, ERNIE Bot-4.0 scored higher than ChatGPT-4.0 on every rating item. Both models performed well on routine cases but made errors on complex cases.

Conclusion: Both large language models showed some application potential in laboratory medicine. In a Chinese-language setting in particular, ERNIE Bot-4.0 significantly outperformed ChatGPT-4.0 in answering accuracy and case analysis ability, indicating a relative advantage for applications in China. However, both models still need improvement in experimental technical knowledge, complex case analysis, and the accuracy and consistency of their output. At the current stage, directly applying such general-purpose large language models to clinical test result interpretation and assisted diagnosis still carries some risk, while also suggesting a new research direction for the interpretation of laboratory reports.
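The comparison reported in the abstract can be tabulated in a few lines. The sketch below is only illustrative: it uses the overall accuracies, the 60% pass line, and the 78.03% personnel average stated above, while the names `PASS_LINE`, `HUMAN_AVERAGE`, and `summarize` are assumptions introduced here and do not come from the study.

```python
# Percentages are those reported in the abstract; all names are hypothetical.
PASS_LINE = 60.00       # qualification threshold (%)
HUMAN_AVERAGE = 78.03   # average accuracy of clinical laboratory personnel (%)

overall_accuracy = {
    "ChatGPT-4.0": 73.25,
    "ERNIE Bot-4.0": 80.75,
}


def summarize(accuracy):
    """For each model, report whether it passes and its gap to the human average."""
    summary = {}
    for model, acc in accuracy.items():
        summary[model] = {
            "passed": acc >= PASS_LINE,
            # Positive gap: the model scored above the personnel average.
            "gap_to_human": round(acc - HUMAN_AVERAGE, 2),
        }
    return summary


summary = summarize(overall_accuracy)
for model, row in summary.items():
    print(model, row)
```

Under these figures both models clear the pass line, and only ERNIE Bot-4.0 sits above the personnel average.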

Authors: LU Xiaoqin; JIA Wei; WU Yuxiang; WU Yongkang (Department of Laboratory Medicine, West China Hospital of Sichuan University, Chengdu 610041, Sichuan; Jintang First People's Hospital, Chengdu 610400, Sichuan; School of Pharmacy and Laboratory Medicine, Ya'an Vocational and Technical College, Ya'an 625000, Sichuan; Hainan Medical University, Haikou 571199, Hainan, China)

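Affiliations: Department of Laboratory Medicine, West China Hospital of Sichuan University; Jintang County First People's Hospital; School of Pharmacy and Laboratory Medicine, Ya'an Vocational and Technical College; Hainan Medical University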

Source: Chinese Journal of Clinical Laboratory Science (《临床检验杂志》), CAS, 2024, Issue 8, pp. 619-623 (5 pages)

Funding: 2023 Sichuan Province Science and Technology Activities Project for Returned Overseas Scholars (川人社-202303-5).

Keywords: large language model; medical laboratory; artificial intelligence; result interpretation; case analysis

Classification: R446 [Medicine and Health: Diagnostics]
