Research on Chinese Medical Records Error Detection Based on Ensemble Model (基于融合模型的中文病历文本智能纠错研究)


Abstract. Objective: To improve the quality of Chinese medical records and reduce the likelihood of errors in clinical documents, this study performs intelligent correction of wrongly written characters (typos) in Chinese medical record text. Methods: A statistical language model and a neural-network-based pre-trained model were combined into an ensemble model, which was trained and validated on typo correction in Chinese medical record text; performance was evaluated with F1 as the comprehensive metric. Results: The ensemble model achieved an F1 of 0.6254 on typo correction in Chinese medical records, higher than the 0.4813 of the statistical language model alone and the 0.5970 of the pre-trained model alone. Conclusion: Combining a statistical language model with a pre-trained model performs well on typo correction in Chinese medical record text and offers practical support for assuring the quality of clinical record writing.
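The abstract names F1 as the evaluation metric but does not state the granularity at which it was computed. As a minimal sketch, assuming character-level scoring and substitution-only errors (source and corrected text have the same length), precision, recall, and F1 for a typo corrector could be computed as below. The function names `extract_edits` and `correction_f1` and the sample sentences are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Set, Tuple

# An edit is (character position, replacement character).
Edit = Tuple[int, str]


def extract_edits(source: str, output: str) -> Set[Edit]:
    """Collect the positions where `output` differs from `source`.

    Assumption: errors are same-length character substitutions only.
    """
    if len(source) != len(output):
        raise ValueError("this sketch only handles same-length substitutions")
    return {(i, o) for i, (s, o) in enumerate(zip(source, output)) if s != o}


def correction_f1(
    sources: Sequence[str],
    predictions: Sequence[str],
    references: Sequence[str],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over character-level corrections."""
    true_pos = n_pred = n_gold = 0
    for src, pred, ref in zip(sources, predictions, references):
        pred_edits = extract_edits(src, pred)   # what the system changed
        gold_edits = extract_edits(src, ref)    # what should have been changed
        true_pos += len(pred_edits & gold_edits)
        n_pred += len(pred_edits)
        n_gold += len(gold_edits)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up sentences: each source contains one wrongly written character.
    sources = ["患者无发热及咳簌", "无药物过敏使"]
    references = ["患者无发热及咳嗽", "无药物过敏史"]
    # A hypothetical system fixes the first sentence and misses the second.
    predictions = ["患者无发热及咳嗽", "无药物过敏使"]
    print(correction_f1(sources, predictions, references))
```

F1 is the harmonic mean of precision and recall, so a system that corrects few characters but corrects them accurately is still penalized for the errors it misses.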

Authors: JIANG Huizhen; JIAO Xueying; ZOU Lingwei; XU Shijie; ZHU Weiguo (Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China)

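Affiliations: Peking Union Medical College Hospital, Chinese Academy of Medical Sciences; Beijing Zuoyi Technology Co., Ltd. (北京左医科技有限公司)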

Source: Chinese Journal of Health Informatics and Management (《中国卫生信息管理杂志》), 2023, No. 3, pp. 448-453 (6 pages)

Funding: CAMS Innovation Fund for Medical Sciences (中国医学科学院医学与健康科技创新工程), project no. 2021-I2M-1-056.

Keywords: natural language processing; Chinese medical record; error detection; model ensemble

Classification: R-034 [Medicine and Health]; R319 [Medicine and Health: Basic Medicine]

