Data mining of real-world obstetric medical record diagnosis text
Abstract Objective: Diagnosis text in obstetric medical records has high research value but is difficult to mine. This paper proposes a combined algorithm that automatically extracts standard diagnostic terms meeting research requirements from the text and that can be applied in the obstetrics departments of different hospitals. Methods: The combined algorithm first trains an MC-BERT model on a labeled corpus and uses the trained model to normalize terms. It then applies the Louvain algorithm to group redundant terms and automatically outputs research diagnostic terms. Results: Term normalization achieved an F1 of 0.9235 on the test set, and the algorithm automatically clustered 1107 standard diagnostic terms into 106 research diagnostic terms. The combined algorithm was also validated on a validation set from another hospital, where term normalization reached an F1 of 0.9094. Conclusion: The method can efficiently obtain research diagnostic terms in batches from medical record diagnosis text, and the trained model can be applied in the obstetrics departments of multiple hospitals.
Authors MA Yinyao (马银瑶); BI Wenshuai (毕文帅); MAO Jinjiang (毛锦江); MENG Chenwei (孟晨伟); LYU Hanlin (吕翰林); WANG Lei (王雷) (Department of Obstetrics, Guangxi Zhuang Autonomous Region People's Hospital, Nanning 530000, Guangxi Zhuang Autonomous Region, China; Institute of Biointelligence Technology, BGI Research-Shenzhen, Shenzhen 518083, Guangdong Province, China; Department of Obstetrics, Guigang City People's Hospital, Guigang 537000, Guangxi Zhuang Autonomous Region, China)
Source China Modern Medicine (《中国当代医药》), CAS, 2023, No. 20, pp. 23-28, F0003 (7 pages)
Funding Guangxi Key Research and Development Program (桂科AB22035056).
Keywords Obstetrics; Diagnostic terms; Real-world; Algorithm combination; Data mining
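The abstract reports term normalization quality as an F1 score but does not state how it was computed. As a rough illustration only, the sketch below assumes a micro-averaged F1 over the set of standard terms predicted for each record. The function name `micro_f1` and the toy diagnoses are hypothetical and are not taken from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-record sets of standard diagnostic terms.

    Assumption: each record maps to a set of standard terms, and a predicted
    term counts as correct only if it appears in that record's gold set.
    """
    tp = fp = fn = 0
    for gold_terms, pred_terms in zip(gold, pred):
        gold_terms, pred_terms = set(gold_terms), set(pred_terms)
        tp += len(gold_terms & pred_terms)   # terms predicted correctly
        fp += len(pred_terms - gold_terms)   # predicted but not in gold
        fn += len(gold_terms - pred_terms)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two records with gold and predicted standard terms.
gold = [{"gestational diabetes mellitus"},
        {"preeclampsia", "anemia in pregnancy"}]
pred = [{"gestational diabetes mellitus"},
        {"preeclampsia", "placenta previa"}]

score = micro_f1(gold, pred)
print(f"micro F1 = {score:.4f}")
```

In this toy case two of three predicted terms are correct and two of three gold terms are recovered, so precision and recall are both 2/3. Other averaging schemes (for example, macro-averaging per term) would give different numbers on real data.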