ICD coding classification based on data augmentation and dilated convolution

Abstract: The international classification of diseases (ICD) coding classification task faces three problems: an imbalanced label distribution, excessively long clinical record texts, and a large label space. To address them, this paper proposes an ICD coding classification method based on data augmentation and dilated convolution. First, the method introduces the pre-trained model BioLinkBERT, which was trained on the biomedical domain with unsupervised learning, to alleviate the domain mismatch problem. Second, it applies the Mixup data augmentation technique to expand the hidden representations, which increases data diversity and makes classification more robust, addressing the imbalanced label distribution. Finally, it uses multi-granularity dilated convolution to capture long-range dependencies in the text, preventing long input texts from degrading model performance. Experiments on two subsets of the MIMIC-III dataset, comparing against several methods, show that the proposed model improves on the baseline model by 0.4% to 1.5% in F1 score and by 1.2% to 1.6% in precision@k. This study therefore provides an effective solution to the challenges of ICD coding classification.
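The abstract names three mechanisms (Mixup over hidden representations, multi-granularity dilated convolution, and F1/precision@k evaluation). The dependency-free Python sketch below illustrates each of them under stated assumptions; it is not the authors' implementation. All function names (`sample_lambda`, `mixup`, `dilated_conv1d`, `receptive_field`, `multi_granularity`, `precision_at_k`, `micro_f1`) are hypothetical. The following choices are also assumptions: drawing λ from Beta(α, α) with α = 0.4, using dilation rates (1, 2, 4), merging the branches with a position-wise max, and using micro-averaging for F1. The abstract does not say which F1 averaging is reported. In the paper the hidden vectors come from BioLinkBERT; here they are short hand-written lists.

```python
import random
from typing import List, Sequence, Set, Tuple


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw the Mixup coefficient lambda ~ Beta(alpha, alpha) (alpha is an assumption)."""
    return rng.betavariate(alpha, alpha)


def mixup(h1: Sequence[float], y1: Sequence[float],
          h2: Sequence[float], y2: Sequence[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Mix two hidden representations and their multi-hot ICD label vectors
    with the same coefficient lam (Mixup applied in hidden space)."""
    h = [lam * a + (1 - lam) * b for a, b in zip(h1, h2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return h, y


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1-D dilated convolution (cross-correlation) with zero padding.
    Kernel tap j reads x[i + (j - center) * dilation]; assumes an odd kernel size."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):  # positions outside the text count as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of consecutive tokens spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_granularity(x: Sequence[float], kernel: Sequence[float],
                      dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Run parallel branches with different dilation rates and merge them
    position-wise with max (hypothetical merge rule; concatenation is another option)."""
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [max(vals) for vals in zip(*branches)]


def precision_at_k(scores: Sequence[float], gold: Set[int], k: int) -> float:
    """Fraction of the k highest-scoring codes that belong to the gold code set."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(1 for i in top if i in gold) / k


def micro_f1(preds: Sequence[Set[int]], golds: Sequence[Set[int]]) -> float:
    """Micro-averaged F1: pool true/false positives and negatives over all documents."""
    tp = fp = fn = 0
    for p, g in zip(preds, golds):
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)
    lam = sample_lambda(0.4, rng)
    print(mixup([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1], lam))
    print(multi_granularity([1, 2, 3, 4, 5], [1, 1, 1]))
    print(precision_at_k([0.9, 0.1, 0.8, 0.3], {0, 3}, k=2))  # 0.5
```

The labels are mixed with the same λ as the hidden vectors, so a mixed sample carries soft targets for the codes of both source records. This fits a per-code sigmoid output trained with binary cross-entropy. Dilation widens each kernel's reach without adding weights: `receptive_field(3, 4)` is 9 tokens, compared with 3 for an ordinary kernel of the same size.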

Authors: Yan Jing; Zhao Di; Meng Jiana; Lin Hongfei (School of Computer Science & Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China; School of Computer Science & Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; Dalian Yongjia Electronic Technology Co., Dalian, Liaoning 116024, China)

Source: Application Research of Computers (CSCD; Peking University core journal), 2024, No. 11, pp. 3329-3336 (8 pages)

Funding: Supported by the Natural Science Foundation of Liaoning Province (2022-BS-104).

Keywords: ICD code classification; BioLinkBERT pre-trained model; Mixup data augmentation; dilated convolution

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
