Funding: Beijing Municipal Natural Science Foundation (Grant No. M22012); BUPT Excellent Ph.D. Students Foundation (Grant No. CX2021122).
Abstract: The International Classification of Diseases (ICD) is an international standard and tool for epidemiological investigation, health management, and clinical diagnosis, and it plays a fundamental role in intelligent medical care. The assignment of ICD codes to health-related documents has become a focus of academic research, and numerous studies have moved ICD coding from manual toward automated work. In this survey, we review in depth the development of this task over recent decades, from the rule-based stage, through the traditional machine learning stage, to the neural-network-based stage. Various methods using different techniques have been introduced to solve this problem, and we report a performance comparison of these methods on the publicly available Medical Information Mart for Intensive Care dataset. Next, we summarize four major challenges of this task: (1) the large label space, (2) the unbalanced label distribution, (3) the long text of documents, and (4) the interpretability of coding. We analyze the solutions that have been proposed to address these challenges. Further, we discuss the applications of ICD coding, from mortality statistics to payments based on disease-related groups and hospital performance management. In addition, we discuss different ways of framing and evaluating this task, and how it has been transformed into a learnable problem. We also provide details of the commonly used datasets. Overall, this survey aims to provide a reference and possible future directions for follow-up research.