Funding: This work is supported by the National Key R&D Program of China (2020AAA0106400), the National Natural Science Foundation of China (No. 61831022, No. 61806201), and the Key Research Program of the Chinese Academy of Sciences (Grant No. ZDBS-SSW-JSC006). This work is also supported by the Beijing Academy of Artificial Intelligence (BAAI).
Abstract: This paper describes our approach to the Chinese clinical named entity recognition (CNER) task organized by the 2020 China Conference on Knowledge Graph and Semantic Computing (CCKS) competition. The task requires identifying the boundaries and category labels of six types of entities in Chinese electronic medical records (EMRs). We constructed a hybrid system composed of a semi-supervised noisy-label learning model based on adversarial training and a rule-based post-processing module. The core idea of the hybrid system is to reduce the impact of data noise by optimizing the model results. In addition, we used post-processing rules to correct three kinds of errors in the model predictions: redundant labeling, missing labeling, and wrong labeling. The proposed method achieved scores of 0.9156 under the strict criterion and 0.9660 under the relaxed criterion on the final test set, ranking first.
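The abstract reports scores under a strict and a relaxed criterion without defining them. The sketch below illustrates one common reading, offered as an assumption rather than the competition's official scorer: a strict match requires identical boundaries and label, a relaxed match requires the same label and any overlap in span, and the reported number is taken to be an F1 score. The `Entity` tuple layout, the function names, and the labels "disease" and "drug" are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical entity representation: (start, end_exclusive, label).
Entity = Tuple[int, int, str]


def strict_match(pred: Entity, gold: Entity) -> bool:
    """Assumed strict criterion: boundaries and label are all identical."""
    return pred == gold


def relaxed_match(pred: Entity, gold: Entity) -> bool:
    """Assumed relaxed criterion: same label and overlapping spans."""
    same_label = pred[2] == gold[2]
    overlaps = pred[0] < gold[1] and gold[0] < pred[1]
    return same_label and overlaps


def f1(preds: List[Entity], golds: List[Entity],
       match: Callable[[Entity, Entity], bool]) -> float:
    """F1 over entity lists under the given matching function."""
    if not preds or not golds:
        return 0.0
    # Predictions that match at least one gold entity count toward precision.
    matched_preds = sum(any(match(p, g) for g in golds) for p in preds)
    # Gold entities matched by at least one prediction count toward recall.
    matched_golds = sum(any(match(p, g) for p in preds) for g in golds)
    precision = matched_preds / len(preds)
    recall = matched_golds / len(golds)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the second prediction has the right label but a shifted start.
golds = [(0, 3, "disease"), (5, 8, "drug")]
preds = [(0, 3, "disease"), (6, 8, "drug")]
strict_score = f1(preds, golds, strict_match)
relaxed_score = f1(preds, golds, relaxed_match)
```

In this toy case the boundary error is penalized only under the strict criterion, which is consistent with the relaxed score in the abstract being higher than the strict one.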