
The construction of annotated corpora of named entities for Chinese electronic medical records (中文电子病历命名实体标注语料库构建)

Cited by: 19
Abstract: To address the absence of annotated named entity corpora for Chinese electronic medical records (CEMRs), this study investigates the construction of such a corpus. With reference to the definitions of named entity types and modifier types for electronic medical records published in 2010 by the US Informatics for Integrating Biology and the Bedside (I2B2) center, a detailed annotation specification for CEMRs was developed under the guidance of professional physicians. Based on an analysis of a large number of CEMRs, a complete named entity annotation scheme was proposed, and an annotated corpus of considerable size was built through pre-annotation followed by formal annotation. The annotation consistency of the corpus exceeds 92%. The work provides reliable data support for research on named entity recognition and information extraction in CEMRs, and is also valuable for medical knowledge mining.
Source: Chinese High Technology Letters (《高技术通讯》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2015, No. 2, pp. 143-150 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (60975077)
Keywords: Chinese electronic medical record (CEMR), named entity, annotated corpus, annotation specification, inter-annotator agreement (IAA)
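
The abstract reports an annotation consistency above 92% but does not say how the figure was computed. As a minimal sketch, assuming agreement is measured as exact-match F1 between two annotators' entity spans (a common choice for entity annotation, not confirmed by this record), the calculation could look like the following. The `Entity` type, the `pairwise_f1` function, and the sample labels are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the span begins (inclusive)
    end: int     # character offset where the span ends (exclusive)
    label: str   # entity type; the labels used below are illustrative only


def pairwise_f1(annotator_a: set[Entity], annotator_b: set[Entity]) -> float:
    """Exact-match F1 between two annotators' entity sets for one document."""
    if not annotator_a and not annotator_b:
        return 1.0  # both annotators marked nothing, so they fully agree
    matched = len(annotator_a & annotator_b)  # same span and same label
    if matched == 0:
        return 0.0
    precision = matched / len(annotator_b)  # annotator A acts as the reference
    recall = matched / len(annotator_a)
    return 2 * precision * recall / (precision + recall)


# Two annotators agree on two entities and disagree on one span boundary.
a = {Entity(0, 2, "problem"), Entity(5, 8, "test"), Entity(10, 13, "treatment")}
b = {Entity(0, 2, "problem"), Entity(5, 8, "test"), Entity(10, 14, "treatment")}
print(round(pairwise_f1(a, b), 3))  # prints 0.667
```

Exact matching counts a boundary disagreement as a full miss. A relaxed variant that credits overlapping spans would give a higher score, so the matching criterion needs to be stated alongside any reported agreement figure.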


