Term Extraction and Negation Detection Method in Chinese Clinical Documents (中文病历文档术语提取和否定检出方法)

Cited by: 9
Abstract: Unstructured clinical documents hold a wealth of valuable information. Indexing them with the vocabulary and concepts of a biomedical terminology system makes that information easier to retrieve and mine. International research in this area has been under way for many years, but concept indexing of Chinese clinical documents had not yet been studied. This work integrated the existing Chinese version of the International Classification of Diseases (ICD) into the Unified Medical Language System (UMLS) and, after a statistical analysis of Chinese electronic medical record documents that accounts for the particularities of Chinese language processing, proposed a set of term extraction and negation detection methods that can be used to build a concept index for Chinese clinical documents. The term extraction phase uses a high-sensitivity maximum matching method (Reverse Maximum Matching, RMM) combined with a general-purpose Chinese word segmentation tool to control false positives. The negation detection phase uses a simplified clause-level pattern matching method that exploits features of Chinese and builds on existing Chinese processing techniques. Both algorithms were evaluated on two clinical text data sets. Term extraction reached sensitivities of 99.51% and 100%, with false detection rates of 1.46% and 1.66%. Negation detection reached a positive predictive value of 100% on both data sets and negative predictive values of 100% and 98.99%; apart from document-quality problems such as non-standard punctuation, negation was detected correctly in almost all cases.
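The two phases named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the negation cue list, the clause delimiters, and the function names `reverse_maximum_match` and `detect_negation` are all assumptions made for illustration, and the segmentation-based false-positive filter mentioned in the abstract is omitted.

```python
import re

# Hypothetical toy lexicon; the paper draws terms from the Chinese ICD
# integrated into UMLS, which is not reproduced here.
LEXICON = {"高血压", "糖尿病", "冠心病", "发热", "咳嗽"}
MAX_LEN = max(len(term) for term in LEXICON)

# Hypothetical negation cues; the abstract does not list the actual cues.
NEGATION_CUES = ("否认", "未见", "无")

# Assumed clause delimiters: full-width and ASCII commas, semicolons,
# and sentence-ending marks. The enumeration comma "、" is deliberately
# not a delimiter, so a cue covers every item of an enumerated list.
CLAUSE_SPLIT = re.compile(r"[,,;;。!?!?]")


def reverse_maximum_match(text, lexicon=LEXICON, max_len=MAX_LEN):
    """Scan right to left, taking the longest lexicon term ending at each position.

    Returns (start_index, term) pairs in left-to-right order.
    """
    found = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            start = end - size
            if text[start:end] in lexicon:
                found.append((start, text[start:end]))
                end = start  # continue scanning to the left of the match
                break
        else:
            end -= 1  # no term ends here; move one character left
    found.reverse()
    return found


def detect_negation(document):
    """Mark a term as negated when a cue precedes it inside the same clause."""
    results = []
    for clause in CLAUSE_SPLIT.split(document):
        for start, term in reverse_maximum_match(clause):
            prefix = clause[:start]
            negated = any(cue in prefix for cue in NEGATION_CUES)
            results.append((term, negated))
    return results


if __name__ == "__main__":
    sample = "患者有高血压病史,否认糖尿病,无发热、咳嗽。"
    for term, negated in detect_negation(sample):
        print(term, "negated" if negated else "affirmed")
```

Because the sketch limits negation scope to the clause, a missing or misplaced comma changes which terms a cue covers. This is consistent with the abstract's remark that the remaining errors stem from non-standard punctuation.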
Source: Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2008, Issue 5, pp. 716-721, 734 (7 pages)
Funding: National 863 Program (2006AA02Z348)
Keywords: medical language processing; term extraction; negation detection