
Entity Recognition Research in Online Medical Texts (cited by: 16)
Abstract: For online medical texts, the authors design recognition features that account for the characteristics of the medical domain and run entity recognition experiments on a self-built data set. The study covers five common diseases (gastritis, lung cancer, asthma, hypertension, and diabetes) and uses the conditional random field (CRF), a relatively advanced machine learning model of recent years, for training and testing. Five types of target entity are extracted: diseases, symptoms, drugs, treatment methods, and examinations. The effectiveness of the proposed features is verified by adding them one at a time, yielding an overall precision (reported as "accuracy") of 81.26% and a recall of 60.18%. A further analysis of the recognition features follows.
Source: Acta Scientiarum Naturalium Universitatis Pekinensis (《北京大学学报(自然科学版)》), indexed in EI, CAS, CSCD, and the Peking University Core list; 2016, No. 1, pp. 1-9 (9 pages)
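Funding: Supported by the Tianjin Science and Technology Support Program (13ZCZDGX01098), the Natural Science Foundation of Tianjin (14JCQNJC00600), and the Open Project of the Civil Aviation Administration of China Information Technology Research Base (CAAC-ITRB-201303)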
Keywords: named entity recognition; data mining; conditional random field; medical information
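
The precision and recall figures quoted in the abstract are entity-level measures. The sketch below shows one common way such figures can be computed from tagged sequences. It is an illustration only: the BIO tagging scheme, the label names (`DIS`, `SYM`, `DRUG`), and the function names `bio_to_spans` and `precision_recall` are assumptions, not details taken from the paper, and the paper's exact matching criterion is not stated here.

```python
from typing import List, Set, Tuple

# A span is (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall(gold: List[List[str]],
                     pred: List[List[str]]) -> Tuple[float, float]:
    """Exact-match, entity-level precision and recall over a corpus."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # same boundaries and same type
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: the prediction misses one symptom entity.
    gold = [["B-DIS", "I-DIS", "O", "B-SYM", "O", "B-DRUG"]]
    pred = [["B-DIS", "I-DIS", "O", "O", "O", "B-DRUG"]]
    print(precision_recall(gold, pred))
```

In the toy data every predicted entity is correct but one gold entity is missed, so precision is higher than recall, the same direction as the gap between the 81.26% and 60.18% figures reported above.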


