
基于条件随机场的中医命名实体识别 (cited by: 38)

Recognition of Chinese Medicine Named Entity Based on Condition Random Field
Abstract: Traditional Chinese medicine (TCM) case records contain a wealth of knowledge, so automatically annotating large volumes of them for knowledge mining is particularly important. To automatically recognize and label symptoms and pathogenesis in ancient medical case records of the Ming and Qing dynasties, this paper adopts a method based on conditional random fields (CRF) and proposes data cleansing, together with reducing and merging part-of-speech tags, to shrink the feature space. Finally, simulation experiments compare the method with two other statistical methods, maximum entropy and support vector machines (SVM). The results show that the method has a clear advantage in recognizing TCM named entities such as symptoms and pathogenesis in these records.
Source: Journal of Xiamen University (Natural Science) (《厦门大学学报(自然科学版)》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2009, No. 3, pp. 359-364 (6 pages).
Funding: Supported by the National Natural Science Foundation of China (60803078).
Keywords: conditional random field (CRF); traditional Chinese medicine named entity; data cleansing; cross validation
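
The abstract mentions merging part-of-speech tags to reduce the feature space before CRF training, but it does not give the mapping or the feature set. The following is only a minimal sketch of that idea under stated assumptions: the merge table, the feature names, and the toy sentence (including its segmentation and tags) are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical merge table: fine-grained POS tag -> coarse tag.
# The abstract does not specify the actual mapping; this one is illustrative.
MERGE_TABLE: Dict[str, str] = {
    "nr": "n", "ns": "n", "nt": "n", "nz": "n",  # noun subtypes -> noun
    "vd": "v", "vn": "v",                        # verb subtypes -> verb
    "ad": "a", "an": "a",                        # adjective subtypes -> adjective
}

Token = Tuple[str, str]  # (word, fine-grained POS tag)


def merge_pos(tag: str) -> str:
    """Map a fine-grained tag to its coarse tag; unknown tags pass through."""
    return MERGE_TABLE.get(tag, tag)


def token_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Feature dict for token i: the word, its merged POS, and neighbours' merged POS."""
    word, pos = sent[i]
    feats = {"word": word, "pos": merge_pos(pos)}
    if i > 0:
        feats["pos-1"] = merge_pos(sent[i - 1][1])
    if i < len(sent) - 1:
        feats["pos+1"] = merge_pos(sent[i + 1][1])
    return feats


def distinct_pos_tags(sents: List[List[Token]], merge: bool) -> int:
    """Count distinct POS tags in the data, with or without merging."""
    tags = set()
    for sent in sents:
        for _, pos in sent:
            tags.add(merge_pos(pos) if merge else pos)
    return len(tags)


# Toy sentence with made-up segmentation and tags (not from the paper's corpus).
SENTENCE: List[Token] = [
    ("肝", "n"), ("气", "n"), ("郁结", "vn"),
    ("胸闷", "an"), ("不", "d"), ("舒", "a"),
]

if __name__ == "__main__":
    print(distinct_pos_tags([SENTENCE], merge=False))
    print(distinct_pos_tags([SENTENCE], merge=True))
    print(token_features(SENTENCE, 2))
```

Because every POS-based feature (current, previous, next) draws its values from the tag set, shrinking the tag set shrinks the number of distinct feature values the CRF has to weight. Feature dicts of this shape could then be passed to a CRF toolkit of the reader's choice.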