Disambiguating Biomedical Abbreviations Based on K-Nearest Neighbor with Weighted Voting
Cited by: 3
Abstract: Extracting information from biomedical literature is important for making full use of the results achieved in the biomedical field and for promoting further progress in biology and medicine. Addressing the analysis and understanding of biomedical abbreviations, this paper proposes a disambiguation algorithm based on K-nearest neighbor (K-NN) with weighted voting. The algorithm automatically generates labeled instances based on the "One Sense Per Discourse" hypothesis, uses global feature words that express the topic of a text as disambiguation features, and classifies with weighted-voting K-NN. Experiments on a real corpus of 177,762 Medline abstracts show that the proposed algorithm clearly outperforms the algorithms in related work. The experiments also show that, for abbreviation disambiguation, weighted-voting K-NN achieves not only higher prediction accuracy than classical K-NN but also more stable performance.
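The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact weighting formula or feature selection, so the code below makes its own assumptions: features are plain bag-of-words counts over the whole text, similarity is cosine, and each of the k nearest neighbors votes with a weight equal to its similarity. The function names and the toy "PCA" training data are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Lowercased token counts; a simple stand-in for global topic features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_knn(train, query, k=3):
    """Return the sense with the largest similarity-weighted vote.

    train: non-empty list of (text, sense) pairs.
    Assumption: vote weight = cosine similarity to the query.
    """
    q = bag_of_words(query)
    scored = [(cosine(bag_of_words(text), q), sense) for text, sense in train]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for sim, sense in nearest:
        votes[sense] += sim  # classical K-NN would add 1 here instead
    return max(votes, key=votes.get)


# Hypothetical labeled instances for the ambiguous abbreviation "PCA".
train = [
    ("dimension reduction of gene expression data with PCA",
     "principal component analysis"),
    ("PCA eigenvectors variance gene expression clusters",
     "principal component analysis"),
    ("postoperative pain morphine PCA pump patients",
     "patient-controlled analgesia"),
    ("PCA opioid dose pain relief after surgery",
     "patient-controlled analgesia"),
]

query = "morphine consumption and pain scores in patients using PCA"
print(weighted_knn(train, query, k=3))  # prints "patient-controlled analgesia"
```

Weighting votes by similarity lets one close neighbor outweigh several distant ones, which is one plausible reason for the stability the abstract reports as k varies. In the paper's setting, the labeled instances would come from texts where the abbreviation appears alongside its full form, following the "One Sense Per Discourse" hypothesis, rather than from hand-written examples like these.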
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2008, No. 2, pp. 18-23 (6 pages)
Funding: National Natural Science Foundation of China (90409007)
Keywords: computer application; Chinese information processing; biomedical information extraction; abbreviation disambiguation; K-NN with weighted voting
