Chinese Organization Abbreviation Recognition Using Context Features
Times cited: 4
Abstract: Many existing methods for recognizing organization-name abbreviations depend on the organization's full name and on the internal composition of the abbreviation. To remove both dependencies, this paper proposes a method that recognizes organization abbreviations by context feature matching. The context features fall into two types: single features, which occur only with organization names, and intersecting features, which are shared by noise words and organization names. Each feature is assigned an error-rate weight, and a context feature matching algorithm recognizes organization abbreviations within different error-rate ranges. A noise-word list and an extension operation further improve precision and recall. In the experiments, the method reaches an F value of 92.23% on the closed dataset and, using the features and noise words trained on the closed dataset, an F value of 70.28% on the open test set. Finally, compared with a method that generates abbreviations from full names, this method recognizes abbreviations both with and without a matching full name and achieves better results than the full-name-dependent method.
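The abstract describes the matching step only at a high level. What follows is a minimal sketch under stated assumptions, not the authors' implementation. A context feature is modeled as a (left word, right word) pair around a candidate token. Its error rate is the share of training matches that were not organization names. A single threshold `max_error` stands in for the paper's error-rate ranges, and the noise-word list is a set of tokens that are never accepted. The extension operation is not modeled. All names (`ContextFeature`, `estimate_error_rate`, `recognize`, `max_error`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch of context feature matching; not the authors' algorithm.


@dataclass(frozen=True)
class ContextFeature:
    left: str          # required word before the candidate; "" means "any"
    right: str         # required word after the candidate; "" means "any"
    error_rate: float  # share of training matches that were not organization names


def _context(tokens, i):
    """Neighbouring words of tokens[i]; "" at sentence boundaries."""
    left = tokens[i - 1] if i > 0 else ""
    right = tokens[i + 1] if i + 1 < len(tokens) else ""
    return left, right


def _matches(left_req, right_req, left, right):
    # An empty requirement acts as a wildcard.
    return left_req in ("", left) and right_req in ("", right)


def estimate_error_rate(left_req, right_req, corpus):
    """Error rate of a context pattern on labeled data.

    corpus: list of (tokens, set of token indices that are organization names).
    Returns 1.0 when the pattern never fires, so unseen patterns are never trusted.
    """
    total = wrong = 0
    for tokens, org_indices in corpus:
        for i in range(len(tokens)):
            if _matches(left_req, right_req, *_context(tokens, i)):
                total += 1
                if i not in org_indices:
                    wrong += 1
    return wrong / total if total else 1.0


def recognize(tokens, features, noise_words, max_error=0.2):
    """Indices of tokens whose best-matching feature has error rate <= max_error."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok in noise_words:  # known non-organization strings are skipped
            continue
        left, right = _context(tokens, i)
        rates = [f.error_rate for f in features
                 if _matches(f.left, f.right, left, right)]
        if rates and min(rates) <= max_error:
            hits.append(i)
    return hits


# Made-up example: a pre-segmented sentence and two features with invented error rates.
tokens = ["他", "考入", "北大", "后", "，", "复旦", "表示", "祝贺"]
features = [
    ContextFeature(left="考入", right="", error_rate=0.05),  # single-feature role
    ContextFeature(left="", right="表示", error_rate=0.40),  # intersecting-feature role
]
print(recognize(tokens, features, noise_words=set()))                 # [2]
print(recognize(tokens, features, noise_words=set(), max_error=0.5))  # [2, 5]
```

In the example, 考入 ("was admitted to") plays the role of a low-error single feature, while 表示 ("said") plays the role of an intersecting feature, since person names and other noise words also precede it. Relaxing `max_error` to 0.5 lets the riskier feature through and adds 复旦 at index 5. The tokens and error rates are invented for illustration.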
Authors: 郝娟 (Hao Juan), 杨静 (Yang Jing)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2015, no. 7, pp. 1432-1437 (6 pages)
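Funding: Major Project of the Science and Technology Commission of Shanghai Municipality (12dz1500205); Shanghai International Cooperation Project (13430710100)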
Keywords: organization abbreviations; context features; intersecting features; single features; feature matching algorithm; noise words