
Term Hierarchical Relation Extraction Method Based on Morphology Rule Templates (cited by: 11)
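Abstract  Term hierarchical relation extraction is an important foundation for building a domain concept relation system. To address the reliance on manual work in current term relation extraction, this paper proposes a term hierarchical relation extraction method based on morphology rule template matching, which extracts class-membership (IS-A) and whole-part (PART-OF) relations from the text of scientific papers. Using the head and modifier features of compound terms, the method compares the edge words shared by two terms and constructs templates to determine the IS-A and PART-OF relations between them. A generalization degree index is proposed to measure the relative position of two terms in the concept hierarchy tree, and an association degree measure is proposed to measure the semantic relatedness of two terms. For term pairs that share no words or match no template, the difference in generalization degree and the association degree are used to judge whether a hierarchical relation exists. In experiments, 1306 hierarchical-relation term pairs were extracted from paper texts in the information resource management domain with a precision of 92.5%, showing that the proposed method is effective.

A term relation extraction method integrating morphology rules and statistical analysis is put forward to extract two types of hierarchical relations among term pairs: IS-A and PART-OF. In the morphology rule analysis, five types of templates are designed to judge hierarchical relations among terms with a common left or right substring, using the head or modifier feature of multi-word terms. A generalization index is put forward to measure the generalization degree of a term and judge the hierarchical position of two terms; an association index is put forward to measure the association degree of a term pair and judge the similarity relation of two terms in the concept tree. The method comprises the following steps: term generalization measure computation, term pair association degree computation, morphology rule analysis and template matching, and relation judgment for non-matching term pairs. In experiments, 1306 hierarchical-relation term pairs were extracted from a corpus of information resource management papers, with a precision of 92.5%.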
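The abstract does not enumerate the five templates, so the following is only a minimal sketch of the general head-modifier idea it describes: when one term forms the entire right edge (the head) of a longer term, the longer term is treated as a kind of the shorter one. The function names, the whitespace tokenization, and the single rule shown are illustrative assumptions, not the paper's actual templates.

```python
from typing import List, Optional, Tuple


def shared_suffix_len(a: List[str], b: List[str]) -> int:
    """Count how many trailing words the two tokenized terms have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def head_is_a(term_a: str, term_b: str) -> Optional[Tuple[str, str]]:
    """Return (hyponym, hypernym) if one term is the head of the other.

    Hypothetical rule: if the shorter term is exactly the right-edge
    substring of the longer term, the longer term IS-A the shorter one.
    Returns None when the rule does not apply.
    """
    a, b = term_a.split(), term_b.split()
    k = shared_suffix_len(a, b)
    if k == 0 or len(a) == len(b):
        return None  # no shared head, or same length (identical or siblings)
    if k == len(a):
        return (term_b, term_a)  # term_a is the head of term_b
    if k == len(b):
        return (term_a, term_b)  # term_b is the head of term_a
    return None


if __name__ == "__main__":
    print(head_is_a("information retrieval system", "retrieval system"))
    print(head_is_a("digital library", "library catalog"))
```

Pairs that this kind of rule leaves undecided would, per the abstract, fall through to the generalization degree difference and association degree checks, whose formulas are not given here.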
Source  Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, PKU Core, 2013, No. 7, pp. 708-715 (8 pages)
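Funding  Supported by the Institute of Scientific and Technical Information of China pre-research project "Academic Social Network Analysis Based on Content and Links" (YY201221); the National Key Technology R&D Program of the 12th Five-Year Plan projects "Research on Key Technologies of Large-Scale Semantic Computing for Foreign-Language Scientific and Technical Knowledge Organization Systems" (2011BAH10804) and "Application Demonstration of Knowledge Services Based on STKOS" (2011BAH10B06); and the Renmin University of China Mingde Scholar Research Fund (Fundamental Research Funds for the Central Universities) project "Research on Construction Methods for Information Resource Management Terminology in the Context of Knowledge Engineering" (10XNJ052).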
Keywords  term relation extraction; hierarchical relation; morphology rule; text mining

