
Terminology Extraction Research Based on Integrated Strategy Method (基于多策略的领域本体术语抽取研究)

Cited by: 1
Abstract: Term extraction is foundational work for domain ontology construction and determines the quality of the resulting ontology. Extracted terms must not only be recognized accurately as phrases but also have high termhood in the domain. This paper investigates a termhood filtering method that does not rely on a background corpus. The work focuses on two aspects. First, candidate terms (phrases) are extracted from a domain corpus by combining statistical methods with Chinese grammar rules. Second, a multi-strategy term extraction method is proposed that computes termhood from the distribution degree, activity degree, and subject degree of each candidate term; the method is verified and analyzed experimentally. Validation experiments on a small-scale aerospace corpus show that the method effectively improves the quality of domain term extraction without substantially increasing computational time complexity, and achieves satisfactory results in terms of precision and recall.
Author: He Lin (何琳)
Source: Journal of the China Society of Indexers (《中国索引》), 2013, No. 1, pp. 45-52 (8 pages)
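Funding: One of the research outcomes of the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Automatic Construction of Chinese Ontologies Based on a Knowledge Organization Resource Repository" (No. 09YJC870015) and of the Fundamental Research Funds for the Central Universities project "Research on Reference Gene Mining Techniques for qRT-PCR Experiments" (KYZ201159)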
Keywords: terminology extraction; integrated strategy; term distribution degree; term activity degree; term subject degree