Abstract
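Term extraction is foundational work in building a domain ontology and determines the quality of the resulting ontology. Extracted terms must be recognized as phrases with high accuracy, and they must also have high domain termhood. This paper investigates a method for filtering terms by termhood that does not depend on a background corpus. The work has two main parts. First, candidate terms (phrases) are extracted from a domain corpus with a method that combines statistics and rules. Second, a multi-strategy term extraction method is proposed that scores each candidate term by its distribution degree, activity degree, and subject degree; the method is then validated and analyzed experimentally. Validation experiments on a small aerospace-domain corpus show that the method effectively improves the quality of domain term extraction without substantially increasing computational time complexity, and the results are fairly satisfactory.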
Terminology extraction is a fundamental preparatory step in ontology construction and largely determines the quality of the ontology being built. Accepted terms should not only be recognized with high precision but should also have high termhood in the domain. This paper seeks a terminology extraction method that does not rely on a background corpus. Our work focuses on two aspects: first, a phrase recognition approach based on statistics and Chinese grammar rules; second, an approach to calculating the termhood of candidate terms that combines three factors, namely distribution degree, activity degree, and subject degree. Experiments on a test corpus show that the method achieves good precision and recall.
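The abstract names the three termhood factors but does not define them or say how they are combined. As a minimal sketch, assuming each candidate term already has a raw score for each factor and that the factors are merged by a weighted sum of min-max normalized scores (an illustrative assumption, not necessarily the authors' formula), the combination step could look like this. The function names `min_max_normalize` and `combine_termhood`, the equal weights, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical factor order: (distribution degree, activity degree, subject degree).
Factors = Tuple[float, float, float]


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_termhood(
    scores: Dict[str, Factors],
    weights: Factors = (1 / 3, 1 / 3, 1 / 3),  # assumed equal weights
) -> List[Tuple[str, float]]:
    """Rank candidate terms by a weighted sum of normalized factor scores."""
    terms = list(scores)
    # Normalize each factor independently so no factor dominates by scale alone.
    columns = [min_max_normalize([scores[t][i] for t in terms]) for i in range(3)]
    combined = {
        t: sum(weights[i] * columns[i][j] for i in range(3))
        for j, t in enumerate(terms)
    }
    # Highest combined termhood first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage with made-up raw factor scores (hypothetical data).
candidates: Dict[str, Factors] = {
    "flight vehicle": (0.8, 0.6, 0.9),
    "engine thrust": (0.5, 0.7, 0.6),
    "research method": (0.9, 0.2, 0.1),
}
for term, score in combine_termhood(candidates):
    print(f"{term}: {score:.3f}")
```

Normalizing each factor before weighting keeps a factor with a wide numeric range from dominating the total. In the toy data, the generic phrase "research method" is widely distributed but scores low on the other two factors, so it ranks last; this is the kind of filtering a multi-factor score is meant to provide over a single frequency-style measure.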
Source
Journal of the China Society of Indexers (《中国索引》)
2013, No. 1, pp. 45-52 (8 pages)
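Funding
Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Automatic Construction of Chinese Ontologies Based on a Knowledge Organization Resource Repository" (Project No. 09YJC870015)
Fundamental Research Funds for the Central Universities (KYZ201159), project "Research on Reference Gene Mining Techniques for qRT-PCR Experiments"; this paper is one of the project's research outputs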
Keywords
Terminology extraction
Multi-strategy
Term distribution degree
Term activity degree
Term subject degree
Terminology Extraction, Integrated Strategy, Distribution Degree, Activity Degree, Subject Degree