Research on Domain Ontology Term Extraction (领域本体术语抽取研究) Cited by: 10
Abstract [Objective] To extract as many multi-word ontology terms as possible, so as to ensure the quality of ontology construction. [Methods] This paper proposes an ontology term extraction method based on term component extension. Using the domain aggregation property and part-of-speech (POS) features of term components, it extracts components by comparing word frequencies across domains. Taking into account term length, the POS composition of terms, and the internal association strength of terms, it designs extension rules that expand components into candidate terms. Ontology terms are then filtered from the candidate set using contextual association information and context information. [Results] Tested on an experimental dataset from the IT domain, the method achieves a precision of 83.5% and a recall of 87%; its precision is 2.5 percentage points higher than that of the baseline method. [Limitations] Component extraction relies on a balanced corpus, and the quality of the components directly affects term extraction performance. [Conclusions] The experimental results show that the method is effective and is of positive significance for ontology learning and ontology construction.
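Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, PKU Core Journal, 2014, No. 1, pp. 43-50 (8 pages)
Funding: One of the research outcomes of the National Natural Science Foundation of China project "Ontology-Based Automatic Patent Indexing" (No. 61271304) and of the Beijing Municipal Education Commission Science and Technology Development Plan Key Project and Beijing Natural Science Foundation Class B Key Project "Domain-Oriented Precise Search Methods for Multimodal Internet Information" (No. KZ201311232037)
Keywords: Ontology term; Term extraction; Term component; Component extension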
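
The abstract states only that components are extracted by comparing word frequencies across domains, using POS features and a balanced corpus. The following is a minimal sketch of that general idea, not the authors' algorithm. The function name `extract_components`, the POS tags `"n"` and `"v"`, the add-one smoothing, and the thresholds `min_ratio` and `min_count` are all assumptions made for illustration, as is the toy data.

```python
from collections import Counter


def extract_components(domain_tokens, balanced_tokens,
                       allowed_pos=("n", "v"), min_ratio=2.0, min_count=2):
    """Rank words that are relatively more frequent in the domain corpus.

    Both inputs are lists of (word, pos) pairs. All parameters are
    hypothetical; the source does not give the actual rules or thresholds.
    """
    domain_counts = Counter(word for word, _ in domain_tokens)
    balanced_counts = Counter(word for word, _ in balanced_tokens)
    domain_total = len(domain_tokens)
    balanced_total = len(balanced_tokens)

    # Remember the first POS tag seen for each domain word.
    pos_of = {}
    for word, pos in domain_tokens:
        pos_of.setdefault(word, pos)

    # Vocabulary size used for add-one smoothing (an assumption).
    vocab_size = len(set(domain_counts) | set(balanced_counts))

    scores = {}
    for word, count in domain_counts.items():
        if count < min_count or pos_of[word] not in allowed_pos:
            continue
        domain_rel = (count + 1) / (domain_total + vocab_size)
        balanced_rel = (balanced_counts[word] + 1) / (balanced_total + vocab_size)
        ratio = domain_rel / balanced_rel
        if ratio >= min_ratio:
            scores[word] = ratio

    # Highest domain-to-balanced frequency ratio first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpora (fabricated for illustration only, not the paper's data).
domain = ([("database", "n")] * 5 + [("the", "u")] * 5
          + [("we", "r")] * 2 + [("server", "n")] * 3)
balanced = ([("the", "u")] * 20 + [("we", "r")] * 10
            + [("database", "n")] * 1 + [("weather", "n")] * 9)

components = extract_components(domain, balanced)
print(components)
```

Function words such as "the" are dropped by the POS filter, while words that are common in the domain corpus but rare in the balanced corpus rise to the top. The extension and filtering stages described in the abstract are not covered by this sketch.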