
A Method of Compound Concept Extraction Based on Hybrid Judgment Model (cited by: 3)
Abstract: Existing methods for extracting domain concepts from large-scale domain corpora cannot effectively identify compound concepts. This paper proposes a compound concept extraction method based on a hybrid judgment model. The text is first segmented into words and an entry label is attached to each term; noise words are then removed from the term set and synonyms are merged. Next, the weighted term frequency of each term is counted, and the location affinity degree and location matching degree are computed from the entry label values in order to judge and filter the atomic terms that can be combined into compound concepts. Finally, setting different compound depth values enables the extraction of multiply compounded concepts. Extraction experiments on corpora of different sizes show that the method achieves higher recall and precision.
Source: Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2013, No. 3, pp. 488-495 (8 pages)
Keywords: corpus; domain concept; compound concept; weighted term frequency; entry label; location affinity; compound depth
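
The abstract names the stages of the method but gives no formulas, so the following is only a rough sketch of the general idea of depth-limited compounding of adjacent terms. It is not the paper's hybrid judgment model. It assumes the text has already been segmented, de-noised, and synonym-merged into a token list. The function names, the thresholds `min_pair_count` and `min_affinity`, and the affinity score itself (the share of the rarer term's occurrences that fall inside the pair) are all hypothetical stand-ins for the paper's weighted term frequency, location affinity degree, and location matching degree.

```python
from collections import Counter


def merge_adjacent(tokens, min_pair_count=2, min_affinity=0.5):
    """One compounding pass: join adjacent terms that co-occur often enough.

    Both thresholds and the affinity score are illustrative assumptions,
    not the definitions used in the paper.
    """
    term_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))

    def affinity(pair):
        # Hypothetical score: fraction of the rarer term's occurrences
        # that appear inside this adjacent pair.
        left, right = pair
        return pair_freq[pair] / min(term_freq[left], term_freq[right])

    accepted = {
        pair
        for pair, count in pair_freq.items()
        if count >= min_pair_count and affinity(pair) >= min_affinity
    }

    # Greedy left-to-right merge of accepted pairs.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in accepted:
            merged.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def extract_compounds(tokens, depth=2, **thresholds):
    """Repeat the merge pass up to `depth` times and collect every compound."""
    compounds = set()
    current = list(tokens)
    for _ in range(depth):
        merged = merge_adjacent(current, **thresholds)
        compounds.update(term for term in merged if " " in term)
        if merged == current:  # nothing left to combine
            break
        current = merged
    return compounds


if __name__ == "__main__":
    text = ("domain ontology learning uses domain ontology learning "
            "for domain ontology learning")
    print(extract_compounds(text.split(), depth=1))
    print(extract_compounds(text.split(), depth=2))
```

In this toy setup, a depth of 1 yields only two-term compounds, while a depth of 2 lets an already merged compound combine with a neighbouring term, which mirrors the role the abstract assigns to the compound depth value.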
