Automatic Extraction of Domain Terms Using the Continuous Bag-of-Words (CBOW) Model (cited by: 20)
Abstract [Objective] To extract domain terms automatically with greater accuracy and convenience. [Methods] The CBOW model is used to compute a vector representation for each word component of a term. Cosine similarity between these word vectors measures how strongly the components within a term are associated. The PageRank algorithm then scores and ranks the candidate terms by domain representativeness, and a threshold selects the most representative ones. [Results] In experiments that used paper abstracts from the field of natural language processing as the dataset, the method achieved relatively high precision and recall. [Limitations] The training set was relatively small, and the quality of training directly affects the experimental results. [Conclusions] The results indicate that extracting terms with the CBOW model is a reasonable and feasible approach.
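Authors Jiang Lin (姜霖), Wang Dongbo (王东波)
Source New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, 2016, No. 2, pp. 9-15 (7 pages)
Funding A research output of the Nanjing Agricultural University Humanities and Social Sciences Research Fund project "Construction of a Chunk-Level Chinese-English Parallel Corpus for the Humanities and Social Sciences and Knowledge Mining" (No. SK2013023) and the National Natural Science Foundation of China project "Construction of a CSSCI-Based Syntax-Level Chinese-English Parallel Corpus and Knowledge Mining" (No. 71303120)
Keywords Terminology extraction; Neural network; Continuous Bag-of-Words (CBOW) model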
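
The three steps named in the abstract (word vectors for term components, cosine similarity within a candidate, PageRank ranking with a threshold) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not say how the PageRank graph is built or which thresholds are used, so the centroid-similarity graph, the `min_cohesion` and `min_score` parameters, and every function name here are hypothetical. The tiny hand-written vectors stand in for embeddings that a trained CBOW model would supply.

```python
import math
from itertools import combinations

# Hypothetical toy vectors; a real run would use embeddings from a trained CBOW model.
VECTORS = {
    "natural": [0.9, 0.1, 0.0], "language": [0.8, 0.2, 0.1],
    "processing": [0.7, 0.3, 0.1], "word": [0.8, 0.1, 0.2],
    "vector": [0.6, 0.4, 0.1], "green": [0.0, 0.1, 0.9], "tea": [0.1, 0.0, 0.8],
}
CANDIDATES = [("natural", "language", "processing"), ("word", "vector"), ("green", "tea")]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cohesion(components, vectors):
    """Mean pairwise cosine similarity among a candidate's word components."""
    pairs = list(combinations(components, 2))
    if not pairs:
        return 1.0  # assumption: a one-word candidate counts as fully cohesive
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def centroid(components, vectors):
    """Average of the component vectors, used here to represent the whole candidate."""
    rows = [vectors[c] for c in components]
    return [sum(col) / len(rows) for col in zip(*rows)]


def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank over a non-negative weight matrix."""
    n = len(weights)
    rank = [1.0 / n] * n
    out = [sum(row) for row in weights]
    for _ in range(iterations):
        # Nodes without outgoing weight spread their rank uniformly.
        dangling = sum(rank[i] for i in range(n) if out[i] == 0)
        rank = [
            (1 - damping) / n
            + damping * (dangling / n + sum(
                rank[i] * weights[i][j] / out[i] for i in range(n) if out[i] > 0))
            for j in range(n)
        ]
    return rank


def extract_terms(candidates, vectors, min_cohesion=0.5, min_score=0.25):
    """Filter by internal cohesion, rank with PageRank, keep scores above a threshold."""
    kept = [c for c in candidates if cohesion(c, vectors) >= min_cohesion]
    if not kept:
        return []
    cents = [centroid(c, vectors) for c in kept]
    n = len(kept)
    # Assumption: edge weight is the (clipped) cosine similarity between candidate centroids.
    weights = [[max(0.0, cosine(cents[i], cents[j])) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    ranked = sorted(zip(kept, pagerank(weights)), key=lambda p: p[1], reverse=True)
    return [(term, score) for term, score in ranked if score >= min_score]


if __name__ == "__main__":
    for term, score in extract_terms(CANDIDATES, VECTORS):
        print(" ".join(term), round(score, 3))
```

In this toy data the candidate "green tea" is internally cohesive but only weakly connected to the other candidates, so the cohesion check alone would keep it while the ranking step drops it. That is the intuition behind combining the two measures; the actual graph construction and threshold values in the paper may differ.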