Newly-emerging Domain Word Detection Method Based on Syntactic Analysis and Term Vector

Cited by: 14
Abstract: Many existing words and phrases may come to be used in domain texts where they have never appeared before. Such words and phrases are called newly-emerging domain words. Detecting them gives researchers insight into the latest developments in a domain and helps them analyze current public opinion there, which makes the task important. To address this problem, this paper proposes a method for detecting newly-emerging domain words based on dependency syntactic analysis and term vectors.

First, the paper introduces the concept of a syntactic dictionary. It then gives a method for constructing a domain syntactic dictionary from the dependency syntax of sentences, combined with TF-IDF values computed over the training corpus. Next, the domain syntactic dictionary is combined with term vectors to detect newly-emerging domain words. Finally, the method is validated on a real-world dataset of comment text from a skin-care products forum.

The experimental results show that the constructed syntactic dictionary is of high quality and that the proposed method performs well at detecting newly-emerging domain words.
Authors: ZHAO Zhi-bin (赵志滨), SHI Yu-xin (石玉鑫), LI Bin-yang (李斌阳) (School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; School of Information Science and Technology, University of International Relations, Beijing 100091, China)
Source: Computer Science (《计算机科学》), 2019, No. 6, pp. 29-34 (6 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: National Key R&D Program of China (2018YFB1004700); National Natural Science Foundation of China (61472070); Aerospace Professional Department University Cooperation Project on New Technology Research (SKX182010023)
Keywords: Syntactic analysis; Term vector; Newly-emerging domain words; Syntactic dictionary
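The abstract names two building blocks without giving formulas: a TF-IDF-weighted syntactic dictionary and term-vector similarity. The Python sketch below is one hypothetical way to combine them. It is not the authors' algorithm.

The following assumptions are ours, not the paper's:
- Sentences arrive already dependency-parsed as (head, relation, dependent) triples.
- A dictionary entry is a (head, relation) context.
- Each corpus counts as one document when computing IDF.
- An unseen word is flagged when it fills a dictionary context and its vector's cosine similarity to the centroid of known domain words reaches a threshold.

The function names, the toy skin-care data and vectors, and the 0.8 threshold are invented for illustration.

```python
import math
from collections import Counter

# Assumption (hypothetical): sentences are already dependency-parsed into
# (head, relation, dependent) triples by some external parser.

def pattern_counts(sentences):
    """Count (head, relation) contexts: the syntactic slot a dependent word fills."""
    return Counter((head, rel) for sent in sentences for (head, rel, _dep) in sent)

def build_syntactic_dictionary(domain_sents, background_corpora, top_k=3):
    """Keep the top_k contexts by TF-IDF, treating each corpus as one document.

    A context that also occurs in every background corpus gets IDF 0 and is dropped.
    """
    counts = [pattern_counts(domain_sents)] + [pattern_counts(c) for c in background_corpora]
    n_docs = len(counts)
    domain = counts[0]
    total = sum(domain.values())
    scores = {}
    for pat, freq in domain.items():
        df = sum(1 for c in counts if pat in c)  # corpora containing this context
        scores[pat] = (freq / total) * math.log(n_docs / df)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pat for pat, score in ranked[:top_k] if score > 0]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def detect_new_words(new_sents, dictionary, known_words, vectors, threshold=0.8):
    """Flag unseen words that fill dictionary contexts and lie near the domain centroid."""
    slots = set(dictionary)
    candidates = {dep for sent in new_sents for (head, rel, dep) in sent
                  if (head, rel) in slots and dep not in known_words and dep in vectors}
    known_vecs = [vectors[w] for w in sorted(known_words) if w in vectors]
    if not known_vecs:
        return []
    dim = len(known_vecs[0])
    centroid = [sum(v[i] for v in known_vecs) / len(known_vecs) for i in range(dim)]
    scored = {w: cosine(vectors[w], centroid) for w in candidates}
    # Highest similarity first; ties broken alphabetically for determinism.
    return sorted((w for w, s in scored.items() if s >= threshold),
                  key=lambda w: (-scored[w], w))

# Hypothetical toy data: one domain corpus, two background corpora, new text.
DOMAIN = [
    [("apply", "dobj", "cream"), ("moisturizes", "nsubj", "cream")],
    [("apply", "dobj", "serum"), ("buy", "dobj", "serum")],
]
BACKGROUND = [
    [[("buy", "dobj", "phone"), ("charges", "nsubj", "phone")]],
    [[("buy", "dobj", "ticket")]],
]
NEW = [
    [("apply", "dobj", "toner"), ("buy", "dobj", "laptop")],
    [("moisturizes", "nsubj", "toner"), ("apply", "dobj", "glue")],
]
VECTORS = {  # invented 3-dimensional term vectors
    "cream": [0.9, 0.1, 0.0],
    "serum": [0.8, 0.2, 0.0],
    "toner": [0.85, 0.15, 0.05],
    "glue": [0.0, 0.2, 0.9],
    "laptop": [0.1, 0.9, 0.1],
}

if __name__ == "__main__":
    dictionary = build_syntactic_dictionary(DOMAIN, BACKGROUND)
    print(dictionary)
    print(detect_new_words(NEW, dictionary, {"cream", "serum"}, VECTORS))
```

With the toy data, the context (buy, dobj) is dropped because it also appears in every background corpus, which gives it an IDF of 0. Only "toner" is flagged:
- "glue" fills a dictionary context, but its vector points away from the domain centroid.
- "laptop" fills no dictionary context.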
References: 4
Secondary references: 32

Documents sharing references: 46
Co-cited documents: 111
Citing documents: 14
Secondary citing documents: 36