
Recognition of Lexical Functions in Academic Texts: Application in Automatic Keyword Extraction

Cited by: 9
Abstract: Traditional automatic keyword extraction typically builds features from non-semantic information, such as the frequency and position of candidate words, without considering the specific semantic role that a keyword plays in an academic text, that is, its lexical function. A statistical analysis of existing data shows that about 67.99% of author-assigned keywords denote research questions or research methods. Accordingly, we divide the lexical functions of keywords into three categories: Research Questions, Research Methods, and Others. On top of traditional word frequency and position features, we incorporate lexical function features and run keyword extraction experiments on computer science papers under two formulations, classification and ranking. The results show that incorporating lexical functions clearly improves extraction over the baseline that uses only the base features. For the binary classification model, accuracy (Acc) and F value rise to 0.840 and 0.666, relative improvements of 24.63% and 25.19%, respectively. For the ranking model, MAP, NDCG@5, and P@5 rise to 0.813, 0.828, and 0.447, relative improvements of 168.32%, 189.50%, and 148.30%, respectively. These gains indicate that lexical function features play an important role in automatic keyword extraction.
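The ranking metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary relevance (a candidate is either an author-assigned keyword or not) and the common textbook definitions of P@k, average precision, and NDCG@k; the record does not state the exact formulas the authors used, and the function names and the toy keyword list below are hypothetical.

```python
import math

def precision_at_k(ranked, gold, k=5):
    """Fraction of the top-k ranked candidates that are gold keywords."""
    return sum(1 for w in ranked[:k] if w in gold) / k

def average_precision(ranked, gold):
    """Average of precision values at each rank where a gold keyword appears.
    MAP is the mean of this value over all documents."""
    hits, total = 0, 0.0
    for rank, w in enumerate(ranked, start=1):
        if w in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0

def ndcg_at_k(ranked, gold, k=5):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1 / math.log2(rank + 1)
              for rank, w in enumerate(ranked[:k], start=1) if w in gold)
    ideal = sum(1 / math.log2(rank + 1)
                for rank in range(1, min(len(gold), k) + 1))
    return dcg / ideal if ideal else 0.0

def relative_improvement(baseline, improved):
    """Relative gain over a baseline score, in percent."""
    return (improved - baseline) / baseline * 100

# Hypothetical toy data: one document's ranked candidates and its gold keywords.
ranked = ["keyword extraction", "lexical function", "experiment", "SVM", "result"]
gold = {"keyword extraction", "lexical function", "SVM"}

p5 = precision_at_k(ranked, gold)
ap = average_precision(ranked, gold)
ndcg5 = ndcg_at_k(ranked, gold)
```

Under these definitions, a "relative improvement" is the gain divided by the baseline score, so a large relative figure can correspond to a modest baseline rather than a large absolute jump.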
Authors: Jiang Yi; Huang Yong; Xia Yikun; Li Pengcheng; Lu Wei (School of Information Management, Wuhan University, Wuhan 430072; Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072; Center for Studies of Information Resources, Wuhan University, Wuhan 430072)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2021, No. 2, pp. 152-162 (11 pages)
Funding: Major Program of the National Social Science Fund of China, "Research on the Theory and Methods of Academic Paper Evaluation Based on Cognitive Computing" (17ZDA292).
Keywords: lexical function; keyword extraction; SVM (support vector machine); learning to rank; academic text
