Identifying Feature Words Based on Abstracts and Citation Text Corpus of Breakthrough Research (cited by: 5)
Abstract [Purpose/significance] Authors describe and evaluate their own research in abstracts, and later researchers comment on it in citations. From these two perspectives, this study extracts feature words of breakthrough research from abstract and citation corpora, in order to characterize both corpora and support the identification of breakthrough research. [Method/process] Key papers behind the Science "Breakthrough of the Year" selections and the "key publications" of Nobel Prize winners were taken as the breakthrough research corpus, and the abstracts and citation texts of these papers were combined for feature word extraction. First, the Stanford CoreNLP tool was used to tokenize the corpus and count word frequencies, and feature tokens were selected in combination with expert opinion. Next, the feature words served as seed words and were semantically expanded using semantic relationships in medical texts. Finally, recall and precision were used to compare the retrieval and identification performance of the abstract and citation feature words before and after expansion. [Result/conclusion] Eight feature tokens were selected from the abstract corpus and eight from the citation corpus. In retrieval, the expanded feature words of abstracts and citations achieved the highest recall, the citation feature words achieved the highest precision, and the expanded citation feature words gave a relatively good overall balance of recall and precision.
Authors Yang Xuemei (杨雪梅); Wang Xue (王雪); Du Jian (杜建); Tang Xiaoli (唐小利) (Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100005; National Institute of Health Data Science, Peking University, Beijing 100191)
Source Library and Information Service (《图书情报工作》), CSSCI, Peking University Core Journal, 2020, No. 11, pp. 125-132 (8 pages)
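Funding One of the research outputs of the National Social Science Fund of China project "Research on Methods and Applications for Identifying Innovation Frontiers Based on a Science-Technology Intersection Model" (No. 18BTQ064) and the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences project "Research on Medical Science and Technology Innovation Evaluation and Health Service System Construction" (No. 2016-I2M-3-018).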
Keywords breakthrough research; feature words; abstract text; citing sentence
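
The abstract's evaluation step, comparing feature-word retrieval before and after expansion by recall and precision, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy documents, the relevance labels, the seed and expanded word lists, and the helper names `retrieve` and `precision_recall` are hypothetical and are not taken from the paper.

```python
import re

def retrieve(docs, feature_words):
    """Return the indices of documents that contain at least one feature word."""
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in feature_words
    ]
    return {
        i for i, text in enumerate(docs)
        if any(p.search(text) for p in patterns)
    }

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy corpus; indices 0, 1 and 3 are labeled as breakthrough research.
docs = [
    "This landmark study provides the first evidence of a new pathway.",
    "We report a novel mechanism of gene regulation.",
    "A routine survey of sample characteristics.",
    "The seminal work opened a new field.",
]
relevant = {0, 1, 3}

# Hypothetical seed words and a hypothetical expansion of them.
seed_words = ["landmark", "seminal"]
expanded_words = seed_words + ["novel", "routine"]

for name, words in [("seed", seed_words), ("expanded", expanded_words)]:
    p, r = precision_recall(retrieve(docs, words), relevant)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy setup the expanded list retrieves more of the labeled documents but also matches an unlabeled one, which is the kind of recall/precision trade-off the abstract compares across abstract and citation feature words.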