
Review of Automatic Citation Classification Based on Machine Learning

Cited by: 2
Abstract [Objective] This paper reviews and summarizes the current application of natural language processing and machine learning techniques to automatic citation classification. [Coverage] A search strategy was built in the Scopus database from keywords such as "citation classification", "citation polarity", "citation function", and "feature selection", and a total of 46 representative papers were selected. [Methods] The selected studies were analyzed and reviewed from the perspectives of the citation classification process, citation classification tasks, and technical methods, followed by a discussion of research trends and challenges. [Results] Research on citation function classification is shifting from multi-class to binary classification. Deep learning models can perform citation sentiment classification and citation function classification simultaneously. Automatic citation classification faces problems such as corpora limited to a single discipline, disputed definitions of citation context, and imbalanced classification data. [Limitations] This review is based mainly on the published literature and does not sufficiently cover classification systems and platforms used in industry. [Conclusions] The author recommends developing and improving evaluation methods for the reuse of research data such as code, data, and corpora, in order to encourage open sharing; combining citation classification with citation counts to build a multi-dimensional evaluation model; and, based on a user's search results, intelligently recommending literature that supports the research in question or that presents conflicting views, for further reading.
Author: Zhou Zhichao (周志超), Health Science Library, Peking University, Beijing 100191, China
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2021, No. 12, pp. 14-24 (11 pages)
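Funding: This work is one of the research outputs of a project of the CALIS National Medical Literature Information Center (project No. CALIS-2020-01-003).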
Keywords: Automatic Citation Classification; Natural Language Processing; Citation Content Analysis; Text Classification; Machine Learning