Abstract
[Objective] This paper reviews the current application of natural language processing and machine learning techniques to automatic citation classification. [Coverage] We built a search strategy in the Scopus database from keywords including "citation classification", "citation polarity", "citation function", and "feature selection", and selected 46 representative papers. [Methods] We analyzed and reviewed this research from the perspectives of the citation classification workflow, classification tasks, and technical methods, and then discussed research trends and challenges. [Results] Research on citation function classification is shifting from multi-class to binary classification. Deep learning models can perform citation sentiment and function classification simultaneously. Automatic citation classification faces challenges including single-discipline corpora, disputed definitions of citation context, and imbalanced classification data. [Limitations] This review is based mainly on the published literature and does not adequately cover classification systems and platforms in industry. [Conclusions] We recommend developing and refining methods for evaluating the reuse of research data such as code, data, and corpora, in order to encourage open sharing. Citation classification can be combined with citation counts to build multi-dimensional evaluation models. Based on a user's search results, systems could intelligently recommend papers that support the research in question, or papers that present conflicting views, for further reading.
Author
周志超
Zhou Zhichao (Health Science Library, Peking University, Beijing 100191, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
PKU Core Journals (北大核心)
2021, No. 12, pp. 14-24 (11 pages)
Data Analysis and Knowledge Discovery
Funding
One of the research outcomes of the CALIS National Medical Literature Information Center project (Project No. CALIS-2020-01-003).
Keywords
Automatic Citation Classification
Natural Language Processing
Citation Content Analysis
Text Classification
Machine Learning