Abstract
Highly cited papers carry considerable academic influence and reference value. Identifying the core factors behind them matters for helping academic papers keep attracting citations and for building and strengthening a high-citation competitive advantage. This paper combines literature extraction (objective) with a questionnaire survey (subjective) to extract, screen, and assemble a set of internal and external influencing factors of academic papers. It then uses logistic regression to examine the linear and nonlinear effects of these factors on whether a paper becomes highly cited, and finally applies several classical machine learning classification algorithms to test the robustness of the results. The results show that reference quality and reference age both have a significant positive linear effect on the formation of highly cited papers, and that as the variable values increase, the quadratic term strongly reinforces the linear effect. Journal quality has an approximately linear effect. Author reputation, usage count, and initial citation count also have significant positive linear effects, but as the variable values increase, the quadratic term gradually weakens the linear effect, producing a semi-"inverted U" trend that first rises and then levels off. Classical classifiers such as decision tree, naive Bayes, and random forest all predict highly cited papers well, which indicates that the findings are robust.
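The abstract does not state the model specification. As a reading aid only, one common way to write a logistic regression with a linear and a quadratic term is sketched below; the symbols $y_i$ (1 if paper $i$ is highly cited), $x_i$ (a single non-negative factor such as reference age or initial citation count), and $\beta_0, \beta_1, \beta_2$ are assumptions introduced here, not notation taken from the paper.

```latex
\[
\operatorname{logit} P(y_i = 1 \mid x_i)
  = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2},
\qquad
\frac{\partial}{\partial x_i}\operatorname{logit} P(y_i = 1 \mid x_i)
  = \beta_1 + 2\beta_2 x_i .
\]
```

Under this assumed form, $\beta_1 > 0$ with $\beta_2 > 0$ means the marginal effect grows with $x_i$, which matches the reinforcing pattern reported for reference quality and reference age. With $\beta_1 > 0$ and $\beta_2 < 0$, the marginal effect shrinks and reaches zero at $x^{*} = -\beta_1 / (2\beta_2)$; if the observed values lie mostly below $x^{*}$, the fitted curve rises and then flattens, which is the semi-"inverted U" shape described for author reputation, usage count, and initial citation count. A value of $\beta_2$ close to zero corresponds to the approximately linear effect reported for journal quality.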
Author
许林玉
Xu Linyu (School of Management, Xuzhou Medical University, Xuzhou, 221004)
Source
《信息资源管理学报》
CSSCI
2023, No. 5, pp. 137-148 (12 pages)
Journal of Information Resources Management
Keywords
Highly cited papers
Core factors
Preferential attachment
Logistic regression
Machine learning classification algorithms