Abstract
Driven by their own interests, some academic journals rapidly inflate indicators such as citation counts and impact factors through heavy self-citation and by forming "mutual-citation alliances", which undermines the fairness of citation analysis. Against this background, this paper first uses the CART classification algorithm from data mining to build a model for identifying journal citation manipulation, and designs four evaluation indicators for recognizing such behavior: self-citation rate, cited era distribution, cited density ratio, and citation density ratio. An experiment was then carried out on data from a Chinese citation database, using 50 comprehensive social science journals as the sample: the citation data of these journals for 2009 served as the training set and the citation data for 2008 as the validation set. Finally, the citation data for 2010 were used to test the validity of the recognition model. The experimental results show that the classification model can effectively identify journal citation manipulation.
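The core step of a CART classifier, choosing the threshold on one indicator that minimizes the weighted Gini impurity of the two resulting groups, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the abstract does not give the indicator formulas or the data, so the function names `gini` and `best_split`, the self-citation rates, and the manipulation labels below are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, weighted_gini); threshold is None if no split
    improves on the impurity of the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: one self-citation rate per journal and an assumed
# label (1 = flagged as manipulating citations, 0 = not flagged).
self_citation_rate = [0.05, 0.08, 0.10, 0.12, 0.30, 0.35, 0.40, 0.45]
manipulated = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(self_citation_rate, manipulated)
print(f"split at self-citation rate <= {threshold:.2f}, weighted Gini = {impurity:.2f}")
```

A full CART tree would repeat this search over all four indicators and recurse on each resulting group; with training, validation, and test data taken from different years, as in the abstract, the tree would be fitted on one year and evaluated on the others.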
Source
《情报学报》
CSSCI
PKU Core Journals (北大核心)
2013, No. 10, pp. 1058-1067 (10 pages)
Journal of the China Society for Scientific and Technical Information
Keywords
journal citation manipulation behavior
CART algorithm
self-citation rate
cited era distribution
cited density ratio
citation density ratio