Abstract
Data on 37,898 first-release papers published on Sciencepaper Online (中国科技论文在线) from 2003 to 2009 were preprocessed through attribute reduction, missing-value handling, and outlier detection. A regression tree model of download counts was then built on the 37,348 papers that remained after preprocessing. Analysis of the model shows that the factors influencing download counts are, in order of importance, a paper's publication time, its subject area, and its star rating. The typical characteristics of download counts along these three dimensions are also analyzed.
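To make the idea of ranking influencing factors with a regression tree more concrete, the following is a minimal sketch of how the root split of such a tree could be chosen: for each attribute, try every "one value versus the rest" partition and keep the one that most reduces the sum of squared errors (SSE) of the download counts. It is not the authors' implementation. The function names (`sse`, `best_split`, `rank_attributes`), the field names, and the six records are all hypothetical, and the toy data were invented so that the resulting order mirrors the one reported in the abstract; the output therefore illustrates the mechanism only and is not evidence for the finding. Treating the year as a categorical value is a further simplification.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, attribute):
    """Best binary split of `rows` on one attribute, treated as categorical.

    Tries each 'value vs. the rest' partition and returns
    (sse_reduction, value) for the partition that reduces SSE the most.
    """
    total = sse([r[target] for r in rows])
    best = (0.0, None)
    for value in sorted({r[attribute] for r in rows}):
        left = [r[target] for r in rows if r[attribute] == value]
        right = [r[target] for r in rows if r[attribute] != value]
        if not left or not right:
            continue  # a split must leave records on both sides
        gain = total - sse(left) - sse(right)
        if gain > best[0]:
            best = (gain, value)
    return best


def rank_attributes(rows, target, attributes):
    """Order attributes by the SSE reduction of their best root split."""
    scored = [(best_split(rows, target, a)[0], a) for a in attributes]
    return [a for _, a in sorted(scored, reverse=True)]


# Synthetic toy records, invented purely for illustration (assumption).
papers = [
    {"year": 2004, "subject": "physics", "stars": 1, "downloads": 900},
    {"year": 2004, "subject": "medicine", "stars": 2, "downloads": 700},
    {"year": 2006, "subject": "physics", "stars": 3, "downloads": 520},
    {"year": 2006, "subject": "medicine", "stars": 3, "downloads": 330},
    {"year": 2009, "subject": "physics", "stars": 2, "downloads": 200},
    {"year": 2009, "subject": "medicine", "stars": 1, "downloads": 10},
]

ranking = rank_attributes(papers, "downloads", ["year", "subject", "stars"])
print(ranking)
```

A full regression tree would apply the same search recursively to each resulting subset until a stopping rule is met; only the first level is shown here to keep the sketch short.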
Source
Library and Information Service (《图书情报工作》)
CSSCI
PKU Core Journal (北大核心)
2011, No. 10, pp. 83-87 (5 pages)
Funding
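One of the research outcomes of the project "Quantitative Analysis of Sciencepaper Online Literature Based on Data Mining" (project No. 20090061110084), funded under the special research program "Rapid Sharing of Scientific Papers in the Network Era" of the Center for Science and Technology Development, Ministry of Education.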