Abstract
[Purpose/significance] The rapid development of the Internet has steadily accelerated the growth of information. As an effective means of discovering useful knowledge in massive data, data mining has become a research hotspot in recent years. However, in the data mining process there is often a gap between the output values and the true values, which is known as data mining bias. [Method/process] Drawing on related studies, this paper reviews the concept, research progress, and future directions of data mining bias and, by drawing an analogy between the basic steps of data mining and those of bibliometrics, proposes the basic concept of bibliometrics bias. The main manifestations of bibliometrics bias and ways to address them are discussed from four perspectives: selection of literature sources, preprocessing of bibliographic data, selection of bibliometric methods, and interpretation of results. [Result/conclusion] The paper calls on future research in this field to pay attention to bibliometrics bias and its negative effects, and hopes that related studies can avoid such bias through scientific methods and thereby reach more accurate and reliable conclusions.
Source
《情报理论与实践》
CSSCI
Peking University Core Journals (北大核心)
2017, No. 10, pp. 41-46 (6 pages)
Information Studies: Theory & Application