Abstract
Over the past few years, we have seen the emergence of “big data”: disruptive technologies that have transformed commerce, science, and many aspects of society. Despite the tremendous enthusiasm for big data, there is no shortage of detractors. This article argues that many criticisms stem from a fundamental confusion over goals: whether the desired outcome of big data use is “better science” or “better engineering.” Critics point to the rejection of traditional data collection and analysis methods, confusion between correlation and causation, and an indifference to models with explanatory power. From the perspective of advancing social science, these are valid reservations. I contend, however, that if the end goal of big data use is to engineer computational artifacts that are more effective according to well-defined metrics, then whatever improves those metrics should be exploited without prejudice. Sound scientific reasoning, while helpful, is not necessary to improve engineering. Understanding the distinction between science and engineering resolves many of the apparent controversies surrounding big data and helps to clarify the criteria by which contributions should be assessed.
Keywords
Big Data
Computational Social Science
Machine Learning
Data Mining
Log Analysis