Abstract
UGC (user-generated content) big data contains much material that was difficult or impossible to obtain before the Internet era. It also avoids many shortcomings of data collected through traditional methods, and it is therefore being used as a new resource for social science research. In such research, the quality of UGC big data is the first issue that must be considered. This article analyzes the quality problems of UGC big data and their causes from several perspectives: the characteristics of information production, the process of information dissemination, and information detection and recognition technology. The study finds that UGC big data has flaws peculiar to itself, and that the online information space cannot perfectly map the space of social reality. When UGC big data is applied to social science research, the authenticity, naturalness, and accuracy of the data are therefore difficult to guarantee. Specific studies thus need to assess and test data quality in advance, and can respond with strategies such as combining UGC big data with traditional research methods.
Author
Chen Zheng (Hubei Open University; School of Sociology, Wuhan University)
Source
Library (《图书馆》)
CSSCI
Peking University Core Journals (北大核心)
2021, No. 3, pp. 90-98 (9 pages)
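Funding
One of the research outputs of the 2016 Major Project of the National Social Science Fund of China, "Research on the Emergence, Current Status, and Development Prospects of Computational Social Science in the Era of Big Data" (Project No. 16ZDA086).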
Keywords
Big data
User-generated content
Data quality
Detection and recognition technology