Abstract
Data collection and aggregation that involve multiple parties and multiple methods depend on data fusion. However, as collection methods shift toward remote sensing, Internet information, and questionnaire surveys, evaluating the quality of big data fusion has become more difficult. This paper builds a data fusion quality evaluation model based on log-linear and dual system estimation methods and, taking the fusion of two databases as an example, analyses in depth how to estimate the overcoverage error in data fusion. The model requires a sample survey that contains only undercoverage error, and it can be extended to the fusion of multiple databases. The model is easy to apply and can provide quality assurance for data integration and aggregation, for building big data clouds, and for thematic databases in key fields.
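As an illustration of the dual system idea named in the abstract, the sketch below applies the textbook two-list estimator, which coincides with fitting the independence log-linear model to an incomplete 2x2 table. It is not the paper's evaluation model. The function name `dual_system_estimate` and all counts are hypothetical, and the calculation assumes that the two lists are independent and that records are matched without error.

```python
def dual_system_estimate(n_both, n_only_a, n_only_b):
    """Textbook dual system (two-list) estimate of a population total.

    n_both   : records found in both list A and list B
    n_only_a : records found only in list A
    n_only_b : records found only in list B
    Assumes independent lists and error-free matching (illustrative only).
    """
    if n_both <= 0:
        raise ValueError("at least one matched record is required")
    n_a = n_both + n_only_a          # size of list A
    n_b = n_both + n_only_b          # size of list B
    # Under independence, the cell missed by both lists is estimated as
    # n_only_a * n_only_b / n_both (the log-linear independence fit).
    n_missed = n_only_a * n_only_b / n_both
    n_total = n_a * n_b / n_both     # equals n_both + n_only_a + n_only_b + n_missed
    return n_total, n_missed


# Hypothetical counts, chosen only to show the arithmetic.
n_total, n_missed = dual_system_estimate(n_both=800, n_only_a=200, n_only_b=100)
undercoverage_rate_a = 1 - (800 + 200) / n_total  # share of the total missing from list A
print(n_total, n_missed, round(undercoverage_rate_a, 4))
```

With these made-up counts, list A holds 1000 records and list B holds 900, so the estimated total is 1000 × 900 / 800 = 1125, of which 25 are estimated to be missed by both lists.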
Authors
Li Hong; Niu Chengying; Sun Qiubi; Lin Jiayan (College of Economics and Management, Fuzhou University, Fuzhou 350116, China; School of Statistics, Lanzhou University of Finance and Economics, Lanzhou 730101, China; Fujian Polytechnic of Information Technology, Fuzhou 350001, China)
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal (北大核心)
2018, Issue 21, pp. 10-14 (5 pages)
Funding
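National Statistical Science Research Key Project (2015LZ05, 2017LZ41)
Fujian Provincial Social Science Planning Project (FJ2015C206)
Fuzhou University project "The Future and Application of Statistics in the Big Data Era" (036050009473)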