Abstract
Incorporating e-commerce transaction data into the price index framework is a current focus of statistical work. Applying big data technology, this paper deploys the Nutch crawler on a distributed cluster to build a distributed web data capture system, preprocesses the captured data with the latest AP clustering algorithm, and then builds a price index model on the online data and carries out a trial calculation of the price index. The trial results show that the Nutch web crawler running on a distributed cluster completes the task of capturing online transaction data well, and that the price index computed from the online transaction data reflects the trend of market price changes well.
Authors
YANG Li-ming (阳黎明)
SU Li-yun (苏理云)
College of Science, Chongqing University of Technology, Chongqing 400054, China
Source
Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》)
CAS
2017, No. 1, pp. 152-157 (6 pages)
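Funding
Chongqing Municipal Education Commission funded project (15SKG136)
Chongqing University of Technology Graduate Innovation Fund project (YCX2015228)
Chongqing University of Technology Higher Education Teaching Reform Research project (2014ZD03)
National Statistical Science Research funded project (2014LY069)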
Keywords
e-commerce transaction data
distributed cluster
Nutch
price index