
Big Data Processing Optimization Techniques for Commodity Screening Applications

English title (as published): Massive structured data storage and query technology for commodity screening application
Abstract: With the rapid growth of e-commerce, the number of products keeps increasing. When a commodity screening application stores and processes massive product information in a distributed system, existing models require every node to work in parallel and then merge the per-node results into the final result. This process generates many invalid queries, namely queries issued to nodes that hold no relevant data. This paper proposes a distributed data storage and query technique based on key-column pre-processing. The technique statistically analyzes a table's historical queries and selects high-frequency or core fields as key columns. It then stores the data according to the key columns and the node structure of the distributed system. When a query request arrives and the query contains a key column, valid query tasks are generated for only a subset of the nodes. The results show that, without incurring extra storage overhead, the technique reduces the total number of tasks executed across the nodes and effectively improves system throughput.
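The abstract describes the scheme only at a high level. The Python sketch below illustrates the general idea under several assumptions that are not from the paper:

- The key column is the column that appears most often in historical query predicates.
- Rows are placed by hashing the key-column value (CRC32 modulo the node count).
- Queries are equality-only.

An equality query that includes the key column is routed to the single node that can hold matches. Any other query still fans out to every node. The names `choose_key_column`, `Cluster`, and `node_for` and the hash-partitioning rule are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

import zlib
from collections import Counter
from dataclasses import dataclass, field


def choose_key_column(query_log: list[set[str]]) -> str:
    """Return the column referenced most often in historical query predicates.

    Assumption: "high-frequency" is measured as a plain occurrence count.
    """
    counts = Counter(col for cols in query_log for col in cols)
    column, _ = counts.most_common(1)[0]
    return column


@dataclass
class Cluster:
    """Toy cluster that hash-partitions rows on one key column (hypothetical)."""

    key_column: str
    num_nodes: int
    nodes: list[list[dict]] = field(init=False)

    def __post_init__(self) -> None:
        self.nodes = [[] for _ in range(self.num_nodes)]

    def node_for(self, value: object) -> int:
        # CRC32 is deterministic across runs, unlike Python's salted str hash().
        return zlib.crc32(repr(value).encode("utf-8")) % self.num_nodes

    def insert(self, row: dict) -> None:
        # Each row is written to exactly one node, so no replica is created.
        self.nodes[self.node_for(row[self.key_column])].append(row)

    def query(self, predicate: dict) -> tuple[list[dict], int]:
        """Run an equality query; return (matching rows, number of node tasks)."""
        if self.key_column in predicate:
            # Key column present: only the node that can hold matches is queried.
            targets = [self.node_for(predicate[self.key_column])]
        else:
            # No key column: query every node and merge the partial results.
            targets = list(range(self.num_nodes))
        matches = []
        for node_id in targets:
            matches.extend(
                row
                for row in self.nodes[node_id]
                if all(row.get(col) == val for col, val in predicate.items())
            )
        return matches, len(targets)


if __name__ == "__main__":
    log = [{"category", "brand"}, {"category"}, {"price"}, {"category", "price"}]
    key = choose_key_column(log)  # "category" appears in 3 of the 4 queries
    cluster = Cluster(key_column=key, num_nodes=4)
    products = [("phone", "A"), ("phone", "B"), ("laptop", "A"), ("tv", "C")]
    for i, (category, brand) in enumerate(products):
        cluster.insert({"id": i, "category": category, "brand": brand})

    rows, tasks = cluster.query({"category": "phone"})
    print(tasks, len(rows))  # 1 2
    rows, tasks = cluster.query({"brand": "A"})
    print(tasks, len(rows))  # 4 2
```

In this sketch each row lives on exactly one node, so the partitioning adds no storage. The saving appears as fewer per-node query tasks whenever a predicate contains the key column: 1 task instead of 4 in the demo. Queries without the key column cost the same as in the all-nodes model.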
Authors: ZHANG Wei (张伟), MA Limin (马利民), ZHI Hao (智昊) (Computer School, Beijing Information Science & Technology University, Beijing 100101, China)
Source: Journal of Beijing Information Science and Technology University (Natural Science Edition), 2018, Issue 4, pp. 1-9 (9 pages)
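Funding: Research Fund of the Beijing Advanced Innovation Center for Future Chip Technology (KYJJ2016005); Beijing Young Top-Notch Talent Cultivation Program (CIT&TCD201504057)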
Keywords: key column; pre-processing; massive structured data; distributed data storage