An Approach to Build a Big Data Benchmark Suite

Original title: 大数据基准测试程序包构建方法研究

Abstract (translated from the Chinese): Benchmarks are key tools for evaluating computer systems. However, the arrival of the big data era makes developing benchmarks for big data systems considerably more challenging, and neither academia nor industry yet has a widely accepted big data benchmark suite. Using a real-world transportation big data system, this paper builds SIAT-Bench, a Hadoop-based benchmark suite for transportation big data. Program behavior is quantified through attributes selected at multiple levels, and a clustering algorithm is used to analyze the similarity between different program-input data set pairs. Based on the clustering results, representative programs and input data sets are selected for SIAT-Bench. Experimental results show that SIAT-Bench preserves diversity in program behavior while eliminating redundancy in the benchmark set.

Abstract (English): Benchmarks are important tools for evaluating the performance of a variety of computing systems. However, benchmarks for big data systems are lacking: big data is relatively new, and researchers who want to understand how big data systems, including hardware and software, work do not have data. In this paper, an approach to developing big data benchmarks is first devised. Then a big data benchmark suite named SIAT-Bench, which contains five representative workloads from the Shenzhen urban transportation system, is presented. To this end, program behavior is characterized and the impact of input data sets is quantified by observing metrics from multiple levels, such as the microarchitecture, OS, and application layers. Statistical techniques such as Principal Component Analysis (PCA) and clustering are then employed to perform similarity analysis between different workload-input pairs. Finally, SIAT-Bench is built by selecting representative workloads and associated input sets according to the clustering results. Experimental results show that SIAT-Bench properly satisfies the requirements of a benchmark suite.
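The selection pipeline summarized in the abstract (characterize each workload-input pair with metrics, reduce with PCA, cluster, keep one representative per cluster) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in here, and the pair names, the four metrics, and every number in `X` are made up for the example.

```python
import numpy as np

# Hypothetical workload-input pairs and synthetic metric values
# (columns: IPC, cache miss rate, CPU utilization, disk I/O).
# None of these names or numbers come from the paper.
pairs = ["sort/small", "sort/large", "join/small",
         "join/large", "kmeans/small", "kmeans/large"]
X = np.array([
    [1.20,  5.0, 80.0, 10.0],
    [1.18,  5.2, 82.0, 12.0],
    [0.60, 20.0, 40.0, 90.0],
    [0.62, 19.5, 42.0, 88.0],
    [0.90, 12.0, 95.0, 30.0],
    [0.91, 11.8, 94.0, 31.0],
])

def pca_reduce(X, var_kept=0.9):
    """Standardize each metric, then keep the leading principal
    components that together explain at least `var_kept` of the variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant metrics
    Z = (X - X.mean(axis=0)) / std
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_kept)) + 1
    k = min(k, len(S))
    return Z @ Vt[:k].T

def kmeans(P, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [P[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(P - c, axis=1) for c in centers], axis=0)
        centers.append(P[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def pick_representatives(P, labels, centers):
    """For each non-empty cluster, return the index of the pair
    closest to the cluster center."""
    reps = []
    for j in range(len(centers)):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        d = np.linalg.norm(P[members] - centers[j], axis=1)
        reps.append(int(members[np.argmin(d)]))
    return reps

P = pca_reduce(X)
labels, centers = kmeans(P, k=3)
reps = pick_representatives(P, labels, centers)
for r in reps:
    print(pairs[r])
```

Pairs that land in the same cluster behave similarly on the measured metrics, so keeping one per cluster is one way to retain behavioral diversity while dropping redundant pairs. The number of clusters (`k=3`) and the 90% variance threshold are arbitrary choices for this toy data.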

Authors: Xiong Wen (熊文), Yu Zhibin (喻之斌), Xu Chengzhong (须成忠)

Affiliation: Research Center for Cloud Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

Source: Journal of Integration Technology (《集成技术》), 2014, No. 4, pp. 1-9 (9 pages)

Keywords: big data; benchmark; workload-input pairs; similarity; urban traffic systems; GPS trajectory data

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
