摘要
基准测试程序是评估计算机系统的关键测试工具。然而,大数据时代的到来使得开发大数据系统基准测试程序面临着更加严峻的挑战,当前学术界和产业界还不存在得到广泛认可的大数据基准测试程序包。文章利用实际的交通大数据系统构建了一个基于Hadoop平台的交通大数据基准测试程序包SIAT-Bench。通过选取多个层次属性量化了程序行为特征,采用聚类算法分析了不同程序-输入数据集对的相似性。根据聚类结果,为SIATBench选取了有代表性的程序和输入数据集。实验结果表明,SIAT-Bench在满足程序行为多样性的同时消除了基准测试集中的冗余。
Benchmarks are important tools to evaluate the performance of a variety of computing systems. However, benchmarks for big data systems are lacking as big data is relatively new and researchers are interested in understanding how big data systems including hardware and software work but do not have data. In this paper, an approach to develop big data benchmarks was devised at first. Then a big data benchmark suite named SIAT-Bench, which contains five representative workloads from Shenzhen urban transportation system, was presented. To this end, the program behavior was characterized and the impact of input data sets was qualiifed by observing metrics from multiple levels such as microarchitecture, OS and application layer. Then statistical techniques such as Principal Component Analysis (PCA) and Clustering were employed to perform similarity analysis between different workload-input pairs. Finally, we built SIAT-Bench by selecting representative workloads and associated input sets according to the clustering results. Experimental results show that SIAT-Bench properly satisifes the requirements of a benchmark suite.
出处
《集成技术》
2014年第4期1-9,共9页
Journal of Integration Technology