Abstract: Virtualization and distributed parallel architecture are typical cloud computing technologies. In the area of virtualization technology, this article discusses physical resource pooling, resource pool management and use, cluster fault location and maintenance, resource pool grouping, and the construction and application of heterogeneous virtualization platforms. In the area of distributed technology, the distributed file system and the Key/Value storage engine are discussed. A solution is proposed for the host bottleneck problem, and a standard storage interface is proposed for the distributed file system. A directory-based storage scheme for the Key/Value storage engine is also proposed.
Abstract: Big data processing is becoming a standout part of data center computation. However, recent research has indicated that big data workloads cannot make full use of modern memory systems. We find that the dramatic inefficiency of big data processing stems from the enormous number of cache misses and from stalls on dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is the slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second is direct-memory-access, which reforms the data structure used in key/value storage. These optimizations are evaluated with both micro-benchmarks and the real-world benchmark HiBench. The micro-benchmark results clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the HiBench results show a 1.21X average speedup at the application level. Both results illustrate that careful hardware/software co-design can improve the memory efficiency of big data processing. Our work has already been integrated into the Intel Distribution for Apache Hadoop.
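The abstract does not spell out how slice-and-merge works, so the following is only a minimal sketch of the general idea as it is commonly understood: sort the input in small slices that could plausibly fit in cache, then merge the sorted slices in one sequential pass. The function name `slice_and_merge_sort`, the `slice_len` parameter, and its default value are assumptions made for illustration and are not taken from the paper. Python is used only to show the control flow; it does not reproduce the cache behavior the paper measures.

```python
import heapq
from typing import List, Sequence


def slice_and_merge_sort(records: Sequence[int], slice_len: int = 4096) -> List[int]:
    """Sort `records` by sorting fixed-size slices, then merging them.

    Hypothetical illustration: `slice_len` stands in for "a slice small
    enough to stay cache-resident"; the paper's real sizing rule is unknown.
    """
    if slice_len <= 0:
        raise ValueError("slice_len must be positive")

    # Step 1 (slice): sort each slice on its own, so every sort touches
    # only a small, contiguous working set.
    sorted_slices = [
        sorted(records[start:start + slice_len])
        for start in range(0, len(records), slice_len)
    ]

    # Step 2 (merge): k-way merge of the sorted slices; each slice is
    # read sequentially from front to back.
    return list(heapq.merge(*sorted_slices))
```

For example, `slice_and_merge_sort([5, 3, 9, 1, 7, 2, 8], slice_len=3)` sorts the slices `[5, 3, 9]`, `[1, 7, 2]`, and `[8]` separately and then merges them into one sorted list.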