Funding: Supported by the National High Technology Research and Development Program of China (No. 2015AA015308) and the State Key Development Program for Basic Research of China (No. 2014CB340402).
Abstract: Big data analytics is emerging as one of the most important workloads in modern data centers. Hence, it is of great interest to identify how to achieve the best performance for big data analytics workloads running on state-of-the-art SMT (simultaneous multithreading) processors, which requires a comprehensive understanding of workload characteristics. This paper chooses Spark workloads as representative big data analytics workloads and performs comprehensive measurements on the POWER8 platform, which supports a wide range of multithreading levels. The study finds that the thread assignment policy and cache contention have significant impacts on application performance. To identify potential optimizations from the experimental results, this study performs micro-architecture-level characterizations using hardware performance counters and discusses the implications.
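To make the notion of a thread assignment policy concrete, here is a minimal Python sketch of two common placements: compact (fill every hardware thread of one SMT core before moving to the next) and scatter (round-robin across cores). It assumes a hypothetical machine description of `n_cores` cores with `smt` hardware threads each, and that logical CPUs are numbered consecutively within a core; the function names are illustrative and are not taken from the paper. Threads placed on the same core share that core's pipeline and private caches, which is one way the assignment policy and cache contention interact.

```python
# Illustrative thread-placement policies for SMT processors (hypothetical helpers).
# ASSUMPTION: logical CPU (core * smt + t) is hardware thread t of that core.
# Verify the real layout under /sys/devices/system/cpu/cpu*/topology before use.


def compact(n_threads: int, n_cores: int, smt: int) -> list[int]:
    """Fill all hardware threads of core 0, then core 1, and so on."""
    if n_threads > n_cores * smt:
        raise ValueError("more threads than hardware threads")
    return list(range(n_threads))


def scatter(n_threads: int, n_cores: int, smt: int) -> list[int]:
    """Round-robin across cores so that as few threads as possible share a core."""
    if n_threads > n_cores * smt:
        raise ValueError("more threads than hardware threads")
    # Thread i goes to core (i % n_cores), using that core's next free SMT slot.
    return [(i % n_cores) * smt + (i // n_cores) for i in range(n_threads)]


def threads_per_core(cpus: list[int], smt: int) -> dict[int, int]:
    """Count how many placed threads share each core (and its private caches)."""
    counts: dict[int, int] = {}
    for cpu in cpus:
        core = cpu // smt
        counts[core] = counts.get(core, 0) + 1
    return counts
```

For 8 threads on a 4-core SMT8 machine, `compact` places all eight threads on core 0, while `scatter` yields CPUs 0, 8, 16, 24, 1, 9, 17, 25, i.e., two threads per core. On Linux, such a CPU list could be applied externally, for example with `taskset -c`.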
Funding: This work was supported by the National High Technology Research and Development Program of China (863) (2011AA01A204) and the National Natural Science Foundation of China (Grant No. 61272167).
Abstract: In today's data centers supporting Internet-scale computing and input/output (I/O) services, more and more network-intensive applications are deployed as network services. It is therefore critical for these applications to quickly retrieve requests from the network and send their responses back to it. To support this, the operating system usually provides an event notification mechanism that tells applications (or their libraries) whether the network is ready to supply data for them to read or to accept data for them to write. As a widely used and representative notification mechanism, epoll in Linux provides a scalable, high-performance implementation by letting applications specify exactly which connections, and which events on them, should be watched. epoll is used in major systems, including key-value (KV) stores such as Redis and Memcached and web servers such as NGINX, and we have identified a substantial performance issue in the way it is used. For efficiency, applications usually issue epoll system calls to tell the kernel exactly which events they are interested in, and they always keep this information up to date. However, in a system with heavy network traffic, such rigid maintenance of the information is unnecessary, and the excess system calls it requires can substantially degrade system performance. In this paper, we use Redis as an example to explore the issue. We propose a strategy that informs the kernel of the events of interest in a manner adaptive to the current network load, so that the number of epoll system calls is reduced while events are still delivered efficiently. We have implemented an event-polling library, named FlexPoll, purely at user level without modifying any kernel code.
Our evaluation on Redis shows that query throughput can be improved by up to 46.9% on micro-benchmarks and by up to 67.8% on workloads emulating real-world access patterns. FlexPoll is a generic mechanism, so it can be adopted in a straightforward manner by other applications, such as NGINX and Memcached.
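The interest-maintenance problem described above can be illustrated with a simplified model. The Python sketch below counts how many `epoll_ctl` modifications two policies would issue: an eager one that adds EPOLLOUT when a reply is queued and removes it as soon as the reply has been written, and a hypothetical load-adaptive one that, under heavy load, leaves EPOLLOUT registered and tolerates occasional spurious writable notifications instead. This is not FlexPoll's actual algorithm. The class name, the `busy_threshold` knob, and the use of the last `epoll_wait` result count as the load signal are all assumptions for illustration, and no real system calls are made.

```python
# Simplified model of epoll interest maintenance (illustrative, not FlexPoll).
# No real system calls are made; ctl_calls counts the epoll_ctl(EPOLL_CTL_MOD)
# calls each policy would issue.

EPOLLIN = 0x001   # bit values as defined in Linux <sys/epoll.h>
EPOLLOUT = 0x004


class InterestTracker:
    def __init__(self, adaptive: bool, busy_threshold: int = 64) -> None:
        self.adaptive = adaptive
        self.busy_threshold = busy_threshold  # hypothetical load threshold
        self.kernel_mask: dict[int, int] = {}  # mask the kernel currently holds per fd
        self.ctl_calls = 0
        self.busy = False

    def _set(self, fd: int, mask: int) -> None:
        # Every fd is assumed to be registered for EPOLLIN already (EPOLL_CTL_ADD is
        # not counted). A real event loop would issue the modification here, e.g.
        # select.epoll.modify(fd, mask) in Python.
        if self.kernel_mask.get(fd, EPOLLIN) != mask:
            self.kernel_mask[fd] = mask
            self.ctl_calls += 1

    def observe_load(self, ready_events: int) -> None:
        # Hypothetical load signal: number of events the last epoll_wait() returned.
        self.busy = self.adaptive and ready_events >= self.busy_threshold

    def reply_queued(self, fd: int) -> None:
        # Output is pending, so the fd must be watched for writability.
        self._set(fd, EPOLLIN | EPOLLOUT)

    def reply_drained(self, fd: int) -> None:
        if self.busy:
            # Lazy: keep EPOLLOUT; a spurious writable event is cheaper than a syscall.
            return
        # Eager: drop EPOLLOUT as soon as nothing is left to send.
        self._set(fd, EPOLLIN)


def simulate(adaptive: bool, rounds: int, fds: int, ready_per_poll: int) -> int:
    """Each round, every connection gets one reply that is queued and then fully written."""
    tracker = InterestTracker(adaptive)
    for _ in range(rounds):
        tracker.observe_load(ready_per_poll)
        for fd in range(fds):
            tracker.reply_queued(fd)
            tracker.reply_drained(fd)
    return tracker.ctl_calls
```

With 100 connections over 100 rounds, `simulate(False, 100, 100, 100)` counts 20,000 modifications, while `simulate(True, 100, 100, 100)` counts 100 (one per connection, the first time EPOLLOUT is added). The price of the lazy policy is extra EPOLLOUT wakeups on connections with nothing to send, which is why the sketch relaxes maintenance only when its load signal is high; at low load it behaves exactly like the eager policy.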