摘要
As the demand for the development of cloud computing grows, more and more organizations have outsourced their data and query services to the cloud for cost-saving and flexibility. Suppose an organization that has a great number of users querying the cloud-deployed multiple proxy servers to achieve cost efficiency and load balancing. Given n queries, each of which is expressed as several keywords, and k proxy servers, the problem to be solved is how to classify n queries into k groups, in order to minimize the difference between each group and the number of distinct keywords in all groups. Since this problem is NP-hard, it is solved in mathematic and heuristic ways. Mathematic grouping uses a local optimization method, and heuristic grouping is based on k-means. Specifically, two extensions are provided: the first one focuses on robustness, i.e., each user obtains search results even if some proxy servers fail; the second one focuses on benefit, i.e., each user can retrieve as many files as possible that may be of interest without increasing the sum. Extensive evaluations have been conducted on both a synthetic dataset and real query traces to verify the effectiveness of our strategies.
As the demand for the development of cloud computing grows, more and more organizations have outsourced their data and query services to the cloud for cost-saving and flexibility. Suppose an organization that has a great number of users querying the cloud-deployed multiple proxy servers to achieve cost efficiency and load balancing. Given n queries, each of which is expressed as several keywords, and k proxy servers, the problem to be solved is how to classify n queries into k groups, in order to minimize the difference between each group and the number of distinct keywords in all groups. Since this problem is NP-hard, it is solved in mathematic and heuristic ways. Mathematic grouping uses a local optimization method, and heuristic grouping is based on k-means. Specifically, two extensions are provided: the first one focuses on robustness, i.e., each user obtains search results even if some proxy servers fail; the second one focuses on benefit, i.e., each user can retrieve as many files as possible that may be of interest without increasing the sum. Extensive evaluations have been conducted on both a synthetic dataset and real query traces to verify the effectiveness of our strategies.
基金
This research was supported in part by the National Science Foundation of USA under Grant Nos. CNS-1449860, CNS-1461932, CNS-460971, CNS-1439672, CNS-1301774, and ECCS-1231461, the National Natural Science Foundation of China under Grant Nos. 61632009, 61472451, 61402161, 61472131, 61272151, and 61272546, the Hunan Provincial Natural Science Foundation of China under Grant No. 2015JJ3046, and the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant No. SKLNST-2016-2-20.