Funding: Supported by the National Natural Science Foundation of China (60970018)
Abstract: In this paper, we present a novel approach that exploits attribute correlation for the sampling task on nonuniform hidden databases. We propose a method for calculating attribute dependency and construct a sampling template according to that dependency. We then use the sampling template to generate initial sampling queries and propose a bottom-up algorithm to search the sampling template. We also conduct extensive experiments over real deep Web sites and controlled databases to illustrate that our sampling method performs well in both quality and efficiency.