摘要
skyline操作就是找出数据集中不被其他数据点支配的点的集合,但是随着数据属性维度的不断增多,通常人们只对数据集的某几个属性感兴趣,高维空间子空间skyline计算就是发现数据集中在某几个特定维度上不被其他点支配的点的集合,skyline计算在数据量大时其时间花销是非常大的,快速的返回结果才是人们能接受的.基于此提出了一个RSky算法,在原有CSky算法的基础上,指出并改进了其存在的3处明显不足,并根据InvertS索引的特性提出了一个压缩扫描策略,通过设置每个维度的下限来控制要处理的桶,除去不必要处理的桶和不可能是skyline的点,从而减少了点与点之间的比较次数.实验结果表明了RSky算法的有效性.
The skyline computaion is to find the points that are not dominated by any other points in the data set.However,with the growing number of dimensions of the data attributes,usually people are only interested in a few attributes of the data set.The subspace skyline computation for high dimensional space is to find a set of points that are not dominated by any other points on a few specific dimensions in the data set.When deal with the large amount of data,the skyline computation can consume a lot of time,but return result fast is which users can accept.In this paper,we proposed an improved algorithm called RSky which is based on the existing CSky to address the problem of skyline computation's high time complexity.In particular,we pointed out three shortcomings of CSky and proposed a compression method to reduce the points to be processed based on the characters of the InvertS index.The number of comparisons between point and point can be reduced by the lower bounds of each dimension.With the lower bounds,RSky can also remove the unnecessary buckets and the points which are impossible to be skyline points.The experimental results have show the efficiency of the proposed RSky algorithm.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2013年第S1期101-108,共8页
Journal of Computer Research and Development
基金
国家自然科学基金项目(61070005)
中央高校基本科研业务费项目(111gpy63)
广东省科技计划项目(2010B080701062)