Distributed Skyline Processing Based on Hypersphere Projection Partitioning on Cloud Environments (云环境下基于超球面投影分区的Skyline计算)

Cited by: 5

Abstract: Skyline queries have promising applications in centralized databases, distributed databases, data streams, and categorical-attribute datasets, which has made them a focus of current database research and drawn wide attention from both academia and industry. As an important data mining technique, Skyline processing is widely used in multi-objective optimization, urban navigation systems, user preference queries and constrained decision making, trip planning, intelligent defense systems, and geographic information systems. As the volume of data that can be collected and used grows rapidly, processing Skyline queries over massive data has become an urgent problem. Targeting cloud computing environments, this paper designs and implements HSPD-Skyline (Distributed Skyline Processing based on Hypersphere Projection Partitioning), a distributed Skyline algorithm under the Map-Reduce framework. Its main idea is to project high-dimensional data points onto a hypersphere, that is, to convert Cartesian coordinates into hyperspherical coordinates and partition the data accordingly, which increases the average pruning power of the points within a partition and reduces the cost of Skyline computation. HSPD-Skyline also uses HA-SPT, a heuristic strategy based on a space partitioning tree, to further improve processing efficiency. Theoretical analysis and experimental results show that, regardless of data distribution and without further optimization strategies, the overall performance of HSPD-Skyline (scalability, Skyline query time, and so on) is better than that of similar algorithms for distributed Skyline computation.
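The partitioning idea in the abstract can be illustrated with a small single-process sketch. This is not the paper's HSPD-Skyline implementation: it assumes non-negative attribute values where smaller is better, splits each hyperspherical angle into equal-width cells (the `splits` parameter is hypothetical), uses a naive nested-loop local Skyline, and does not model the HA-SPT heuristic or an actual Map-Reduce runtime. All function names are illustrative.

```python
import math
from collections import defaultdict


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive nested-loop Skyline: keep the points no other point dominates."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current candidate
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def angles(p):
    """Angular part of the hyperspherical coordinates of p (d - 1 angles).
    For non-negative coordinates every angle lies in [0, pi/2]."""
    out = []
    for i in range(len(p) - 1):
        tail = math.sqrt(sum(x * x for x in p[i + 1:]))
        out.append(math.atan2(tail, p[i]))
    return out


def partition_key(p, splits=2):
    """Map a point to an angular grid cell (equal-width cells are an assumption)."""
    half_pi = math.pi / 2
    return tuple(min(int(phi / half_pi * splits), splits - 1) for phi in angles(p))


def partitioned_skyline(points, splits=2):
    """'Map': group points by angular cell; 'reduce': local Skylines;
    final merge: Skyline of the union of the local results."""
    buckets = defaultdict(list)
    for p in points:
        buckets[partition_key(p, splits)].append(p)
    local = [s for pts in buckets.values() for s in skyline(pts)]
    return skyline(local)


points = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1), (5, 5), (8, 8), (3, 8)]
global_skyline = sorted(partitioned_skyline(points))
print(global_skyline)
```

The merge step is correct for any partitioning because every global Skyline point is also a Skyline point of its own partition. Angular partitioning only affects how many points each partition can prune locally.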

Authors: LEI Ting (雷婷), WANG Tao (王涛), QU Wu (曲武), HAN Xiaoguang (韩晓光)

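Affiliations: Department of Communication Engineering, Chengdu Technological University, Chengdu; School of Information Science and Engineering, Hunan City University, Yiyang; Knowledge Engineering Research Laboratory, Tsinghua University, Beijing; Beijing Venustech Information Technology Co., Ltd. (北京启明星辰信息技术股份有限公司), Beijing; Enterprise Postdoctoral Research Workstation, Haidian Park of Zhongguancun Science Park, Beijing; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing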

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 6, pp. 164-171 (8 pages)

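Funding: Supported by the projects "Research on Knowledge Discovery Mechanisms, Models and Algorithms Based on Large-Scale Complex-Structured Knowledge Bases" (60875029), "Research on Models, Methods and a General Framework for Multi-Relational Frequent Pattern Mining" (60675030), and "Research on Mining Models, Algorithms and Evaluation Mechanisms for Multi-Relational Fuzzy Cognitive Maps" (61175048).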

Keywords: Distributed Skyline processing; Map-Reduce framework; Partitioning strategy; HSPD-Skyline algorithm

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
