Data Selection Strategy for Data-intensive Applications in Cloud (cited by: 1)

Abstract: Bag-of-Tasks data-intensive applications in the cloud have appeared in many fields. Given multi-data-center environments and the pay-as-you-go model of resource usage, these applications face new challenges in data selection, chiefly how to choose suitable data resources as application input from multiple datasets that have the same content but differ in location and access cost. To address this problem, the cloud environment and the data selection problem are first modeled. On this basis, the cost-minimizing data selection process is abstracted as a weighted set covering problem, and a new data selection strategy is proposed to strike a balance between execution efficiency and economic cost. Experimental results show that the proposed strategy maintains cost optimization while also taking execution efficiency into account, achieving good overall performance.
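The abstract does not spell out the proposed strategy, so the following is only a minimal sketch of the underlying abstraction: the classic greedy heuristic for weighted set cover, not the authors' algorithm. The function name, the data-center labels (`dc_a`, `dc_b`, `dc_c`), the dataset identifiers, and the costs are all hypothetical. Each source holds a set of datasets at some access cost, and the goal is to cover every dataset the application needs at low total cost.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def greedy_weighted_set_cover(
    required: Set[str],
    sources: Dict[str, Tuple[FrozenSet[str], float]],
) -> List[str]:
    """Choose sources that together cover `required`.

    Classic greedy rule: repeatedly pick the source with the lowest
    cost per newly covered dataset. This is a textbook heuristic used
    for illustration, not the strategy proposed in the paper.
    """
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        for name, (datasets, cost) in sources.items():
            gain = len(datasets & uncovered)  # datasets this source would newly cover
            if gain == 0:
                continue
            ratio = cost / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError(f"no source holds: {sorted(uncovered)}")
        chosen.append(best_name)
        uncovered -= sources[best_name][0]
    return chosen


# Hypothetical input: three data centers with overlapping replicas.
sources = {
    "dc_a": (frozenset({"d1", "d2"}), 4.0),
    "dc_b": (frozenset({"d2", "d3"}), 3.0),
    "dc_c": (frozenset({"d1", "d2", "d3"}), 8.0),
}
selection = greedy_weighted_set_cover({"d1", "d2", "d3"}, sources)
total_cost = sum(sources[name][1] for name in selection)
print(selection, total_cost)
```

In this toy instance the greedy rule picks `dc_b` first (cost 1.5 per dataset), then `dc_a` for the remaining dataset, for a total cost of 7.0, which is cheaper than fetching everything from `dc_c` at 8.0. The sketch optimizes cost only; balancing cost against execution efficiency, as the abstract describes, would need an additional term that the abstract does not define.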

Authors: Du Wei (杜薇), Cui Guohua (崔国华), Liu Wei (刘伟), Shi Feiyan (石飞燕), Wei Kaizhi (位凯志)

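Affiliations: School of Computer Science and Technology, Huazhong University of Science and Technology; School of Computer Science and Technology, Wuhan University of Technology; State Key Laboratory for Novel Software Technology, Nanjing University; State Key Laboratory of Software Engineering, Wuhan University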

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 6, pp. 30-34, 71 (6 pages)

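Funding: Supported by the National Natural Science Foundation of China (60703048), the Open Fund of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2009B22), and the Fund of the State Key Laboratory of Software Engineering, Wuhan University (SKLSE20080720)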

Keywords: Cloud computing; Bag-of-Tasks; Data-intensive application; Data selection; Weighted set covering

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]



