Abstract
The growing demand for cross-cloud collaborative scheduling of storage and computation places high requirements on cross-cloud data access speed. Cross-cloud data access methods based on data redundancy techniques (erasure coding and multi-replica replication), which offer high access speed, are therefore attracting attention. Among them, methods based on erasure coding have become a research focus because of their low storage overhead and high fault tolerance. To improve data access speed by shortening the transmission time of coded blocks, existing erasure-coding-based cross-cloud data access methods introduce caching and optimize the coded data access scheme. However, because their cache management is coarse-grained and they do not jointly optimize cache management and the coded data access scheme, these methods suffer from few cache hits, low cache hit efficiency, and heavy access to coded blocks with low transmission speed, so the coded block transmission time remains long. To address this, we first propose an IPFS-based cross-cloud storage system framework (IBCS), which uses the data slice management mechanism of the InterPlanetary File System (IPFS) to realize fine-grained cache management and thereby increase cache hits. We then propose an adaptive erasure-coded data access method for cross-cloud collaborative scheduling of storage and computation (AECAM). AECAM evaluates the transmission speed of each coded block during data access based on the distribution of coded blocks (including cached coded blocks) and data access nodes, and accordingly formulates a coded data access scheme that avoids accessing coded blocks with low transmission speed. In addition, AECAM identifies coded blocks that are likely to be selected when it formulates the access scheme but whose actual transmission speed is low, and caches them near the data access nodes, improving both cache hits and cache hit efficiency. Finally, we build a cross-cloud storage system for collaborative scheduling of storage and computation (C2S2) based on IBCS and AECAM. Experiments in a cross-cloud environment show that, compared with existing erasure-coded storage systems that use caching, C2S2 improves data access speed by 75.22% to 81.29%.
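The idea of choosing which coded blocks to read based on their estimated transmission speed can be illustrated with a small sketch. This is not the AECAM algorithm from the paper, whose details the abstract does not give. It is a minimal illustration that assumes an MDS code such as Reed-Solomon (any k of the n coded blocks suffice to rebuild the data), parallel block fetches, and a known bandwidth table between locations. The names `BlockCopy`, `estimate_time`, `plan_access`, and the `cloud-*` labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCopy:
    """One stored or cached copy of a coded block (hypothetical model)."""
    block_id: int      # index of the coded block within the stripe
    location: str      # cloud that holds this copy
    size_mb: float     # block size in MB


def estimate_time(copy, reader, bandwidth):
    """Estimated seconds to send `copy` to `reader`.

    `bandwidth` maps (source, destination) to MB/s; assumed to be known.
    """
    return copy.size_mb / bandwidth[(copy.location, reader)]


def plan_access(copies, reader, bandwidth, k):
    """Pick k distinct coded blocks with the shortest estimated transfer time.

    For each block the fastest copy (original or cached) is kept first.
    Returns the chosen copies and the estimated read time, which under
    parallel fetching is the time of the slowest chosen block.
    """
    best = {}
    for c in copies:
        t = estimate_time(c, reader, bandwidth)
        if c.block_id not in best or t < best[c.block_id][0]:
            best[c.block_id] = (t, c)
    if k < 1 or len(best) < k:
        raise ValueError("need 1 <= k <= number of distinct blocks")
    ranked = sorted(best.values(), key=lambda tc: tc[0])[:k]
    return [c for _, c in ranked], ranked[-1][0]


# Example: reader in cloud-a, k = 2, block 2 also cached near the reader.
bandwidth = {
    ("cloud-a", "cloud-a"): 100.0,
    ("cloud-b", "cloud-a"): 20.0,
    ("cloud-c", "cloud-a"): 5.0,
}
copies = [
    BlockCopy(0, "cloud-a", 64.0),
    BlockCopy(1, "cloud-b", 64.0),
    BlockCopy(2, "cloud-c", 64.0),
    BlockCopy(2, "cloud-a", 64.0),  # cached copy of block 2
]
chosen, read_time = plan_access(copies, "cloud-a", bandwidth, k=2)
```

In this example the cached copy of block 2 lets the plan skip the slower block 1 in cloud-b. Without the cached copy, the plan falls back to blocks 0 and 1, and the estimated read time rises to that of block 1.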
Authors
Zhang Kaixin (张凯鑫)
Wang Yijie (王意洁)
Bao Han (包涵)
Kan Junhui (阚浚晖)
Affiliations: National Key Laboratory of Parallel and Distributed Computing (National University of Defense Technology), Changsha 410073; College of Computer Science and Technology, National University of Defense Technology, Changsha 410073
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Indexed in: EI, CSCD, Peking University Core Journals
2024, Issue 3, pp. 571-588 (18 pages)
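Funding
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0115302)
National Natural Science Foundation of China (61379052)
Scientific Research Innovation Fund of the Ministry of Education of China (2018A02002)
Natural Science Foundation of Hunan Province for Distinguished Young Scholars (14JJ1026)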
Keywords
cross-cloud collaborative scheduling of storage and computation
erasure coding
data access technology
IPFS
cache