Journal Articles
29 articles found
Health Data Deduplication Using Window Chunking-Signature Encryption in Cloud
1
Authors: G. Neelamegam, P. Marikkannu | Intelligent Automation & Soft Computing, SCIE, 2023, Issue 4, pp. 1079-1093 (15 pages)
Due to the development of technology in medicine, millions of health-related records, such as scanned images, are generated. Storing and handling this massive volume of data is a great challenge. Healthcare data is stored in cloud-fog storage environments. This cloud-fog based health model allows users to obtain health-related data from different sources, and duplicated information also accumulates in the background. This requires additional storage area, increases data acquisition time, and leads to insecure data replication in the environment. This paper proposes to eliminate duplicate data using a window size chunking algorithm with a biased sampling-based Bloom filter and to secure the health data using the Advanced Signature-Based Encryption (ASE) algorithm in the Fog-Cloud Environment (WCA-BF+ASE). WCA-BF+ASE eliminates duplicate copies of the data and minimizes storage space and maintenance cost, while storing the data efficiently and securely. In the cloud storage environment, the security level of the Window Size Chunking Algorithm (WSCA) is 86.5%, Two Thresholds Two Divisors (TTTD) 80%, Ordinal in Python (ORD) 84.4%, and Bloom Filter (BF) 82%, while the proposed work achieves a better security level of 97%. After applying the deduplication process, the proposed WCA-BF+ASE method also requires less storage space for various file sizes: 10 KB for a 200 MB file, 22 KB for 400 MB, 35 KB for 600 MB, 38 KB for 800 MB, and 40 KB for 1000 MB.
Keywords: health data, encryption, chunks, cloud, fog, deduplication, Bloom filter
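The abstract above pairs window-based chunking with a Bloom filter so that most duplicate chunks can be rejected before touching the full chunk store. The Python sketch below illustrates that combination only in a generic way; the class and function names, the 4 KB window, and the filter size are illustrative assumptions, not parameters from the paper.

import hashlib

class BloomFilter:
    """Simple Bloom filter using k hash positions derived from SHA-256."""
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, data: bytes):
        for i in range(self.k):
            h = hashlib.sha256(i.to_bytes(2, "big") + data).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, data: bytes):
        for p in self._positions(data):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, data: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(data))


def window_chunks(data: bytes, window=4096):
    """Split a byte stream into fixed-size windows (the last chunk may be shorter)."""
    for off in range(0, len(data), window):
        yield data[off:off + window]


def deduplicate(data: bytes, bf: BloomFilter, store: dict):
    """Store only chunks whose fingerprints are not (probably) already present."""
    for chunk in window_chunks(data):
        fp = hashlib.sha256(chunk).digest()
        if bf.might_contain(fp) and fp in store:   # confirm to rule out false positives
            continue
        bf.add(fp)
        store[fp] = chunk
    return store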
Hash-Indexing Block-Based Deduplication Algorithm for Reducing Storage in the Cloud
2
Authors: D. Viji, S. Revathy | Computer Systems Science & Engineering, SCIE EI, 2023, Issue 7, pp. 27-42 (16 pages)
Cloud storage is essential for managing user data stored in and retrieved from distributed data centres. The storage service is offered on a pay-per-use basis for the volume of data stored. Because the massive amount of data stored in the data centre contains similar information and file structures kept in multiple copies, duplication leads to increased storage consumption. Existing deduplication systems do not achieve efficient data reduction because of inaccuracy in identifying similar data, which in turn drives up storage consumption and cost. To resolve this problem, this paper proposes an efficient storage-reduction method called Hash-Indexing Block-based Deduplication (HIBD) based on Segmented Bind Linkage (SBL) methods for reducing storage in a cloud environment. Initially, preprocessing is done using the sparse augmentation technique. The preprocessed files are then segmented into blocks to build a hash index. The block contents are compared with other files through Semantic Content Source Deduplication (SCSD), which identifies similar content shared between files. Based on the content presence count, the Distance Vector Weightage Correlation (DVWC) estimates the document similarity weight, and related files are grouped into a cluster. Finally, the segmented bind linkage compares the documents to find duplicate content in the cluster using the similarity weight based on the coefficient match case. This implementation helps identify data redundancy efficiently and reduces the service cost in distributed cloud storage.
Keywords: cloud computing, deduplication, hash indexing, relational content analysis, document clustering, cloud storage, record linkage
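The HIBD abstract above segments files into blocks, builds a hash index, and groups files by content overlap into clusters before duplicate detection. Below is a minimal, hypothetical Python sketch of that pipeline, using Jaccard overlap of block-hash sets as a stand-in for the paper's DVWC similarity weight; the block size and threshold are assumed, not taken from the paper.

import hashlib

BLOCK = 8192  # illustrative block size

def block_hashes(data: bytes) -> set:
    """Segment a file into fixed-size blocks and return the set of block fingerprints."""
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)}

def similarity(a: set, b: set) -> float:
    """Jaccard overlap of two files' block-hash sets (stand-in for a similarity weight)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def cluster_files(files: dict, threshold=0.5):
    """Greedily group files whose block-hash overlap exceeds the threshold."""
    indexes = {name: block_hashes(data) for name, data in files.items()}
    clusters = []
    for name, idx in indexes.items():
        for cluster in clusters:
            rep = cluster[0]                       # compare against the cluster representative
            if similarity(idx, indexes[rep]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters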
Homogeneous Batch Memory Deduplication Using Clustering of Virtual Machines
3
Authors: N. Jagadeeswari, V. Mohan Raj | Computer Systems Science & Engineering, SCIE EI, 2023, Issue 1, pp. 929-943 (15 pages)
Virtualization is the backbone of cloud computing, which is a developing and widely used paradigm. By finding and merging identical memory pages, memory deduplication improves memory efficiency in virtualized systems. Kernel Same-page Merging (KSM) is a Linux service for sharing memory pages in virtualized environments. Memory deduplication is vulnerable to a memory disclosure attack, which uses covert channel establishment to reveal the contents of other co-located virtual machines. To avoid a memory disclosure attack, sharing of identical pages within a single user's virtual machine is permitted, but sharing of contents between different users is forbidden. In our proposed approach, virtual machines with similar operating systems among the active domains in a node are recognised and organised into a homogeneous batch, with memory deduplication performed inside that batch, to improve page-sharing efficiency. Compared to memory deduplication applied to the entire host, our implementation demonstrates a significant increase in the number of pages shared when memory deduplication is applied batch-wise, although CPU (central processing unit) consumption also increases.
Keywords: kernel same-page merging, memory deduplication, virtual machine sharing, content-based sharing
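The approach above groups virtual machines with similar operating systems into a homogeneous batch and deduplicates memory pages only inside each batch. The toy Python model below shows only the batching-plus-page-merging idea, not KSM itself; the vm dictionary layout with "os" and "pages" keys is an assumption made for illustration.

import hashlib
from collections import defaultdict

def batch_by_os(vms):
    """Group VMs running the same OS into a homogeneous batch."""
    batches = defaultdict(list)
    for vm in vms:
        batches[vm["os"]].append(vm)
    return batches

def shared_pages_within_batch(batch):
    """Count pages that could be merged because their contents are identical."""
    seen = {}
    shared = 0
    for vm in batch:
        for page in vm["pages"]:            # page: the raw bytes of one memory page
            fp = hashlib.sha256(page).digest()
            if fp in seen:
                shared += 1                 # identical page already mapped; merge it
            else:
                seen[fp] = page
    return shared

vms = [
    {"os": "linux-5.4", "pages": [b"\x00" * 4096, b"page-A"]},
    {"os": "linux-5.4", "pages": [b"\x00" * 4096, b"page-B"]},
    {"os": "windows",   "pages": [b"\x00" * 4096]},
]
for os_name, batch in batch_by_os(vms).items():
    print(os_name, shared_pages_within_batch(batch))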
Using multi-threads to hide deduplication I/O latency with low synchronization overhead (Cited: 1)
4
Authors: 朱锐, 秦磊华, 周敬利, 郑寰 | Journal of Central South University, SCIE EI CAS, 2013, Issue 6, pp. 1582-1591 (10 pages)
Data deduplication, as a compression method, has been widely used in most backup systems to improve bandwidth and space efficiency. As the amount of data to be backed up has exploded, two main challenges in data deduplication are the CPU-intensive chunking and hashing work and the I/O-intensive disk-index access latency. Since CPU-intensive work has been vastly parallelized and sped up by multi-core and many-core processors, the I/O latency is likely becoming the bottleneck in data deduplication. To alleviate the challenge of I/O latency in multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture was proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead. A collisionless cache array was also designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves 3-5 times performance improvements when incorporated with the locality-based ChunkStash and local-similarity based SiLo methods. In addition, Multi-Dedup dramatically decreases the synchronization overhead and achieves 1.5-2 times performance improvements compared to traditional lock-based synchronization methods.
Keywords: multi-thread, multi-core, parallel data deduplication
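Multi-Dedup's key structure is a prefix-based concurrent fingerprint index that lets several deduplication threads proceed with little lock contention. Below is a minimal Python sketch of one plausible reading of that idea, sharding the index by the first byte of each fingerprint with one lock per shard; the shard count, chunk contents, and worker count are illustrative, not the paper's.

import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor

class PrefixShardedIndex:
    """Fingerprint index split into shards by fingerprint prefix, one lock per shard,
    so parallel deduplication threads rarely contend on the same lock."""
    def __init__(self, shards=256):
        self.shards = [dict() for _ in range(shards)]
        self.locks = [threading.Lock() for _ in range(shards)]

    def insert_if_absent(self, fingerprint: bytes) -> bool:
        """Return True if the fingerprint was new (chunk must be stored)."""
        shard = fingerprint[0] % len(self.shards)
        with self.locks[shard]:
            if fingerprint in self.shards[shard]:
                return False
            self.shards[shard][fingerprint] = True
            return True

def dedup_worker(chunks, index: PrefixShardedIndex, store: list):
    for chunk in chunks:
        fp = hashlib.sha256(chunk).digest()
        if index.insert_if_absent(fp):
            store.append(chunk)      # unique chunk; list.append is atomic under CPython's GIL

# Example: several threads deduplicating different slices of a stream concurrently.
if __name__ == "__main__":
    data = [bytes([i % 7]) * 4096 for i in range(10_000)]   # synthetic chunks with duplicates
    index, store = PrefixShardedIndex(), []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for t in range(4):
            pool.submit(dedup_worker, data[t::4], index, store)
    print(len(store), "unique chunks stored")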
Secured Data Storage Using Deduplication in Cloud Computing Based on Elliptic Curve Cryptography (Cited: 1)
5
Authors: N. Niyaz Ahamed, N. Duraipandian | Computer Systems Science & Engineering, SCIE EI, 2022, Issue 4, pp. 83-94 (12 pages)
The development of cloud computing and its related technologies has been unexpectedly rapid. However, centralized cloud storage faces a few challenges such as latency, storage, and packet drop in the network. Cloud storage attracts attention due to its huge data capacity and its role in securing secret information. Most developments in cloud storage have been positive, apart from the cost model and effectiveness, but data leakage remains a billion-dollar question for consumers. Traditional data security techniques are usually based on cryptographic methods, but these approaches may not be able to withstand an attack from the cloud server's interior. So, we suggest a model called multi-layer storage (MLS) based on security using elliptic curve cryptography (ECC). The suggested model focuses on the significance of cloud storage along with data protection and removing duplicates at the initial level. Based on divide-and-combine methodologies, the data are divided into three parts. Here, the first two portions of data are stored in the local system and fog nodes to secure the data using the encoding and decoding technique. The other part of the encrypted data is saved in the cloud. The viability of our model has been tested in terms of safety measures and test evaluation, and it is truly a powerful complement to existing methods in cloud storage.
Keywords: cloud storage, deduplication, fog computing, elliptic curve cryptography
Privacy-Enhanced Data Deduplication Computational Intelligence Technique for Secure Healthcare Applications
6
Authors: Jinsu Kim, Sungwook Ryu, Namje Park | Computers, Materials & Continua, SCIE EI, 2022, Issue 2, pp. 4169-4184 (16 pages)
A significant number of cloud storage environments are already implementing deduplication technology. Due to the nature of the cloud environment, a storage server capable of accommodating large-capacity storage is required, and as storage capacity increases, additional storage solutions are required. By leveraging deduplication, the cost problem can be fundamentally solved. However, deduplication poses privacy concerns due to the structure itself. In this paper, we point out the privacy infringement problem and propose a new deduplication technique to solve it. In the proposed technique, since the user's map structure and files are not stored on the server, the file uploader list cannot be obtained through analysis of the server's meta-information, so the user's privacy is maintained. In addition, the personal identification number (PIN) can be used to solve the file ownership problem, and the technique provides advantages such as safety against insider breaches and sniffing attacks. The proposed mechanism requires an additional time of approximately 100 ms to add an IDRef to distinguish user-file pairs during typical deduplication; for smaller file sizes, the time required for the additional operations is similar to the base operation time, but it becomes relatively smaller as the file size grows.
Keywords: computational intelligence, cloud, multimedia, data deduplication
Implementation and Validation of the Optimized Deduplication Strategy in Federated Cloud Environment
7
Authors: Nipun Chhabra, Manju Bala, Vrajesh Sharma | Computers, Materials & Continua, SCIE EI, 2022, Issue 4, pp. 2019-2035 (17 pages)
Cloud computing technology is the culmination of technical advancements in computer networks, hardware and software capabilities that collectively gave rise to computing as a utility. It offers a plethora of utilities to its clients worldwide in a very cost-effective way, and this feature is enticing users and companies to migrate their infrastructure to the cloud platform. Swayed by its gigantic capacity and easy access, clients are uploading replicated data to the cloud, resulting in an unnecessary crunch of storage in datacenters. Many data compression techniques came to the rescue, but none could serve the purpose for a capacity as large as a cloud's; hence, research has been done to deduplicate the data and harvest the space from existing storage capacity that was being wasted due to duplicated data. For providing better cloud services through scalable provisioning of resources, interoperability has brought many Cloud Service Providers (CSPs) under one umbrella, termed a Cloud Federation. Many policies have been devised for private and public cloud deployment models for searching and eradicating replicated copies using hashing techniques, whereas the exploration for duplicate copies is not restricted to any one type of CSP but extends to the set of public or private CSPs contributing to the federation. It was found that even in advanced deduplication techniques for federated clouds, due to the different nature of CSPs, a single file may be stored in both the private and the public group of the same cloud federation, which can be handled if an optimized deduplication strategy is rendered to address this issue. Therefore, this study aims to further optimize a deduplication strategy for the federated cloud environment and suggests a central management agent for the federation. Since no relevant work was found to exist, in this paper the concept of a federation agent has been implemented, and a file-level deduplication technique has been used to accomplish this approach.
Keywords: federation agent, deduplication in federated cloud, central management agent for cloud federation, interoperability in cloud computing, bloom filters, cloud computing, cloud data storage
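The study above proposes a central management agent (federation agent) so that the same file is not stored redundantly by both private and public CSPs in a federation, using file-level hashing. Below is a minimal Python sketch of such a federation-wide, file-level hash index; the FederationAgent class and its method names are hypothetical, not the paper's implementation.

import hashlib

class FederationAgent:
    """Hypothetical central agent holding a federation-wide index of whole-file hashes,
    so the same file is not stored by both a private and a public CSP."""
    def __init__(self):
        self.index = {}          # file hash -> (csp_name, object_id)

    def register(self, csp_name: str, object_id: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.index:
            return "duplicate already stored at %s/%s" % self.index[digest]
        self.index[digest] = (csp_name, object_id)
        return "stored"

agent = FederationAgent()
print(agent.register("private-csp-A", "obj-1", b"payroll report"))
print(agent.register("public-csp-B", "obj-9", b"payroll report"))  # detected as a duplicate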
SRSC: Improving Restore Performance for Deduplication-Based Storage Systems
8
Authors: ZUO Chunxue, WANG Fang, TANG Xiaolan, ZHANG Yucheng, FENG Dan | ZTE Communications, 2019, Issue 2, pp. 59-66 (8 pages)
Modern backup systems exploit data deduplication technology to save storage space while suffering from the fragmentation problem caused by deduplication. Fragmentation degrades the restore performance because restoring involves chunks that are scattered over many different containers. To improve the restore performance, the state-of-the-art History Aware Rewriting algorithm (HAR) was proposed to collect fragmented chunks in the last backup and rewrite them in the next backup. However, because it rewrites fragmented chunks only in the next backup, HAR fails to eliminate internal fragmentation caused by self-referenced chunks (chunks that occur more than two times in a backup) in the current backup, thus degrading the restore performance. In this paper, we propose Selectively Rewriting Self-Referenced Chunks (SRSC), a scheme that uses a buffer to simulate a restore cache, identifies internal fragmentation in the cache and selectively rewrites it. Our experimental results based on two real-world datasets show that SRSC improves the restore performance by 45% with an acceptable sacrifice of the deduplication ratio.
Keywords: data deduplication, fragmentation, restore performance
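SRSC identifies self-referenced chunks by simulating a restore cache with a buffer and selectively rewriting the offenders. The Python sketch below is a toy version of only that identification step under a plain FIFO cache model; the cache size and the rewrite criterion are simplifying assumptions, not SRSC's actual policy.

import hashlib
from collections import deque

def find_self_referenced(chunk_stream, cache_slots=128):
    """Single pass over one backup's chunk sequence: a chunk is 'self-referenced' when
    its fingerprint repeats within the same backup; a small FIFO buffer stands in for
    the restore cache, and repeats the buffer would already have evicted are flagged."""
    cache = deque(maxlen=cache_slots)
    seen, to_rewrite = set(), set()
    for chunk in chunk_stream:
        fp = hashlib.sha256(chunk).digest()
        if fp in seen and fp not in cache:
            to_rewrite.add(fp)       # a repeated reference that would miss the restore cache
        seen.add(fp)
        cache.append(fp)
    return to_rewrite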
Differentially Authorized Deduplication System Based on Blockchain
9
Authors: ZHAO Tian, LI Hui, YANG Xin, WANG Han, ZENG Ming, GUO Haisheng, WANG Dezheng | ZTE Communications, 2021, Issue 2, pp. 67-76 (10 pages)
In cloud storage architectures, deduplication encrypted with a convergent key is one of the important data compression technologies, effectively improving the utilization of space and bandwidth. To further refine the usage scenarios for various user permissions and enhance users' data security, we propose a blockchain-based differentially authorized deduplication system. The proposed system optimizes the traditional Proof of Vote (PoV) consensus algorithm and simplifies the existing differential authorization process to realize credible management and dynamic update of authority. Based on the decentralized property of blockchain, we overcome the centralized single-point-of-failure problem of traditional differentially authorized deduplication systems. Besides, the operations of legitimate users are recorded in blocks to ensure the traceability of behaviors.
Keywords: convergent key, deduplication, blockchain, differential authorization
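Convergent-key encryption, mentioned in the abstract above, derives the key from the plaintext itself so that identical data encrypts to identical ciphertext and can still be deduplicated. Below is a minimal Python sketch of plain convergent encryption (not the paper's blockchain-based authorization layer), assuming the third-party pyca/cryptography package for AES-GCM; the deterministic nonce is what enables deduplication but also exposes predictable plaintexts to dictionary attacks.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Derive the key from the content itself, so identical plaintexts always
    produce identical ciphertexts and can be deduplicated by the server."""
    key = hashlib.sha256(plaintext).digest()               # convergent key (32 bytes)
    nonce = hashlib.sha256(b"nonce" + key).digest()[:12]   # deterministic on purpose
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce = hashlib.sha256(b"nonce" + key).digest()[:12]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# Two users encrypting the same file obtain identical ciphertexts:
k1, c1 = convergent_encrypt(b"same report")
k2, c2 = convergent_encrypt(b"same report")
assert c1 == c2 and convergent_decrypt(k1, c1) == b"same report"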
dCACH: Content Aware Clustered and Hierarchical Distributed Deduplication
10
Authors: Girum Dagnaw, Ke Zhou, Hua Wang | Journal of Software Engineering and Applications, 2019, Issue 11, pp. 460-490 (31 pages)
In deduplication, the index-lookup disk bottleneck is a major obstacle limiting the throughput of backup processes. One way to minimize the effect of this issue and boost speed is to use very coarse-grained chunks for deduplication, at the cost of low storage saving and limited scalability. Another way is to distribute the deduplication process among multiple nodes, but this approach introduces the storage-node island effect and also incurs high communication cost. In this paper, we explore dCACH, a content-aware clustered and hierarchical deduplication system, which implements a hybrid of inline coarse-grained and offline fine-grained distributed deduplication where routing decisions are made for sets of files instead of single files. It utilizes Bloom filters for detecting similarity between a data stream and previous data streams and performs stateful routing, which solves the storage-node island problem. Moreover, it exploits the negligibly small amount of content shared among chunks from different file types to create groups of files and deduplicate each group in its own fingerprint index space. It implements hierarchical deduplication to reduce the size of fingerprint indexes at the global level, where only files and big segments are deduplicated. Locality is created and exploited first using the big segments deduplicated at the global level and second by routing a set of consecutive files together to one storage node. Furthermore, the use of Bloom filters for similarity detection between streams has low communication and computation cost while enabling duplicate-elimination performance comparable to single-node deduplication. dCACH is evaluated using a prototype deployed on a server environment distributed over four separate machines. It is shown to have 10× the speed of Extreme_Binn with minimal communication overhead, while its duplicate-elimination effectiveness is on a par with a single-node deduplication system.
Keywords: clustered deduplication, content-aware grouping, hierarchical deduplication, stateful routing, similarity, Bloom filters
Threat Model and Defense Scheme for Side-Channel Attacks in Client-Side Deduplication (Cited: 2)
11
Authors: Guanxiong Ha, Hang Chen, Chunfu Jia, Mingyue Li | Tsinghua Science and Technology, SCIE EI CAS CSCD, 2023, Issue 1, pp. 1-12 (12 pages)
In cloud storage, client-side deduplication is widely used to reduce storage and communication costs. In client-side deduplication, if the cloud server detects that the user's outsourced data have already been stored, then the client does not need to re-upload the data. However, the information on whether data need to be uploaded can be used as a side channel, which can consequently be exploited by adversaries to compromise data privacy. In this paper, we propose a new threat model against side-channel attacks. Different from existing schemes, the adversary can learn the approximate ratio of stored chunks to unstored chunks in outsourced files, and this ratio affects the probability that the adversary compromises data privacy through side-channel attacks. Under this threat model, we design two defense schemes to minimize privacy leakage, both of which design interaction protocols between clients and the server during deduplication checks to reduce the probability that the adversary compromises data privacy. We analyze the security of our schemes and evaluate their performance based on a real-world dataset. Compared with existing schemes, our schemes can better mitigate data privacy leakage and have a slightly lower communication cost.
Keywords: cloud storage, deduplication, side-channel, privacy
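The side channel in client-side deduplication is the "upload needed / not needed" answer itself. The sketch below is not this paper's defense; it shows a classic randomized-threshold style countermeasure (in the spirit of Harnik et al.) purely to make the leak, and one way of masking it, concrete; the threshold range is arbitrary.

import os
import random

class RandomizedThresholdCheck:
    """Classic randomized-threshold defense (illustrative, not this paper's scheme):
    the server requests an upload until a chunk has been seen a random number of times,
    so a 'no upload needed' reply no longer proves the chunk already exists."""
    def __init__(self):
        self.copies = {}      # fingerprint -> number of uploads seen so far
        self.threshold = {}   # fingerprint -> random threshold fixed at first sight

    def upload_needed(self, fingerprint: bytes) -> bool:
        if fingerprint not in self.threshold:
            self.threshold[fingerprint] = random.randint(2, 10)
            self.copies[fingerprint] = 0
        self.copies[fingerprint] += 1
        # Client-side dedup is only revealed after the random threshold is reached.
        return self.copies[fingerprint] < self.threshold[fingerprint]

server = RandomizedThresholdCheck()
fp = os.urandom(32)
print([server.upload_needed(fp) for _ in range(12)])  # True until threshold, then False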
A Lookahead Read Cache: Improving Read Performance for Deduplication Backup Storage (Cited: 4)
12
Authors: Dongchul Park, Ziqi Fan, Young Jin Nam, David H. C. Du | Journal of Computer Science & Technology, SCIE EI CSCD, 2017, Issue 1, pp. 26-40 (15 pages)
Data deduplication (dedupe for short) is a special data compression technique. It has been widely adopted to save backup time as well as storage space, particularly in backup storage systems. Therefore, most dedupe research has primarily focused on improving dedupe write performance. However, backup storage dedupe read performance is also a crucial problem for storage recovery. This paper designs a new dedupe storage read cache for backup applications that improves read performance by exploiting a special characteristic: the read sequence is the same as the write sequence. Consequently, for better cache utilization, by looking ahead at future references within a moving window, it evicts from the cache the victims with the smallest future access. Moreover, to further improve read cache performance, it maintains a small log buffer to judiciously cache future-access data chunks. Extensive experiments with real-world backup workloads demonstrate that the proposed read cache scheme improves read performance by up to 64.3%.
Keywords: deduplication, dedupe, read cache, backup
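Because the restore read sequence equals the write sequence, the cache can look ahead at known future references when choosing an eviction victim. Below is a minimal Python simulation of that lookahead idea; the window length, capacity, and tie-breaking are illustrative choices rather than the paper's exact policy, and the log buffer described in the abstract is omitted.

def pick_victim(cache: set, future: list, start: int, window: int = 64):
    """Evict the cached chunk with the fewest references inside the lookahead window;
    a chunk never referenced again in the window is an ideal victim."""
    horizon = future[start:start + window]
    return min(cache, key=lambda chunk: horizon.count(chunk))

def restore_with_lookahead(sequence: list, capacity: int = 4, window: int = 64):
    """Simulate a restore whose read order equals the original write order, using
    lookahead within a moving window to choose eviction victims."""
    cache, hits = set(), 0
    for i, chunk in enumerate(sequence):
        if chunk in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            cache.discard(pick_victim(cache, sequence, i + 1, window))
        cache.add(chunk)
    return hits

print(restore_with_lookahead(list("abacabdaeabcd")))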
Metadata Feedback and Utilization for Data Deduplication Across WAN (Cited: 2)
13
Authors: Bing Zhou, Jiang-Tao Wen | Journal of Computer Science & Technology, SCIE EI CSCD, 2016, Issue 3, pp. 604-623 (20 pages)
Data deduplication for file communication across a wide area network (WAN), in applications such as file synchronization and mirroring of cloud environments, usually achieves significant bandwidth saving at the cost of significant time overheads for the deduplication itself. These time overheads include the time required for data deduplication at the two geographically distributed nodes (e.g., the disk access bottleneck) and the duplication query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present a data deduplication system across WAN with metadata feedback and metadata utilization (MFMU), in order to harness the deduplication-related time overheads. In the proposed MFMU system, selective metadata feedback from the receiver to the sender is introduced to reduce the number of duplication query/answer operations. In addition, to harness the metadata-related disk I/O operations at the receiver, as well as the bandwidth overhead introduced by the metadata feedback, a hysteresis hash re-chunking mechanism based metadata utilization component is introduced. Our experimental results demonstrate that MFMU achieved an average of 20%-40% deduplication acceleration with the bandwidth saving ratio not reduced by the metadata feedback, as compared with the "baseline" content-defined chunking (CDC) used in LBFS (Low-bandwidth Network File System) and existing state-of-the-art Bimodal chunking algorithm based data deduplication solutions.
Keywords: data deduplication, wide area network (WAN), metadata feedback, metadata utilization
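MFMU is compared against the content-defined chunking (CDC) baseline used in LBFS, where chunk boundaries are chosen by a rolling hash over the content rather than at fixed offsets. Below is a simplified CDC sketch in Python, using a polynomial rolling hash instead of Rabin fingerprints; the window, mask bits, and size bounds are illustrative values, not those of LBFS or MFMU.

def cdc_chunks(data: bytes, mask_bits: int = 12, min_size: int = 1024, max_size: int = 16384):
    """Simplified content-defined chunking: declare a boundary when the rolling hash
    of the last WINDOW bytes has its low `mask_bits` bits equal to zero, so chunk
    boundaries follow content rather than fixed offsets."""
    WINDOW, BASE, MOD = 48, 257, (1 << 61) - 1
    mask = (1 << mask_bits) - 1
    base_pow = pow(BASE, WINDOW - 1, MOD)

    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:                              # window full: drop oldest byte
            h = (h - data[i - WINDOW] * base_pow) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if ((h & mask) == 0 and length >= min_size) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])                          # trailing partial chunk
    return chunks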
Leach: an automatic learning cache for inline primary deduplication system (Cited: 2)
14
Authors: Bin LIN, Shanshan LI, Xiangke LIAO, Jing ZHANG, Xiaodong LIU | Frontiers of Computer Science, SCIE EI CSCD, 2014, Issue 2, pp. 175-183 (9 pages)
Deduplication technology has been increasingly used to reduce storage costs. Though it has been successfully applied to backup and archival systems, existing techniques can hardly be deployed in primary storage systems due to the associated latency cost of detecting duplicated data, where every unit has to be checked against a substantially large fingerprint index before it is written. In this paper we introduce Leach, a self-learning in-memory fingerprint cache for inline primary storage that reduces the write cost in a deduplication system. Leach is motivated by a characteristic of real-world I/O workloads: high data skew exists in the access patterns of duplicated data. Leach adopts a splay tree to organize the on-disk fingerprint index, automatically learns the access patterns and maintains hot working sets in cache memory, with the goal of servicing the majority of duplicated-data detection. Leveraging the working-set property, Leach provides optimizations to reduce the cost of splay operations on the fingerprint index and of cache updates. In comprehensive experiments on several real-world datasets, Leach outperforms the conventional LRU (least recently used) cache policy by reducing the number of cache misses, and significantly improves write performance without great impact on cache hits.
Keywords: deduplication, duplicate detection, splay tree, cache
AR-Dedupe: An Efficient Deduplication Approach for Cluster Deduplication System (Cited: 2)
15
Authors: 邢玉轩, 肖侬, 刘芳, 孙振, 何晚辉 | Journal of Shanghai Jiaotong University (Science), EI, 2015, Issue 1, pp. 76-81 (6 pages)
As data grow rapidly in data centers, the inline cluster deduplication technique has been widely used to improve storage efficiency and data reliability. However, cluster deduplication systems face several challenges: the data deduplication rate decreases as the number of deduplication server nodes increases, data routing incurs high communication overhead, and load balancing is needed to improve the throughput of the system. In this paper, we propose a well-performing cluster deduplication system called AR-Dedupe. The experimental results on two real datasets demonstrate that AR-Dedupe can achieve a high data deduplication rate with low communication overhead and keep the system load well balanced through a new data routing algorithm. In addition, we utilize an application-aware mechanism to speed up the indexing of handprints in the routing server, which yields a 30% performance improvement.
Keywords: cluster deduplication system, routing algorithm, application-aware
Updatable block-level deduplication of encrypted data with efficient auditing in cloud storage (Cited: 1)
16
Authors: Dang Qianlong, Xie Ying, Li Donghao, Hu Gongcheng | The Journal of China Universities of Posts and Telecommunications, EI CSCD, 2019, Issue 3, pp. 56-72 (17 pages)
Updatable block-level message-locked encryption (MLE) can efficiently update encrypted data, and public auditing can verify the integrity of cloud storage data by utilizing a third-party auditor (TPA). However, there are few schemes supporting both updatable block-level deduplication and public auditing. In this paper, an updatable block-level deduplication scheme with efficient auditing is proposed based on a tree-based authenticated structure. In the proposed scheme, the cloud server (CS) can perform block-level deduplication, and the TPA carries out integrity auditing tasks. When a data block is updated, the ciphertext and auditing tags can be updated efficiently. The security analysis demonstrates that the proposed scheme achieves privacy under chosen distribution attacks in secure deduplication and resists uncheatable chosen distribution attacks (UNC-CDA) in proof of ownership (PoW). Furthermore, the integrity auditing process is proven secure under adaptive chosen-message attacks. Compared with previous relevant schemes, the proposed scheme achieves better functionality and higher efficiency.
Keywords: data update operation, block-level deduplication, efficient auditing, tree-based authenticated structure, proof of ownership
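The scheme above relies on a tree-based authenticated structure so that updating one data block only requires updating tags along a short path. As a generic illustration (not the paper's exact construction), the Python sketch below builds a binary Merkle hash tree over block hashes and shows that a single-block update touches only O(log n) nodes.

import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

class MerkleTree:
    """Binary Merkle tree over block hashes: updating one block only recomputes
    the nodes on its path to the root, which is what makes block-level updates
    to authenticated (auditable) data cheap."""
    def __init__(self, blocks):
        n = 1
        while n < len(blocks):
            n *= 2                                  # round leaf count up to a power of two
        self.n = n
        self.nodes = [b""] * (2 * n)                # nodes[1] is the root; leaves start at n
        for i in range(n):
            data = blocks[i] if i < len(blocks) else b""
            self.nodes[n + i] = h(data)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])

    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, new_block: bytes):
        i = self.n + index
        self.nodes[i] = h(new_block)
        i //= 2
        while i >= 1:                               # recompute only the path to the root
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
            i //= 2

blocks = [b"block-%d" % i for i in range(5)]
tree = MerkleTree(blocks)
old_root = tree.root()
tree.update(2, b"block-2 (edited)")
assert tree.root() != old_root      # the authenticated root reflects the block update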
Prefetch-aware fingerprint cache management for data deduplication systems (Cited: 1)
17
Authors: Mei LI, Hongjun ZHANG, Yanjun WU, Chen ZHAO | Frontiers of Computer Science, SCIE EI CSCD, 2019, Issue 3, pp. 500-515 (16 pages)
Data deduplication has been widely utilized in large-scale storage systems, particularly backup systems. Data deduplication systems typically divide data streams into chunks and identify redundant chunks by comparing chunk fingerprints. Maintaining all fingerprints in memory is not cost-effective because fingerprint indexes are typically very large. Many data deduplication systems therefore maintain a fingerprint cache in memory and exploit fingerprint prefetching to accelerate the deduplication process. Although fingerprint prefetching can improve the performance of data deduplication systems by leveraging the locality of workloads, inaccurately prefetched fingerprints may pollute the cache by evicting useful fingerprints. We observed that most of the prefetched fingerprints in a wide variety of applications are never used or are used only once, which severely limits the performance of data deduplication systems. We introduce a prefetch-aware fingerprint cache management scheme for data deduplication systems (PreCache) to alleviate prefetch-related cache pollution. We propose three prefetch-aware fingerprint cache replacement policies (PreCache-UNU, PreCache-UOO, and PreCache-MIX) to handle different types of cache pollution. Additionally, we propose an adaptive policy selector to select suitable policies for prefetch requests. We implement PreCache on two representative data deduplication systems (Block Locality Caching and SiLo) and evaluate its performance using three real-world workloads (Kernel, MacOS, and Homes). The experimental results reveal that PreCache improves deduplication throughput by up to 32.22%, thanks to a reduction of on-disk fingerprint index lookups and an improvement of the deduplication ratio achieved by mitigating prefetch-related fingerprint cache pollution.
Keywords: data deduplication, fingerprint prefetch, fingerprint cache
Public Auditing for Encrypted Data with Client-Side Deduplication in Cloud Storage (Cited: 4)
18
Authors: HE Kai, HUANG Chuanhe, ZHOU Hao, SHI Jiaoli, WANG Xiaomao, DAN Feng | Wuhan University Journal of Natural Sciences, CAS CSCD, 2015, Issue 4, pp. 291-298 (8 pages)
Storage auditing and client-side deduplication techniques have been proposed to assure data integrity and improve storage efficiency, respectively. Recently, a few schemes have started to consider these two different aspects together. However, these schemes either only support plaintext data files or have been proved insecure. In this paper, we propose a public auditing scheme for cloud storage systems in which deduplication of encrypted data and data integrity checking can be achieved within the same framework. The cloud server can correctly check the ownership for new owners, and the auditor can correctly check the integrity of deduplicated data. Our scheme supports deduplication of encrypted data by using the method of proxy re-encryption and also achieves deduplication of data tags by aggregating the tags from different owners. The analysis and experiment results show that our scheme is provably secure and efficient.
Keywords: public auditing, data integrity, storage deduplication, cloud storage
Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems
19
Authors: Jian Liu, Yun-Peng Chai, Xiao Qin, Yao-Hong Liu | Journal of Computer Science & Technology, SCIE EI CSCD, 2018, Issue 1, pp. 58-78 (21 pages)
Deduplication has been commonly used in both enterprise storage systems and cloud storage. To overcome the performance challenge of selective restore operations in deduplication systems, a solid-state-drive-based (i.e., SSD-based) read cache can be deployed to speed up restores by caching popular restore contents dynamically. Unfortunately, frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSDs' lifetime while slowing down I/O processes in SSDs. To address this problem, we propose a new solution, LOP-Cache, to greatly improve the write durability of SSDs as well as I/O performance by enlarging the proportion of long-term popular (LOP) data among the data written into the SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long time period to decrease the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache with only 5.56% of the capacity of the deduplicated data. Importantly, LOP-Cache improves SSDs' lifetime by a factor of 9.77. The evidence shows that LOP-Cache offers a cost-efficient SSD-based read cache solution to boost the performance of selective restore for deduplication systems.
Keywords: data deduplication, solid state drive (SSD), flash, cache, endurance
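LOP-Cache extends SSD lifetime by admitting only long-term popular data into the SSD read cache, so one-off reads never cost SSD writes. The Python sketch below shows a generic popularity-gated admission policy in that spirit; the class name, the admit_after threshold, and the LRU eviction inside the cache are illustrative assumptions, not LOP-Cache's actual algorithm.

from collections import Counter, OrderedDict

class PopularityGatedCache:
    """Toy admission-gated read cache: a container is only written into the
    (SSD-backed) cache after it has missed `admit_after` times, so one-off
    reads never consume SSD write cycles."""
    def __init__(self, capacity=1024, admit_after=3):
        self.capacity = capacity
        self.admit_after = admit_after
        self.misses = Counter()               # long-term popularity counters
        self.cache = OrderedDict()            # container id -> data (LRU order)
        self.ssd_writes = 0

    def get(self, container_id, load_from_disk):
        if container_id in self.cache:
            self.cache.move_to_end(container_id)
            return self.cache[container_id]
        data = load_from_disk(container_id)
        self.misses[container_id] += 1
        if self.misses[container_id] >= self.admit_after:     # long-term popular: cache it
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)                # evict least recently used
            self.cache[container_id] = data
            self.ssd_writes += 1
        return data

cache = PopularityGatedCache(capacity=2, admit_after=2)
for cid in ["c1", "c1", "c2", "c1"]:
    cache.get(cid, lambda c: b"data-" + c.encode())
print(cache.ssd_writes)   # c1 admitted on its second miss; c2 not admitted yet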
Hybrid cloud approach for block-level deduplication and searchable encryption in large universe
20
Authors: Liu Zhenhua, Kang Yaqian, Li Chen, Fan Yaqing | The Journal of China Universities of Posts and Telecommunications, EI CSCD, 2017, Issue 5, pp. 23-34 (12 pages)
Ciphertext-policy attribute-based searchable encryption (CP-ABSE) can achieve fine-grained access control for data sharing and retrieval, and secure deduplication can save storage space by eliminating duplicate copies. However, there are few schemes supporting both searchable encryption and secure deduplication. In this paper, a large-universe CP-ABSE scheme supporting secure block-level deduplication is proposed under a hybrid cloud mechanism. In the proposed scheme, after the ciphertext is inserted into a bloom filter tree (BFT), the private cloud can perform fine-grained deduplication efficiently by matching tags, and the public cloud can search efficiently using a homomorphic searchable method and keyword matching. Finally, the proposed scheme achieves privacy under chosen distribution attacks block-level (PRV-CDA-B) secure deduplication and match-concealing (MC) searchable security. Compared with existing schemes, the proposed scheme has the advantage of supporting fine-grained access control, block-level deduplication and efficient search simultaneously.
Keywords: block-level deduplication, searchable encryption, large universe, BFT