Journal Articles
6 articles found
1. Hash Table Assisted Efficient File Level De-Duplication Scheme in SD-IoV Assisted Sensing Devices
Authors: Ghawar Said, Ata Ullah, Anwar Ghani, Muhammad Azeem, Khalid Yahya, Muhammad Bilal, Sayed Chhattan Shah. Intelligent Automation & Soft Computing, 2023, No. 10, pp. 83-99 (17 pages).
The Internet of Things (IoT) and cloud technologies have encouraged massive data storage at central repositories. Software-defined networks (SDN) support the processing of data and restrict the transmission of duplicate values. It is necessary to use a data de-duplication mechanism to reduce communication costs and storage overhead. Existing state-of-the-art schemes suffer from computational overhead due to deterministic or random tree-based tag generation, which further increases as the file size grows. This paper presents an efficient file-level de-duplication scheme (EFDS) where the cost of creating tags is reduced by employing a hash table with a key-value pair for each block of the file. Further, an algorithm for hash table-based duplicate block identification and storage (HDBIS) is presented, based on fingerprints, that maintains a linked list of similar duplicate blocks on the same index. Hash tables normally have a consistent time complexity for looking up, inserting, and deleting stored data regardless of the input size. The experimental results show that the proposed EFDS scheme performs better compared to its counterparts.
Keywords: hash table, de-duplication, linked list, IoT, sensing devices
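A minimal sketch of the block-level, hash-table idea described in this abstract (not the authors' EFDS/HDBIS implementation; the block size, SHA-256 fingerprints, and table size are assumptions): each file block is fingerprinted, blocks whose fingerprints map to the same bucket are chained together, and a duplicate is detected by scanning only that bucket.

```python
# Illustrative sketch only: fingerprint-based block de-duplication with a hash
# table and per-bucket chaining, in the spirit of the HDBIS description above.
import hashlib

BLOCK_SIZE = 4096          # assumed fixed block size
TABLE_SIZE = 1 << 16       # assumed number of hash-table buckets

class DedupTable:
    def __init__(self):
        # Each bucket acts as the linked list of (fingerprint, block) entries
        # that hash to the same index.
        self.buckets = [[] for _ in range(TABLE_SIZE)]

    def put(self, block: bytes) -> bool:
        """Store a block; return True if it was a duplicate."""
        fp = hashlib.sha256(block).hexdigest()     # block fingerprint
        idx = int(fp, 16) % TABLE_SIZE             # bucket index
        for stored_fp, _ in self.buckets[idx]:
            if stored_fp == fp:                    # duplicate found, skip storing
                return True
        self.buckets[idx].append((fp, block))      # first copy: keep it
        return False

def dedup_file(path: str) -> tuple[int, int]:
    """Return (total_blocks, duplicate_blocks) for one file."""
    table, total, dups = DedupTable(), 0, 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK_SIZE):
            total += 1
            dups += table.put(chunk)
    return total, dups
```

Because lookup, insertion, and deletion in such a table are expected constant time, the tagging cost stays roughly flat as the file grows, which is the property the abstract emphasizes.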
2. Evidence-based literature review: De-duplication a cornerstone for quality
Authors: Barbara Hammer, Elettra Virgili, Federico Bilotta. World Journal of Methodology, 2023, No. 5, pp. 390-398 (9 pages).
Evidence-based literature reviews play a vital role in contemporary research, facilitating the synthesis of knowledge from multiple sources to inform decision-making and scientific advancements. Within this framework, de-duplication emerges as a part of the process for ensuring the integrity and reliability of evidence extraction. This opinion review delves into the evolution of de-duplication, highlights its importance in evidence synthesis, explores various de-duplication methods, discusses evolving technologies, and proposes best practices. By addressing ethical considerations, this paper emphasizes the significance of de-duplication as a cornerstone for quality in evidence-based literature reviews.
Keywords: duplicate publications as topic, bibliographic databases, artificial intelligence, systematic reviews as topic, review literature as topic, de-duplication, duplicate references, reference management software
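As a rough illustration of what reference de-duplication in a review workflow can look like (this is not a method from the paper; the field names and matching keys are assumptions), records might be collapsed on DOI when available, otherwise on a normalized title:

```python
# Illustrative sketch only: de-duplicating bibliographic records by DOI, with a
# normalized-title fallback. Field names ("doi", "title") are assumptions.
import re

def record_key(rec: dict) -> str:
    doi = (rec.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Fallback: lower-cased title stripped to alphanumerics
    title = re.sub(r"[^a-z0-9]", "", (rec.get("title") or "").lower())
    return "title:" + title

def de_duplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        key = record_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```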
3. A content aware chunking scheme for data de-duplication in archival storage systems
Authors: Nie Xuejun, Qin Leihua, Zhou Jingli. High Technology Letters (EI, CAS), 2012, No. 1, pp. 45-50 (6 pages).
Based on variable sized chunking, this paper proposes a content aware chunking scheme, called CAC, that does not assume fully random file contents, but considers the characteristics of the file types. CAC uses a candidate anchor histogram and file-type specific knowledge to refine how anchors are determined when performing de-duplication of file data, and enforces the selected average chunk size. CAC yields more chunks, which in turn produces smaller average chunks and a better reduction in data. We present a detailed evaluation of CAC, and the experimental results show that this scheme can improve the compression ratio of chunking for file types whose bytes are not randomly distributed (from 11.3% to 16.7% depending on the dataset), and improve the write throughput by 9.7% on average.
Keywords: data de-duplication, content aware chunking (CAC), candidate anchor histogram (CAH)
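The sketch below shows plain variable-size, content-defined chunking, the baseline that CAC refines; it picks a boundary wherever a rolling hash of the content matches an anchor condition. It is not the CAC algorithm itself (there is no candidate anchor histogram or file-type knowledge here), and the mask and size limits are assumptions.

```python
# Illustrative sketch only: content-defined chunking with a toy rolling hash.
AVG_MASK = (1 << 12) - 1          # boundary when the low 12 bits are zero (~4 KiB average)
MIN_CHUNK, MAX_CHUNK = 1024, 16384

def chunk(data: bytes):
    """Yield variable-size chunks whose boundaries depend on the content itself."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF      # toy shift-add hash, not a Rabin fingerprint
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & AVG_MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]             # anchor hit (or hard cap): cut a chunk
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]                      # trailing remainder
```

Because boundaries are chosen by content rather than fixed offsets, an insertion near the start of a file shifts only nearby chunks, so most duplicate chunks are still detected.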
4. Scalable high performance de-duplication backup via hash join
Authors: Tian-ming YANG, Dan FENG, Zhong-ying NIU, Ya-ping WAN. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010, No. 5, pp. 315-327 (13 pages).
Apart from high space efficiency, other demanding requirements for enterprise de-duplication backup are high performance, high scalability, and availability for large-scale distributed environments. The main challenge is reducing the significant disk input/output (I/O) overhead that results from constantly accessing the disk to identify duplicate chunks. Existing inline de-duplication approaches mainly rely on duplicate locality to avoid the disk bottleneck, thus suffering from degradation under poor duplicate locality workloads. This paper presents Chunkfarm, a post-processing de-duplication backup system designed to improve capacity, throughput, and scalability for de-duplication. Chunkfarm performs de-duplication backup using the hash join algorithm, which turns the notoriously random and small disk I/Os of fingerprint lookups and updates into large sequential disk I/Os, hence achieving high write throughput not influenced by workload locality. More importantly, by decentralizing fingerprint lookup and update, Chunkfarm supports a cluster of servers performing de-duplication backup in parallel; it is hence conducive to distributed implementation and thus applicable to large-scale and distributed storage systems.
Keywords: backup system, de-duplication, post-processing, fingerprint lookup, scalability
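The core idea above, replacing one random disk lookup per fingerprint with a single batched sequential pass over the index, can be illustrated with a simple sort/merge join (a merge-join variant of the paper's hash join, not Chunkfarm's actual code; the inputs and their layout are assumptions):

```python
# Illustrative sketch only: batch duplicate detection by merge-joining the
# sorted fingerprints of a new backup against a sorted on-disk fingerprint index.
def merge_join_duplicates(new_fps: list[str], index_fps: list[str]) -> set[str]:
    """Both inputs must be sorted; returns fingerprints already present in the index."""
    dups, i, j = set(), 0, 0
    while i < len(new_fps) and j < len(index_fps):
        if new_fps[i] == index_fps[j]:
            dups.add(new_fps[i]); i += 1       # match: chunk already stored
        elif new_fps[i] < index_fps[j]:
            i += 1
        else:
            j += 1
    return dups

# Usage: collect fingerprints during backup, sort them once, then scan the index
# sequentially instead of issuing one random disk I/O per fingerprint.
new_fps = sorted(["ffee", "aa01", "bb02", "aa01"])
index_fps = sorted(["aa01", "cc03", "ffee"])
print(merge_join_duplicates(new_fps, index_fps))   # {'aa01', 'ffee'}
```

Since the join only needs its own partition of fingerprints, separate servers can each join their own shard, which is the decentralization property the abstract highlights.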
5. A Scalable Double-Chain Storage Module for Blockchain (Cited by: 1)
Authors: Hui Han, Wunan Wan, Jinquan Zhang, Zhi Qin, Xiaofang Qiu, Shibin Zhang, Jinyue Xia. Computers, Materials & Continua (SCIE, EI), 2022, No. 11, pp. 2651-2662 (12 pages).
With the growing maturity of blockchain technology, its peer-to-peer model and fully duplicated data storage pattern enable blockchain to act as a distributed ledger in untrustworthy environments. Blockchain storage has also become a research hotspot in industry, finance, and academia due to its security, and its unique data storage management model is gradually becoming a key technology for applications in various fields. However, as the amount of data written into the blockchain grows, the blockchain system faces many problems in practical deployments, such as high storage space occupation, low data flexibility and availability, low retrieval efficiency, and poor scalability. To address these problems, this paper combines off-chain storage technology and de-duplication technology to optimize the blockchain storage model. Firstly, this paper adopts a double-chain model to reduce the data stored on the major chain, which keeps a small amount of primary data and supervises the vice chain through an Application Programming Interface (API). The vice chain stores a large number of copies of data as well as non-transactional data. Our model divides the vice chain storage system into two layers, a storage layer and a processing layer. In the processing layer, de-duplication technology is applied to reduce the redundancy of vice chain data. Our double-chain storage model with high scalability enhances data flexibility, is more suitable as a distributed storage system, and performs well in data retrieval.
Keywords: blockchain, storage model, off-chain storage, de-duplication
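One way to picture the division of labour described above is a content-addressed off-chain store whose digests are the only thing the major chain records; duplicate payloads are then stored once. This is a hypothetical sketch of that pattern, not the paper's module; all class and field names are assumptions.

```python
# Illustrative sketch only: major chain keeps digests, vice/off-chain store keeps
# the payloads and de-duplicates them by content address.
import hashlib

class OffChainStore:
    def __init__(self):
        self.blobs = {}                      # digest -> payload (storage layer)

    def put(self, payload: bytes) -> str:
        digest = hashlib.sha256(payload).hexdigest()
        # Processing layer: identical payloads collapse to one stored copy.
        self.blobs.setdefault(digest, payload)
        return digest                        # reference to record on the major chain

class MajorChain:
    def __init__(self, store: OffChainStore):
        self.store, self.blocks = store, []

    def append_tx(self, payload: bytes):
        # The chain itself stores only the small reference, keeping blocks light.
        self.blocks.append({"data_ref": self.store.put(payload)})
```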
6. Frequency and Similarity-Aware Partitioning for Cloud Storage Based on Space-Time Utility Maximization Model (Cited by: 4)
Authors: Jianjiang Li, Jie Wu, Zhanning Ma. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2015, No. 3, pp. 233-245 (13 pages).
With the rise of various cloud services, the problem of redundant data has become more prominent in cloud storage systems. How to assign a set of documents to a distributed file system in a way that not only reduces storage space but also preserves access efficiency as much as possible is an urgent problem that needs to be solved. Space efficiency mainly relies on data de-duplication technologies, while access efficiency requires gathering files with high similarity on the same server. Based on the study of other data de-duplication technologies, especially the Similarity-Aware Partitioning (SAP) algorithm, this paper proposes the Frequency and Similarity-Aware Partitioning (FSAP) algorithm for cloud storage. The FSAP algorithm is a more reasonable data partitioning algorithm than the SAP algorithm. Meanwhile, this paper proposes the Space-Time Utility Maximization Model (STUMM), which is useful in balancing the relationship between space efficiency and access efficiency. Finally, this paper uses 100 web files downloaded from CNN for testing, and the results show that, relative to the algorithms associated with the SAP algorithm (including the SAP-Space-Delta algorithm and the SAP-Space-Dedup algorithm), the FSAP algorithm based on STUMM achieves a higher compression ratio and a more balanced distribution of data blocks.
Keywords: de-duplication, cloud storage, redundancy, frequency
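A greedy similarity-aware placement, in the spirit of the SAP baseline that FSAP improves on, can be sketched as below: each file goes to the server already holding the most of its chunk fingerprints, so similar files cluster and their shared chunks are stored once. This is not the FSAP algorithm (it ignores access frequency and the STUMM utility model), and all names and data are assumptions.

```python
# Illustrative sketch only: greedy similarity-aware assignment of files to servers.
def assign(files: dict[str, set[str]], num_servers: int) -> dict[str, int]:
    servers = [set() for _ in range(num_servers)]   # chunk fingerprints held per server
    placement = {}
    for name, fps in files.items():
        # Pick the server with the largest fingerprint overlap; break ties by lower load.
        best = max(range(num_servers),
                   key=lambda s: (len(servers[s] & fps), -len(servers[s])))
        servers[best] |= fps
        placement[name] = best
    return placement

files = {"a.html": {"f1", "f2", "f3"}, "b.html": {"f2", "f3"}, "c.html": {"f9"}}
print(assign(files, 2))   # e.g. {'a.html': 0, 'b.html': 0, 'c.html': 1}
```

FSAP's contribution, as the abstract states, is to weigh this space saving against access efficiency via STUMM rather than optimizing similarity alone.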