Abstract: In this paper, we analyze the complexity and entropy of several data compression algorithms: LZW, Huffman coding, fixed-length coding (FLC), and Huffman coding applied after fixed-length coding (HFLC). We tested these algorithms on files of different sizes and conclude that LZW performs best on every compression measure we examined, especially on large files, followed by Huffman, HFLC, and FLC, respectively. Data compression remains an important research topic with many practical applications. We therefore suggest continuing work in this field, either by combining two techniques to obtain a better one, or by using another source mapping (Hamming), such as embedding a linear array into a hypercube, together with a strong technique such as Huffman coding.
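The comparison above rests on two measurable quantities: the entropy of the source and the size of each coder's output. As a rough illustration only (not the authors' test harness), the sketch below computes the Shannon entropy of a byte string and runs a minimal LZW coder over it so the two numbers can be compared on sample data; all function names are ours.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lzw_compress(data: bytes) -> list:
    """Minimal LZW: emit a dictionary index for each longest known prefix."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = bytes([b])
    if w:
        out.append(dictionary[w])
    return out


if __name__ == "__main__":
    sample = b"abracadabra " * 200
    codes = lzw_compress(sample)
    print(f"entropy: {shannon_entropy(sample):.3f} bits/byte")
    print(f"input: {len(sample)} bytes, LZW output: {len(codes)} codes")
```

On highly repetitive input such as the sample above, the number of emitted LZW codes falls far below the input length, which is the kind of advantage the abstract reports for LZW on large files.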
Abstract: The fast-growing adoption of mobile devices and of cloud computing has led mobile devices to make heavy use of cloud services. One major challenge in this environment is synchronizing mobile data, e.g., contacts, text messages, images, and videos, to the cloud. Owing to the expected high traffic volume and the high time complexity of synchronization, an appropriate synchronization algorithm is needed. Delta synchronization is one method for synchronizing compressed files, but it requires uploading the whole file even when no changes were made or when the file was only partially changed. In the present study, we propose an algorithm, based on delta synchronization, that solves the problem of synchronizing compressed files under various forms of modification (not modified, partially modified, or completely modified). To measure its efficiency, we compared it against the Dropbox synchronization algorithm. The results demonstrate that our algorithm outperforms the regular Dropbox synchronization mechanism by reducing synchronization time, cost, and traffic load between clients and the cloud service provider.
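The saving over a plain full-file upload comes from classifying each file as not modified, partially modified, or completely modified, and transferring only what is needed. The sketch below is a hypothetical illustration of that decision using fixed-size chunk hashes; it is not the algorithm evaluated in the paper or Dropbox's mechanism, and the chunk size, hash choice, and 80% threshold are our assumptions.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size, for illustration only


def chunk_hashes(data: bytes) -> list:
    """SHA-256 digest of each fixed-size chunk of the file."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def plan_sync(local: bytes, remote_hashes: list) -> dict:
    """Decide whether to skip, upload changed chunks, or upload the whole file."""
    local_hashes = chunk_hashes(local)
    if local_hashes == remote_hashes:            # not modified
        return {"action": "skip", "chunks": []}
    changed = [i for i, h in enumerate(local_hashes)
               if i >= len(remote_hashes) or h != remote_hashes[i]]
    # Completely (or mostly) modified: a recompressed archive usually changes
    # almost every chunk, so a full upload is cheaper than a long delta.
    if not local_hashes or len(changed) > 0.8 * len(local_hashes):
        return {"action": "full_upload", "chunks": []}
    return {"action": "delta_upload", "chunks": changed}  # partially modified
```

A real client would also handle truncation and renames and would fetch the remote chunk hashes from the server's metadata store; those details are omitted here.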
Funding: Supported by the National Natural Science Foundation of China (60473023) and the National Innovation Foundation for Small Technology-Based Firms (04C26214201280).
Abstract: In this paper, we address the problem of improving backup and recovery performance by compressing redundancy in a large disk-based backup system. We analyze several general-purpose compression algorithms and evaluate their scalability and applicability. We investigate how redundant data is distributed across the whole system and propose a multi-resolution distributed compression algorithm that can discern duplicated data at file, block, or byte granularity to reduce redundancy in the backup environment. To accelerate recovery, we propose a synthetic backup solution that stores data in a recovery-oriented way and can compose the final data on the back-end backup server. Experiments show that this algorithm greatly reduces bandwidth consumption, saves storage cost, and shortens backup and recovery time. We have implemented these techniques in our product, the H-info backup system, which achieves over a 10x compression ratio in both network utilization and data storage during backup.
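The multi-resolution idea is a cascade: check for duplicates at the coarsest granularity first and fall back to finer levels only when that check fails. The sketch below illustrates the file-level and block-level stages against an in-memory index; it is only an illustration of the idea, not the H-info implementation, and the index structures, block size, and function names are assumed. Byte-level matching (delta-encoding the remaining new blocks) is noted in a comment but omitted.

```python
import hashlib

BLOCK_SIZE = 8192  # assumed fixed block size, for illustration only


def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def dedup_file(data: bytes, file_index: set, block_index: set) -> list:
    """Return only the blocks that still need to be stored, coarsest level first."""
    file_hash = digest(data)
    if file_hash in file_index:                  # file-level duplicate: store nothing
        return []
    file_index.add(file_hash)

    new_blocks = []
    for i in range(0, len(data), BLOCK_SIZE):    # block-level duplicates
        block = data[i:i + BLOCK_SIZE]
        h = digest(block)
        if h not in block_index:
            block_index.add(h)
            new_blocks.append(block)
    # A byte-level stage would further delta-encode each new block against a
    # similar stored block; that step is omitted in this sketch.
    return new_blocks
```

Coarse checks are cheap and eliminate most redundancy early, which is why detecting duplicates at file, then block, then byte level keeps both bandwidth and index overhead low.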