Abstract: Using wireless communication technology, a wireless backup system for meteorological data transmission can switch over automatically when the wired network of a grassroots station fails, and deliver the data from the station with the line fault to its destination via short message (SMS). The system is simple to operate and cheap to run, and it effectively reduces transmission failures of real-time meteorological data caused by line problems at the station. The system has been tested successfully and put into operational use.
Funding: Supported by the National Natural Science Foundation of China (60473073)
Abstract: Processing a join over unbounded input streams requires unbounded memory, since every tuple in one infinite stream must be compared with every tuple in the other. In practice, most join queries over unbounded input streams are restricted to finite memory by sliding-window constraints. So far, non-indexed and indexed stream equijoin algorithms based on sliding windows have been proposed in the literature, but none of them takes non-equijoins into consideration. In many cases, non-equijoin queries occur frequently, so it is worth discussing how to process them effectively and efficiently. In this paper, we propose an indexed join algorithm that supports non-equijoin queries. The experimental results show that our indexed non-equijoin techniques are more efficient than those without an index.
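As a rough illustration of the setting (not the paper's indexed algorithm), a sliding-window non-equijoin can be evaluated by probing each arriving tuple against the other stream's current window with an arbitrary predicate. The function name and the interleaved-event encoding below are assumptions for the sketch:

```python
from collections import deque

def window_nonequijoin(events, w, pred):
    """events: iterable of ('A', value) or ('B', value) arrivals.
    pred(a, b): an arbitrary (non-equality) join predicate.
    Returns matched (a, b) pairs under per-stream windows of size w."""
    win_a, win_b = deque(maxlen=w), deque(maxlen=w)
    out = []
    for side, t in events:
        if side == 'A':
            # probe the other stream's window, then enter our own
            out.extend((t, b) for b in win_b if pred(t, b))
            win_a.append(t)
        else:
            out.extend((a, t) for a in win_a if pred(a, t))
            win_b.append(t)
    return out
```

The `maxlen` deques give the count-based expiration; an indexed variant would replace the linear window scan with a structure ordered on the predicate's attribute.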
Abstract: VERITAS Backup Exec 9.1 for Windows Servers is an advanced Windows data protection solution that provides comprehensive, economical, and efficient Microsoft-certified protection for Microsoft Windows server environments. It offers a Web-based management console and an intuitive, browser-like graphical user interface, together with easy-to-use wizards, making it suitable for data protection and recovery by users of any skill level on networks of any size.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60973020, 60828004, and 60933001; the Program for New Century Excellent Talents in University of China under Grant No. NCET-06-0290; and the Fundamental Research Funds for the Central Universities under Grant No. N090504004
Abstract: Outlier detection is a very useful technique in many applications where data is inherently uncertain and can be described using probability. While it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible-world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e,d)|) to O(k·|R(e,d)|), where R(e,d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to incrementally detect the most recent m elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
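For intuition, here is a deterministic simplification of distance-based outliers over a window, ignoring the probabilistic possible-world semantics the paper actually handles: an element is an outlier when its d-neighborhood R(e, d) contains fewer than k other elements. The function name and the one-dimensional distance are assumptions:

```python
def distance_outliers(window, d, k):
    """Flag e as an outlier when its d-neighborhood R(e, d) holds
    fewer than k other window elements (deterministic simplification)."""
    out = []
    for i, e in enumerate(window):
        neighbours = sum(1 for j, x in enumerate(window)
                         if j != i and abs(x - e) <= d)
        if neighbours < k:
            out.append(e)
    return out
```

This brute-force scan costs O(n²) per window; the paper's DPA and PBA exist precisely to avoid recomputing neighborhoods from scratch as the window slides.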
Abstract: To address data loss and corruption, this paper studies disaster-recovery technology and proposes a solution based on Oracle Data Guard. It introduces the working principles and key features of Data Guard, and implements the construction of a physical standby database and role switchover. The results show that this solution effectively solves the problem of data loss or corruption in Oracle databases and provides stronger protection for data.
Funding: Supported by the National Natural Science Foundation of China (60473023) and the National Innovation Foundation for Small Technology-Based Firms (04C26214201280)
Abstract: This paper describes a method for building a hot snapshot copy based on the Windows file system (HSCF). The architecture and running mechanism of HSCF are discussed after a comparison with other online backup technologies. HSCF, based on a file-system filter driver, protects computer data and ensures its integrity and consistency with the following three steps: access to open files, synchronization, and copy-on-write. Its strategies for improving system performance are analyzed, including priority setting, incremental snapshots, and load balancing. HSCF is a new kind of snapshot technology that solves the data integrity and consistency problem in online backup, and it differs from storage-level snapshots and Open File Solutions.
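The copy-on-write step can be pictured with a toy block store: the first write to any block after the snapshot is taken first preserves the old contents, so snapshot reads always see the data as of snapshot time. This is a conceptual sketch, not the kernel-mode filter driver the paper describes; all names are hypothetical:

```python
class CowSnapshot:
    """Copy-on-write snapshot over a block store (dict: block -> bytes).
    The first write to a block after the snapshot saves its old contents."""
    def __init__(self, volume):
        self.volume = volume
        self.saved = {}              # blocks preserved at snapshot time

    def write(self, block, data):
        if block not in self.saved:  # preserve old contents exactly once
            self.saved[block] = self.volume.get(block)
        self.volume[block] = data

    def read_snapshot(self, block):
        # unmodified blocks are read straight from the live volume
        return self.saved.get(block, self.volume.get(block))
```

Because only modified blocks are copied, the snapshot's space cost is proportional to the write traffic since snapshot time, not to the volume size.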
Funding: Supported in part by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013201, and the National Natural Science Foundation of China under Grant Nos. 61025009, 61232003, 61120106005, 61170288, and 61379146
Abstract: Cloud backup has been an important issue ever since large quantities of valuable data began to be stored on personal computing devices. Data reduction techniques, such as deduplication, delta encoding, and Lempel-Ziv (LZ) compression, performed at the client side before data transfer, can ease cloud backup by saving network bandwidth and reducing cloud storage space. However, client-side data reduction in cloud backup services faces efficiency and privacy challenges. In this paper, we present Pangolin, a secure and efficient cloud backup service for personal data storage that exploits application awareness. It speeds up backup operations with an application-aware client-side data reduction technique, and mitigates data security risks by integrating selective encryption into data reduction for sensitive applications. Our experimental evaluation, based on a prototype implementation, shows that our scheme improves data reduction efficiency over state-of-the-art methods, shortening the backup window to 33%-75% of its original size, and that its security mechanism for sensitive applications has negligible impact on the backup window.
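Client-side deduplication, the first of the data reduction steps mentioned, can be sketched as a fingerprint index consulted before upload: only chunks whose hash is unseen are transferred. This is a generic illustration, not Pangolin's application-aware variant; the names are hypothetical:

```python
import hashlib

def dedup_chunks(chunks, index):
    """Client-side deduplication: upload a chunk only when its
    fingerprint is absent from the (local or cloud-synced) index."""
    to_upload = []
    for c in chunks:
        fp = hashlib.sha256(c).hexdigest()
        if fp not in index:
            index.add(fp)
            to_upload.append(c)
    return to_upload
```

Selective encryption would slot in just before upload, encrypting only chunks that belong to applications classified as sensitive.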
Funding: Supported by the "Hundred Talents Program" of CAS and the National Natural Science Foundation of China under Grant No. 60772034
Abstract: Detecting duplicates in data streams is an important problem with a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and, on the other hand, the elements in data streams are always time-sensitive. This makes it particularly significant to approximately detect duplicates among newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, the Decaying Bloom Filter (DBF), as an extension of the Counting Bloom Filter, which effectively removes stale elements as new elements continuously arrive over sliding windows. On the basis of the DBF, we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positives, but no false negatives as in many previous results. We analyze the time complexity and detection accuracy, and give a tight upper bound on the false positive rate. For a given space of G bits and sliding window size W, our algorithm has an amortized time complexity of O(√(G/W)). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior in both execution time and detection accuracy to previous results.
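A naive sketch of the decaying idea: inserting an element sets its k cells to the window size W, and every arrival decays all cells by one, so elements older than W arrivals fade out. This version pays O(m) per arrival for clarity rather than the paper's amortized O(√(G/W)) decay schedule; the class and method names are assumptions:

```python
import hashlib

class DecayingBloomFilter:
    """Naive Decaying Bloom Filter sketch: cells hold remaining lifetime,
    refreshed to the window size W on insertion and decayed on arrival."""
    def __init__(self, m, k, w):
        self.cells = [0] * m
        self.m, self.k, self.w = m, k, w

    def _hashes(self, item):
        # k independent cell indices derived from salted SHA-256
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def seen_then_insert(self, item):
        # duplicate iff every cell is still alive (false positives possible)
        dup = all(self.cells[h] > 0 for h in self._hashes(item))
        self.cells = [c - 1 if c else 0 for c in self.cells]   # decay step
        for h in self._hashes(item):
            self.cells[h] = self.w                             # refresh
        return dup
```

As in a Counting Bloom Filter, unrelated insertions can keep a stale element's cells alive, which is the source of the (bounded) false positive rate.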
Funding: Supported by the National Natural Science Foundation of China (60473023) and the National Innovation Foundation for Small Technology-Based Firms (04C26214201280)
Abstract: In this paper, we address the problem of improving backup and recovery performance by compressing redundancies in a large disk-based backup system. We analyze some general compression algorithms and evaluate their scalability and applicability. We investigate the distribution of redundant data across the whole system and propose a multi-resolution distributed compression algorithm that can discern duplicated data at file-level, block-level, or byte-level granularity to reduce redundancy in the backup environment. To accelerate recovery, we propose a synthetic backup solution that stores data in a recovery-oriented way and can compose the final data on the back-end backup server. Experiments show that this algorithm can greatly reduce bandwidth consumption, save storage cost, and shorten backup and recovery time. We have implemented these technologies in our product, the H-info backup system, which achieves over a 10x compression ratio in both network utilization and data storage during backup.
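The multi-resolution idea, reduced to two of its levels, can be sketched as: check the whole-file fingerprint first, and only on a miss fall back to block-level deduplication (byte-level delta is omitted; all names are hypothetical):

```python
import hashlib

def backup_file(data, block_size, file_index, block_index):
    """Multi-resolution redundancy check: skip a duplicate file outright,
    otherwise send only the blocks not yet known to the backup server."""
    fh = hashlib.sha256(data).hexdigest()
    if fh in file_index:
        return []                      # duplicate file: nothing to send
    file_index.add(fh)
    new_blocks = []
    for i in range(0, len(data), block_size):
        blk = data[i:i + block_size]
        bh = hashlib.sha256(blk).hexdigest()
        if bh not in block_index:      # block-level fallback
            block_index.add(bh)
            new_blocks.append(blk)
    return new_blocks
```

Coarse levels are cheap and catch whole-object duplicates; finer levels cost more hashing but find redundancy the coarse levels miss, which is the trade-off the multi-resolution design exploits.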
Funding: Supported by the National Natural Science Foundation of China (60403027)
Abstract: This paper presents two one-pass algorithms for dynamically computing frequency counts in a sliding window over a data stream, i.e., reporting the frequency counts that exceed a user-specified threshold ε. The first algorithm constructs sub-windows, deletes expired sub-windows periodically, and maintains a summary data structure in each sub-window; it outputs at most 1/ε + 1 elements for frequency queries over the most recent N elements. The second algorithm adopts a multi-level method to deal with the data stream. Once the sketch of the most recent N elements has been constructed, the second algorithm can answer frequency queries over the most recent n (n ≤ N) elements, outputting at most 1/ε + 2 elements. The analytical and experimental results show that our algorithms are accurate and effective.
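The first algorithm's sub-window structure can be sketched as follows, with exact Counters standing in for the paper's bounded summaries (so the 1/ε + 1 output bound is not reproduced here); the names and parameters are assumptions:

```python
from collections import Counter, deque

class SubWindowCounts:
    """Frequent items over the most recent n elements: the window is cut
    into fixed-size sub-windows, each with its own count summary, and
    expired sub-windows are dropped whole."""
    def __init__(self, n, sub):
        self.sub_size = sub
        self.max_subs = n // sub
        self.subs = deque([Counter()])

    def insert(self, x):
        if sum(self.subs[-1].values()) == self.sub_size:
            self.subs.append(Counter())      # start a fresh sub-window
            if len(self.subs) > self.max_subs:
                self.subs.popleft()          # expire the oldest one
        self.subs[-1][x] += 1

    def frequent(self, eps, n):
        total = sum(self.subs, Counter())    # merge sub-window summaries
        return {x for x, c in total.items() if c >= eps * n}
```

Dropping whole sub-windows is what keeps maintenance one-pass: no per-element timestamps are needed, at the cost of a sub-window's worth of slack in the window boundary.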
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61300026, and in part by the Natural Science Foundation of Fujian Province under Grant Nos. 2017J01754 and 2018J01797
Abstract: Continuous responses to range queries on streaming data provide useful information for many practical applications, but they also carry the risk of privacy disclosure. Existing research on differentially private streaming data publication mostly focuses on boosting query accuracy, pays less attention to query efficiency, and ignores the effect of timeliness on data weight. In this paper, we propose an effective algorithm for differentially private streaming data publication under an exponential decay model. First, by introducing a Fenwick tree to divide and reorganize the data items in the stream, we achieve constant time complexity for inserting a new item and retrieving a prefix sum, and time complexity linear in the number of data items for building the tree. We then exploit the matrix mechanism to handle related queries and reduce the global sensitivity, and we choose a proper diagonal matrix to further improve range query accuracy. Finally, to account for exponential decay, every data item is weighted by the decay factor. Combining the Fenwick tree with the matrix optimization, we present a complete algorithm for differentially private real-time streaming data publication. Experiments compare the proposed algorithm with similar algorithms for streaming data release under exponential decay; the results show that it effectively improves query efficiency while preserving query quality.
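The Fenwick (binary indexed) tree underlying the division step supports point updates and prefix sums; a standard implementation is shown below, omitting the decay weighting and noise addition the paper layers on top of the structure:

```python
class FenwickTree:
    """Standard Fenwick (binary indexed) tree over indices 1..n:
    point update and prefix sum, each touching O(log n) tree nodes."""
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        """Add delta to item i (1-based)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i              # jump to the next covering node

    def prefix_sum(self, i):
        """Return the sum of items 1..i."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i              # strip the lowest set bit
        return s
```

Any range sum follows as `prefix_sum(r) - prefix_sum(l - 1)`, which is what makes the structure a natural fit for range queries over the stream.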