A novel data stream partitioning method is proposed to address range-aggregation continuous queries over parallel streams in the power industry. The first step of this method is to sample the data in parallel, implemented as an extended reservoir-sampling algorithm. A skip factor based on the change ratio of data values is introduced to describe the distribution characteristics of the data values adaptively. The second step is to partition the flux of the data streams evenly, implemented with two alternative equal-depth histogram generation algorithms that fit different cases: one performs incremental maintenance based on heuristics, and the other performs periodical updates to generate an approximate partition vector. Experimental results on real data show that the method is efficient, practical, and well suited to processing time-varying data streams.
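As a rough illustration of the two steps, the sketch below pairs classical reservoir sampling (the paper's skip-factor extension is not detailed in the abstract, so only the baseline is shown) with an equal-depth partition vector computed from the sample; function names and parameters are illustrative assumptions.

```python
import random

def reservoir_sample(stream, k):
    """Classical reservoir sampling: keep a uniform random sample of size k
    from a stream of unknown length (baseline of the extended algorithm)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)   # replace a slot with probability k / (i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

def equal_depth_partition_vector(sample, num_partitions):
    """Approximate equal-depth histogram boundaries (a partition vector)
    derived from the sample: each bucket holds roughly the same count."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(i * step) - 1] for i in range(1, num_partitions)]
```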
A novel Hilbert curve is introduced for parallel spatial data partitioning, taking into account the huge volume of spatial information and the variable-length characteristic of vector data items. Based on the improved Hilbert curve, an algorithm can be designed to achieve almost-uniform spatial data partitioning among multiple disks in parallel spatial databases. Thus, data imbalance is largely avoided, and search and query efficiency can be enhanced.
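For orientation, here is a minimal sketch of Hilbert-order declustering: the standard curve-index computation plus a greedy size-balanced assignment of variable-length objects to disks. It illustrates the general idea only; the paper's improved curve and its exact placement rule are not reproduced, and the object fields (`x`, `y`, `size`) are assumptions.

```python
def hilbert_index(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve over an n-by-n grid
    (n a power of two), using the standard iterative formulation."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                          # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_partition(objects, n, num_disks):
    """Sort objects by Hilbert index, then greedily place each on the disk with
    the smallest accumulated byte size, so variable-length items stay balanced."""
    ordered = sorted(objects, key=lambda o: hilbert_index(n, o["x"], o["y"]))
    disks = [[] for _ in range(num_disks)]
    loads = [0] * num_disks
    for obj in ordered:
        target = loads.index(min(loads))     # least-loaded disk so far
        disks[target].append(obj)
        loads[target] += obj["size"]
    return disks
```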
To enable quality scalability and further improve the reconstructed video quality in rate shaping, a rate-distortion optimized packet dropping scheme for H.264 data-partitioned video bitstreams is proposed in this paper. Some side information is generated for each video bitstream in advance; during streaming, this side information is exploited by a greedy algorithm to drop partitions in a rate-distortion optimized way. Quality scalability is supported by adopting the data partition, instead of the whole frame, as the dropping unit. Simulation results show that the proposed scheme achieves a large gain in reconstructed video quality over two typical frame dropping schemes, thanks to the fine granularity of the dropping unit as well as the rate-distortion optimization.
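The greedy selection can be pictured as follows: each droppable partition carries precomputed side information (its bit cost and the distortion its loss would introduce), and the partition with the smallest distortion penalty per bit saved is dropped first until the target rate is met. The dictionary fields and the ratio criterion below are illustrative assumptions, not the paper's exact formulation.

```python
def rd_optimized_drop(partitions, target_bits):
    """Greedy rate-distortion optimized dropping over data partitions.
    Each partition dict carries assumed side information: 'bits' (its size)
    and 'distortion' (quality penalty if it is dropped)."""
    kept = list(partitions)
    total_bits = sum(p["bits"] for p in kept)
    while total_bits > target_bits and kept:
        # Drop the partition with the smallest distortion increase per bit saved.
        victim = min(kept, key=lambda p: p["distortion"] / p["bits"])
        kept.remove(victim)
        total_bits -= victim["bits"]
    return kept
```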
Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system throughput. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational efficiency. Furthermore, dynamic environments necessitate robust partition detection mechanisms. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime metrics. We then examine the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Additionally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and solutions. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.
Computer clusters with the shared-nothing architecture are the major computing platforms for big data processing and analysis. In cluster computing, data partitioning and sampling are two fundamental strategies to speed up the computation of big data and increase scalability. In this paper, we present a comprehensive survey of the methods and techniques of data partitioning and sampling with respect to big data processing and analysis. We start with an overview of the mainstream big data frameworks on Hadoop clusters. The basic methods of data partitioning are then discussed, including three classical horizontal partitioning schemes: range, hash, and random partitioning. Data partitioning on Hadoop clusters is also discussed, with a summary of new strategies for big data partitioning, including the new Random Sample Partition (RSP) distributed model. The classical methods of data sampling are then investigated, including simple random sampling, stratified sampling, and reservoir sampling. Two common methods of big data sampling on computing clusters are also discussed: record-level sampling and block-level sampling. Record-level sampling is not as efficient as block-level sampling on big distributed data. On the other hand, block-level sampling on data blocks generated with the classical data partitioning methods does not necessarily produce good representative samples for approximate computing of big data. In this survey, we also summarize the prevailing strategies and related work on sampling-based approximation on Hadoop clusters. We believe that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
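As a quick reference for the three classical horizontal schemes the survey mentions, here is a minimal sketch; the record layout (dicts keyed by attribute name) and the use of MD5 for hashing are illustrative assumptions.

```python
import bisect
import hashlib
import random

def hash_partition(records, key, num_parts):
    """Hash partitioning: all records sharing a key value land in the same partition."""
    parts = [[] for _ in range(num_parts)]
    for rec in records:
        h = int(hashlib.md5(str(rec[key]).encode()).hexdigest(), 16)
        parts[h % num_parts].append(rec)
    return parts

def range_partition(records, key, boundaries):
    """Range partitioning on sorted split points; yields len(boundaries) + 1 partitions."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        parts[bisect.bisect_right(boundaries, rec[key])].append(rec)
    return parts

def random_partition(records, num_parts):
    """Random partitioning: each record goes to a uniformly chosen partition."""
    parts = [[] for _ in range(num_parts)]
    for rec in records:
        parts[random.randrange(num_parts)].append(rec)
    return parts
```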
Clustering is a crucial method for deciphering data structure and producing new information. Due to its significance in revealing fundamental connections between the human brain and events, it is essential to utilize clustering for cognitive research. Dealing with noisy data caused by inaccurate synthesis from several sources or misleading data production processes is one of the most intriguing clustering difficulties. Noisy data can lead to incorrect object recognition and inference. This research aims to develop a novel clustering approach, named Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM), to solve the clustering problem with noisy data using the neutral and refusal degrees in the definitions of the Picture Fuzzy Set (PFS) and Neutrosophic Set (NS). Our contribution is to propose a new optimization model with four essential components: clustering, outlier removal, safe semi-supervised fuzzy clustering, and partitioning with labeled and unlabeled data. The effectiveness and flexibility of the proposed technique are evaluated and compared with state-of-the-art methods, standard picture fuzzy clustering (FC-PFS) and confidence-weighted safe semi-supervised clustering (CS3FCM), on benchmark UCI datasets. The experimental results show that our method outperforms the compared methods on at least 10 of 15 datasets in terms of clustering quality and computational time.
Online Transaction Processing (OLTP) benefits from data partitioning to achieve better performance and scalability. The primary objective of database and application developers is to provide scalable and reliable database systems. This research presents a novel method for data partitioning and load balancing for scalable transactions. Data is efficiently partitioned using a hybrid graph partitioning method. An Optimized Load Balancing (OLB) approach is applied to calculate the weight factor, average workload, and partition efficiency. The presented approach is appropriate for various online data transaction applications. The quality of the proposed approach is examined using an OLTP database benchmark. The proposed methodology significantly outperformed the alternatives with respect to metrics such as throughput, response time, and CPU utilization.
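The abstract does not give the OLB formulas, so the snippet below only illustrates the kind of quantities involved: per-partition workload, the average workload, and a simple balance ratio used as a stand-in for partition efficiency. The names and the efficiency definition are assumptions for illustration, not the paper's metrics.

```python
def load_balance_summary(partition_loads):
    """Summarize a partitioning's balance: average workload, peak workload,
    and a simple efficiency ratio (1.0 means perfectly balanced).
    This is a generic metric, not necessarily the paper's OLB definition."""
    avg = sum(partition_loads) / len(partition_loads)
    peak = max(partition_loads)
    return {"average_workload": avg,
            "peak_workload": peak,
            "partition_efficiency": avg / peak}
```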
The rapid growth of modern mobile devices leads to large amounts of distributed data, which is extremely valuable for learning models. Unfortunately, training a model by collecting all of these original data on a centralized cloud server is not practical due to concerns about data privacy and communication costs, hindering artificial intelligence from empowering mobile devices. Moreover, these data are not identically and independently distributed (non-IID) because of their different contexts, which deteriorates the performance of the model. To address these issues, we propose a novel distributed learning algorithm based on hierarchical clustering and adaptive dataset condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients in terms of their data characteristics. Subsequently, synthetic dummy samples are generated based on the hierarchical structure utilizing adaptive dataset condensation. The procedure of dataset condensation can be adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in prediction accuracy and communication costs.
Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on centralized systems. A solution to querying in a Peer-to-Peer (P2P) environment was proposed to achieve both low processing cost, in terms of the number of peers accessed and search messages, and balanced query loads among peers. The system is based on a balanced tree-structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.
The dissociation between data management and data ownership makes it difficult to protect data security and privacy in cloud storage systems. Traditional encryption technologies are not suitable for data protection in cloud storage systems. A novel multi-authority proxy re-encryption mechanism based on ciphertext-policy attribute-based encryption (MPRE-CPABE) is proposed for cloud storage systems. MPRE-CPABE requires the data owner to split each file into two blocks, one big block and one small block. The small block is used as the private key to encrypt the big block, and then the encrypted big block is uploaded to the cloud storage system. Even if the uploaded big block of the file is stolen, illegal users cannot easily recover the complete information of the file. Ciphertext-policy attribute-based encryption (CPABE) is often criticized for its heavy overhead and for security issues when distributing keys or revoking a user's access right. MPRE-CPABE applies CPABE to the multi-authority cloud storage system and solves these issues. A weighted access structure (WAS) is proposed to support a variety of fine-grained threshold access control policies in multi-authority environments and to reduce the computational cost of key distribution. Meanwhile, MPRE-CPABE uses proxy re-encryption to reduce the computational cost of access revocation. Experiments are implemented on the Ubuntu and CloudSim platforms. Experimental results show that MPRE-CPABE can greatly reduce the computational cost of generating key components and of revoking a user's access right. MPRE-CPABE is also proved secure under the decisional bilinear Diffie-Hellman (DBDH) security model.
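The file-splitting step can be pictured roughly as follows: the small block is kept by the owner and hashed into a symmetric key that encrypts the big block before upload. This is only an illustrative sketch assuming AES-GCM from the third-party `cryptography` package and SHA-256 key derivation; the paper's actual construction (CPABE, proxy re-encryption, WAS) is not reproduced.

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party package

def split_and_encrypt(file_bytes, small_len=32):
    """Split a file into a small block and a big block, derive a symmetric key
    from the small block, and encrypt the big block for upload to the cloud."""
    small, big = file_bytes[:small_len], file_bytes[small_len:]
    key = hashlib.sha256(small).digest()       # assumed key derivation, 256-bit key
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, big, None)
    return small, (nonce, ciphertext)          # keep `small` private; upload the rest
```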
The Internet of Things (IoT) implies a worldwide network of interconnected objects that are uniquely addressable via standard communication protocols. The prevalence of IoT is bound to generate large amounts of multisource, heterogeneous, dynamic, and sparse data. However, IoT offers little practical benefit without the ability to integrate, fuse, and glean useful information from such massive amounts of data. Accordingly, to prepare us for the imminent invasion of things, a tool called data fusion can be used to manipulate and manage such data in order to improve process efficiency and provide advanced intelligence. To obtain an acceptable quality of intelligence, diverse and voluminous data have to be combined and fused. Therefore, it is imperative to improve the computational efficiency of fusing and mining multidimensional data. In this paper, we propose an efficient multidimensional fusion algorithm for IoT data based on partitioning. The basic concept involves the partitioning of dimensions (attributes), i.e., a big data set with high dimensionality can be transformed into a certain number of relatively smaller data subsets that can be easily processed. Then, based on the partitioning of dimensions, the discernibility matrices of all data subsets in rough set theory are computed to obtain their core attribute sets. Furthermore, a global core attribute set can be determined. Finally, attribute reduction and rule extraction methods are used to obtain the fusion results. By proving a few theorems and by simulation, the correctness and effectiveness of this algorithm are illustrated.
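For readers unfamiliar with the rough-set step, the sketch below computes a core attribute set from a decision table without materializing the full discernibility matrix: an attribute belongs to the core when some pair of objects with different decision values is discerned by that attribute alone. The dict-based record layout and function name are assumptions; the paper's dimension partitioning and global-core aggregation are not reproduced.

```python
def core_attributes(records, condition_attrs, decision_attr):
    """Core attribute set of a decision table via discernibility (rough set theory).
    An attribute is in the core iff it is the only attribute discerning some pair
    of objects that have different decision values."""
    core = set()
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if records[i][decision_attr] == records[j][decision_attr]:
                continue                      # only pairs with different decisions matter
            differing = [a for a in condition_attrs
                         if records[i][a] != records[j][a]]
            if len(differing) == 1:           # singleton entry in the discernibility matrix
                core.add(differing[0])
    return core
```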
We present an unequal decoding power allocation (UDPA) approach for minimizing receiver power consumption subject to a given quality of service (QoS), by exploiting data partitioning and turbo decoding. We assign unequal forward error correction (FEC) decoding power to data partitions with different priorities by jointly considering the source coding, channel coding, and receiver power consumption. The proposed scheme is applied to H.264 video over an additive white Gaussian noise (AWGN) channel; it achieves an excellent tradeoff between video delivery quality and power consumption, and yields significant power savings compared with the conventional equal decoding power allocation (EDPA) approach in wireless video transmission.
Robust video streaming over highly error-prone wireless channels has attracted much attention. In this paper, the authors introduce an effective algorithm that combines the unequal error protection capability of the channel multiplexing protocol H.223 Annex D with the new H.263++ Annex V data partitioning. Based on the optimal trade-off between these two technologies, the joint source and channel coding algorithm achieves stronger error resilience. Simulation results show its superiority over a separate coding mode and some unequal error protection modes under recommended wireless channel error patterns.
A partition checkpoint strategy based on data segment priority is presented to meet the timing constraints of the data and the transactions in embedded real-time main memory database systems (ERTMMDBS), as well as to reduce the number of transactions missing their deadlines and the recovery time. The partition checkpoint strategy takes into account the characteristics of the data and the transactions associated with it; moreover, it partitions the database according to data segment priority and sets a corresponding checkpoint frequency for each partition for independent checkpoint operation. The simulation results show that the partition checkpoint strategy decreases the ratio of transactions missing their deadlines.
The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred a tremendous interest in the area of knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have been working on clustering algorithms for decades, and a lot of algorithms for clustering have been developed, there is still no efficient algorithm for clustering very large databases and high dimensional data. As an outstanding representative of clustering algorithms, the DBSCAN algorithm shows good performance in spatial data clustering. However, for large spatial databases, DBSCAN requires a large volume of memory and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale the DBSCAN algorithm to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced consecutively. Following that, based on the above-proposed algorithms, a synthetic algorithm is also given. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.
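As a toy illustration of the partitioning idea (not the paper's algorithms), the sketch below splits a 2-D dataset into x-axis strips with an eps-wide overlap and runs DBSCAN independently on each strip; merging clusters that cross strip boundaries, which a real partitioning-based algorithm must handle, is deliberately left out. It assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def strip_partitioned_dbscan(points, eps, min_samples, num_strips):
    """Run DBSCAN separately on overlapping x-axis strips of a 2-D point set.
    Returns a list of (original_indices, labels) pairs, one per non-empty strip."""
    points = np.asarray(points, dtype=float)
    xs = points[:, 0]
    edges = np.linspace(xs.min(), xs.max(), num_strips + 1)
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (xs >= lo - eps) & (xs <= hi + eps)   # eps overlap catches border points
        idx = np.where(mask)[0]
        if idx.size == 0:
            continue
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points[idx])
        results.append((idx, labels))
    return results
```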
String similarity join (SSJ) is essential for many applications where near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. The existing algorithms usually adopt the filter-and-refine framework. They cannot catch the dissimilarity between string subsets and do not fully exploit statistics such as the frequencies of characters. We investigate a partition-based algorithm that uses such statistics. The frequency vectors are used to partition datasets into data chunks whose mutual dissimilarity is easy to capture. A novel algorithm is designed to accelerate SSJ via the partitioned data. A new filter is proposed to leverage the statistics to avoid computing edit distances for a noticeable proportion of candidate pairs that survive the existing filters. Our algorithm outperforms alternative methods notably on real datasets.
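To make the statistics-based filtering concrete, here is a minimal self-join sketch with a length filter, a character-frequency (count) filter, and edit-distance verification; it omits the paper's chunk-level partitioning and is only a baseline illustration. The count filter relies on the fact that a single edit changes the L1 distance between character-frequency vectors by at most 2.

```python
from collections import Counter

def freq_vector_filter(s, t, tau):
    """Count filter: if edit_distance(s, t) <= tau, then the L1 distance between
    the character-frequency vectors of s and t is at most 2 * tau."""
    cs, ct = Counter(s), Counter(t)
    diff = sum(abs(cs[c] - ct[c]) for c in set(cs) | set(ct))
    return diff <= 2 * tau

def edit_distance(s, t):
    """Standard dynamic-programming edit distance with a single rolling row."""
    dp = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(t) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (s[i - 1] != t[j - 1]))
            prev = cur
    return dp[len(t)]

def similarity_self_join(strings, tau):
    """Naive SSJ baseline: length filter, then frequency-vector filter,
    then edit-distance verification."""
    pairs = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if abs(len(s) - len(t)) > tau:            # length filter
                continue
            if not freq_vector_filter(s, t, tau):     # frequency-vector filter
                continue
            if edit_distance(s, t) <= tau:            # verification
                pairs.append((s, t))
    return pairs
```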
Big data has received great attention in research and application. However, most of the current efforts focus on systems and applications to handle the challenges of "volume" and "velocity", and not much has been done on the theoretical foundation or on handling the challenge of "variety". Based on metric-space indexing and computational complexity theory, we propose a parallel computing framework for big data. This framework consists of three components, i.e., universal representation of big data by abstracting various data types into a metric space, partitioning of big data based on pair-wise distances in the metric space, and parallel computing of big data with the NC-class computing theory.
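One simple way to partition data using only pairwise distances in a metric space (not necessarily the framework's own scheme) is pivot-based assignment: choose a few pivot objects and place every object in the partition of its nearest pivot. The function below is a hedged sketch; the `distance` callable stands for whatever metric abstracts the data type.

```python
import random

def pivot_partition(objects, distance, num_partitions, seed=0):
    """Partition objects in a metric space by nearest pivot, using only the
    pairwise distance function (no coordinates or data-type assumptions)."""
    rng = random.Random(seed)
    pivots = rng.sample(objects, num_partitions)
    parts = [[] for _ in range(num_partitions)]
    for obj in objects:
        dists = [distance(obj, p) for p in pivots]
        parts[dists.index(min(dists))].append(obj)
    return parts

# Example: partition strings with a crude length-difference metric.
# groups = pivot_partition(["a", "abc", "abcdef"], lambda s, t: abs(len(s) - len(t)), 2)
```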
The volume of RDF data has increased dramatically in recent years, while cloud computing platforms like Hadoop are considered a good choice for processing queries over huge data sets because of their excellent scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned can also affect system performance. Specifically, a good partitioning solution would greatly reduce or even totally avoid cross-node joins, and significantly cut down the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks and RDF query engines like RDF-3X store the data and execute join operations. Based on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of these proposed approaches have been evaluated by extensive experiments over large RDF data sets.