Abstract: A method is proposed for classifying handwritten Arabic numerals in compressed form using a partitioning approach, the Leader algorithm, and a neural network. Handwritten numerals are represented in matrix form. Merging each adjacent pair of rows of the matrix with a logical OR operation halves its size. Treating each row as a partition, clusters are formed separately for the same partition of the same digit. The leaders of these partition clusters are then used to recognize the patterns by a divide-and-conquer approach with the proposed ensemble neural network. Experimental results show that the proposed method recognizes the patterns accurately.
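The row-merging step described above can be sketched in a few lines. This is only an illustration under assumptions: the digit is taken to be a binary matrix stored as a list of lists with an even number of rows, and the function name `compress_rows` and the sample matrix are hypothetical, not taken from the paper.

```python
def compress_rows(matrix):
    """Merge each adjacent pair of rows with an element-wise logical OR.

    Assumes a binary (0/1) matrix with an even number of rows.
    """
    if len(matrix) % 2 != 0:
        raise ValueError("expected an even number of rows")
    return [
        [a | b for a, b in zip(matrix[i], matrix[i + 1])]
        for i in range(0, len(matrix), 2)
    ]


# Hypothetical 4x4 binary pattern; the result has half as many rows.
digit = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
compressed = compress_rows(digit)
```

Each row of `compressed` would then play the role of one partition in the clustering step.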
Funding: Supported by the National Natural Science Foundation of China (No. 60873111) and the National Basic Research Program (973) of China (No. 2004CB719400).
Abstract: We decompose the problem of optimal multi-degree reduction of Bézier curves with corner constraints into two simpler subproblems: making high-order interpolations at the two endpoints without degree reduction, and performing optimal degree reduction without high-order interpolations at the two endpoints. We then convert the second subproblem into multi-degree reduction of Jacobi polynomials, so that the optimal solution follows easily from the orthonormality of Jacobi polynomials and the least squares method of unequally accurate measurement. This divide-and-conquer method has several advantages: it maintains high continuity at the two endpoints of the curve, performs multi-degree reduction only once, uses explicit approximation expressions, estimates the error in advance, and offers low time cost and high precision. More importantly, it is derived simply and directly and can be easily extended to the degree reduction of surfaces. Finally, we present two examples to demonstrate the effectiveness of our algorithm.
Abstract: Tools for pairwise bio-sequence alignment have long played a central role in computational biology, and several alignment algorithms have been developed. The Smith-Waterman algorithm, based on dynamic programming, is considered the most fundamental alignment algorithm in bioinformatics. However, the existing parallel Smith-Waterman algorithm needs a large amount of memory, which limits the size of the sequences it can handle. As biological sequence data expand rapidly, this memory requirement has become a critical problem. To solve it, we develop a new parallel bio-sequence alignment algorithm based on the divide-and-conquer strategy, named the PSW-DC algorithm. First, we partition the query sequence into several subsequences and distribute them among the processors. Then each subsequence is compared with the whole subject sequence in parallel using the Smith-Waterman algorithm, giving an interim result. Finally, the optimal alignment between the query sequence and the subject sequence is obtained through a special combination and extension method. The memory required by our algorithm is significantly lower than that of existing ones. We also develop a key combination and extension technique, named the C&E method, to manipulate the interim results and obtain the final sequence alignment. We implement the new parallel bio-sequence alignment algorithm, PSW-DC, on a cluster parallel system.
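The partition-and-compare step can be illustrated with a small sequential sketch. It shows only a textbook Smith-Waterman scoring kernel with a linear gap penalty, applied to each query chunk in turn. The scoring values, the helper names `smith_waterman_score` and `split_query`, and the sample sequences are assumptions for illustration. The paper's C&E method is not described in the abstract and is not attempted here.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, keeping only two rows of the DP table."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def split_query(query, parts):
    """Cut the query into at most `parts` contiguous chunks."""
    size = -(-len(query) // parts)  # ceiling division
    return [query[i:i + size] for i in range(0, len(query), size)]


# Hypothetical sequences; each chunk would go to its own processor.
subject = "GGACGGG"
query = "TTACGTT"
interim = [smith_waterman_score(part, subject) for part in split_query(query, 2)]
whole = smith_waterman_score(query, subject)
```

In this toy input the best local match straddles the cut, so no chunk score reaches the whole-query score. That gap is what a combination and extension step has to close.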
Abstract: To program parallel computations effectively on a NOW (network of workstations), users must be able to evaluate how well the system performs for a given application. In this paper, we present a framework for evaluating tree-structured computing on a NOW. Based on this framework, we derive a model for the well-known parallel programming paradigm of divide and conquer. We discuss how this model can be used to evaluate performance and to restructure the application to improve performance.
Funding: Project (2007AA01Z126) supported by the National High Technology Research and Development Program of China.
Abstract: Service composition is an active research area in service-oriented computing that has gained great momentum. A quality of service (QoS) oriented, tree-based approach is proposed to implement service composition efficiently. First, service descriptions are transformed into mapping relations that denote the association between input and output concepts. Then, the service composition problem is resolved by dynamically building mapping relation trees with the divide-and-conquer method, and all mapping relation trees are combined without redundant branches to obtain the composition scheme. Finally, the optimal composition scheme is chosen based on quality of service attributes, including the preferences of the service request. Experimental results show that this method improves composition efficiency and reduces search time as the number of services in the repository increases.
Abstract: The average (mean) voter is one of the most common voting methods for decision making in highly available, long-mission applications where the availability and speed of the system are critical. In this paper, a new generation of average voter based on parallel algorithms and the parallel random access machine (PRAM) structure is proposed. The analysis shows that this algorithm is optimal in terms of its improved time complexity, speed-up, and efficiency, and that it is especially appropriate for applications where the input space is large.
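The abstract does not spell out the algorithm, so the following is only a generic sketch of how an average can be computed by pairwise reduction, the pattern usually used for summation on a PRAM. The function name `average_voter` and the round counter are hypothetical, and the rounds are simulated sequentially; on a real PRAM the additions within one round would run at the same time.

```python
def average_voter(inputs):
    """Average the inputs by pairwise (tree) reduction.

    Returns the mean and the number of reduction rounds, which is
    ceil(log2(n)) for n inputs.
    """
    if not inputs:
        raise ValueError("at least one input is required")
    level = list(inputs)
    rounds = 0
    while len(level) > 1:
        # All additions in one round are independent of each other.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried to the next round
        level = nxt
        rounds += 1
    return level[0] / len(inputs), rounds


mean4, rounds4 = average_voter([1.0, 2.0, 3.0, 4.0])
mean5, rounds5 = average_voter([1.0, 2.0, 3.0, 4.0, 5.0])
```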
Abstract: A new parallel algorithm based on the divide-and-conquer method is proposed for the knapsack problem. On an EREW-SIMD machine with shared memory, the proposed algorithm uses $O((2^{n/4})^{1-\varepsilon})$ processors, $0 \le \varepsilon \le 1$, and $O(2^{n/2})$ memory to find a solution for the $n$-element knapsack problem in time $O(2^{n/4}(2^{n/4})^{\varepsilon})$. The cost of the proposed parallel algorithm is $O(2^{n/2})$, which makes it an optimal method for solving the knapsack problem without memory conflicts and an improvement over previous results.
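As a quick consistency check on the stated bounds, taking the cost of a parallel algorithm to be the product of processor count and running time (the usual definition, assumed here), the exponents of $\varepsilon$ cancel:

```latex
\[
\underbrace{\bigl(2^{n/4}\bigr)^{1-\varepsilon}}_{\text{processors}}
\cdot
\underbrace{2^{n/4}\bigl(2^{n/4}\bigr)^{\varepsilon}}_{\text{time}}
= \bigl(2^{n/4}\bigr)^{(1-\varepsilon)+1+\varepsilon}
= \bigl(2^{n/4}\bigr)^{2}
= 2^{n/2}.
\]
```

So the $O(2^{n/2})$ cost holds for every $\varepsilon$ in $[0, 1]$, and $\varepsilon$ only trades processors against time.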
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant No. JBK1806002 and the National Natural Science Foundation of China under Grant No. 11471264.
Abstract: Researchers are now frequently confronted with the challenges of massive data computing caused by the limits of computer primary memory. Modal regression (MR) is a good alternative to mean regression and likelihood-based methods because of its robustness and high efficiency. To this end, the authors extend MR to massive data analysis and propose a computationally and statistically efficient divide-and-conquer MR method (DC-MR). The major novelty of this method consists of splitting the entire dataset into several blocks, applying the MR method to the data in each block, and deriving the final results by combining these regression results via a weighted average, which provides approximate estimates of the regression results on the entire dataset. The proposed method significantly reduces the required amount of primary memory, and the resulting estimator is theoretically as efficient as traditional MR on the entire dataset. The authors also investigate a variable selection approach based on multiple hypothesis testing to select significant parametric components and prove that the approach possesses the oracle property. In addition, the authors propose a practical modified modal expectation-maximization (MEM) algorithm for the proposed procedures. Numerical studies on simulated and real datasets are conducted to assess and showcase the practical and effective performance of the proposed methods.
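The split, fit, and combine pattern can be sketched with a much simpler estimator standing in for modal regression. The example below uses least squares through the origin on each block, not MR, and weights each block estimate by its sum of squared predictors. All names and data are hypothetical. For this stand-in the weighted average reproduces the full-data estimate exactly, whereas the abstract describes the DC-MR result as approximate.

```python
def fit_block(xs, ys):
    """Least-squares slope through the origin for one block, plus its weight."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx, sxx


def divide_and_conquer_slope(xs, ys, n_blocks):
    """Fit each block separately, then combine by a weighted average."""
    size = -(-len(xs) // n_blocks)  # ceiling division
    estimates = [
        fit_block(xs[start:start + size], ys[start:start + size])
        for start in range(0, len(xs), size)
    ]
    total_weight = sum(w for _, w in estimates)
    return sum(b * w for b, w in estimates) / total_weight


# Hypothetical data; only one block has to be held in memory at a time.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
dc_slope = divide_and_conquer_slope(xs, ys, n_blocks=3)
full_slope, _ = fit_block(xs, ys)
```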
Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 60496321) and the Shanghai Science and Technology Development Fund (Grant No. 025115032).
Abstract: As early as 1975, Shamos and Hoey gave the first O(n lg n)-time divide-and-conquer algorithm (the SH algorithm for short) for the problem of finding the closest pair of points. In one combination step, the Euclidean distances between 3n pairs of points need to be computed, so the overall complexity of computing distances is 3n lg n. Since computing a distance is more costly than other basic operations, we consider how to improve the SH algorithm in terms of the complexity of computing distances. In 1998, Zhou, Xiong and Zhu improved the SH algorithm by reducing this complexity to 2n lg n. In this paper, we make a further improvement: the overall complexity of computing distances is reduced to (3n lg n)/2, which is only half that of the SH algorithm.
Abstract: Performance and scalability are two issues that are becoming increasingly pressing as the resource description framework (RDF) data model is applied to real-world applications. Because neither vertical nor flat structures of RDF storage can handle frequent schema updates while also avoiding possible long-chain joins, there is no clear winner between the two typical structures. In this paper, we propose an alternative open user schema. The open user schema consists of flat tables automatically extracted from RDF query streams. A query is divided into two parts and conquered on the flat tables in the open user schema and on the vertical table stored in a backend storage. At the core of this divide-and-conquer architecture with open user schema, an efficient isomorphic decision algorithm is introduced to guide a query to related flat tables in the open user schema. Our proposal in essence departs from existing methods in that it can accommodate schema updates without possible long-chain joins. We implement our approach and provide empirical evaluations to demonstrate both its efficiency and effectiveness in evaluating complex RDF queries.
Abstract: Distributed statistical inference has attracted more and more attention in recent years with the emergence of massive data. We are grateful to the authors for their excellent review of the literature in this active area. Besides the progress mentioned by the authors, we would like to discuss some additional developments in this interesting area. Specifically, we focus on the balance between communication cost and statistical efficiency of divide-and-conquer (DC) type estimators in linear discriminant analysis and hypothesis testing. The DC approach is seen to behave differently in these problems than it does in estimation problems. Furthermore, we discuss some issues of statistical inference under restricted communication budgets.
Abstract: In this paper, we study large-scale inference for a linear expectile regression model. To mitigate the computational challenges of classical asymmetric least squares (ALS) estimation under massive data, we propose a communication-efficient divide-and-conquer algorithm that combines the information from sub-machines through confidence distributions. The resulting pooled estimator has a closed-form expression, and its consistency and asymptotic normality are established under mild conditions. Moreover, we derive the Bahadur representation of the ALS estimator, which serves as an important tool for studying the relationship between the number of sub-machines K and the sample size. Numerical studies, including both synthetic and real data examples, are presented to illustrate the finite-sample performance of our method and support the theoretical results.
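To make the asymmetric least squares idea concrete, the sketch below computes a sample expectile, that is, ALS for an intercept-only model, by iteratively reweighted averaging. It does not implement the paper's divide-and-conquer algorithm or its confidence-distribution pooling. The function name `sample_expectile`, the tolerance, and the data are assumptions for illustration.

```python
def sample_expectile(ys, tau=0.5, tol=1e-10, max_iter=1000):
    """Minimise sum(w_i * (y_i - e)**2) with w_i = tau if y_i > e else 1 - tau."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(ys) / len(ys)  # start from the mean, the tau = 0.5 expectile
    for _ in range(max_iter):
        # Asymmetric weights: residuals above e count tau, the rest 1 - tau.
        weights = [tau if y > e else 1 - tau for y in ys]
        new_e = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


ys = [1.0, 2.0, 3.0, 10.0]
centre = sample_expectile(ys, tau=0.5)
upper = sample_expectile(ys, tau=0.9)
```

With `tau=0.5` the weights are symmetric and the result is the ordinary mean. Larger `tau` pulls the estimate toward the upper tail.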