Symmetry is a common feature in the real world.It may be used to improve a classification by using the point symmetry-based distance as a measure of clustering.However,it is time consuming to calculate the point symme...Symmetry is a common feature in the real world.It may be used to improve a classification by using the point symmetry-based distance as a measure of clustering.However,it is time consuming to calculate the point symmetry-based distance.Although an efficient parallel point symmetry-based K-means algorithm(ParSym)has been propsed to overcome this limitation,ParSym may get stuck in sub-optimal solutions due to the K-means technique it used.In this study,we proposed a novel parallel point symmetry-based genetic clustering(ParSymG)algorithm for unsupervised classification.The genetic algorithm was introduced to overcome the sub-optimization problem caused by inappropriate selection of initial centroids in ParSym.A message passing interface(MPI)was used to implement the distributed master–slave paradigm.To make the algorithm more time-efficient,a three-phase speedup strategy was adopted for population initialization,image partition,and kd-tree structure-based nearest neighbor searching.The advantages of ParSymG over existing ParSym and parallel K-means(PKM)alogithms were demonstrated through case studies using three different types of remotely sensed images.Results in speedup and time gain proved the excellent scalability of the ParSymG algorithm.展开更多
The similarity search is one of the fundamental components in time series data mining,e.g.clustering,classification,association rules mining.Many methods have been proposed to measure the similarity between time serie...The similarity search is one of the fundamental components in time series data mining,e.g.clustering,classification,association rules mining.Many methods have been proposed to measure the similarity between time series,including Euclidean distance,Manhattan distance,and dynamic time warping(DTW).In contrast,DTW has been suggested to allow more robust similarity measure and be able to find the optimal alignment in time series.However,due to its quadratic time and space complexity,DTW is not suitable for large time series datasets.Many improving algorithms have been proposed for DTW search in large databases,such as approximate search or exact indexed search.Unlike the previous modified algorithm,this paper presents a novel parallel scheme for fast similarity search based on DTW,which is called MRDTW(MapRedcuebased DTW).The experimental results show that our approach not only retained the original accuracy as DTW,but also greatly improved the efficiency of similarity measure in large time series.展开更多
The combined finiteediscrete element method (FDEM) belongs to a family of methods of computationalmechanics of discontinua. The method is suitable for problems of discontinua, where particles aredeformable and can f...The combined finiteediscrete element method (FDEM) belongs to a family of methods of computationalmechanics of discontinua. The method is suitable for problems of discontinua, where particles aredeformable and can fracture or fragment. The applications of FDEM have spread over a number of disciplinesincluding rock mechanics, where problems like mining, mineral processing or rock blasting canbe solved by employing FDEM. In this work, a novel approach for the parallelization of two-dimensional(2D) FDEM aiming at clusters and desktop computers is developed. Dynamic domain decompositionbased parallelization solvers covering all aspects of FDEM have been developed. These have beenimplemented into the open source Y2D software package and have been tested on a PC cluster. Theoverall performance and scalability of the parallel code have been studied using numerical examples. Theresults obtained confirm the suitability of the parallel implementation for solving large scale problems. 2014 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting byElsevier B.V. All rights reserved.展开更多
The advent of Big Data has led to the rapid growth in the usage of parallel clustering algorithms that work over distributed computing frameworks such as MPI,MapReduce,and Spark.An important step for any parallel clus...The advent of Big Data has led to the rapid growth in the usage of parallel clustering algorithms that work over distributed computing frameworks such as MPI,MapReduce,and Spark.An important step for any parallel clustering algorithm is the distribution of data amongst the cluster nodes.This step governs the methodology and performance of the entire algorithm.Researchers typically use random,or a spatial/geometric distribution strategy like kd-tree based partitioning and grid-based partitioning,as per the requirements of the algorithm.However,these strategies are generic and are not tailor-made for any specific parallel clustering algorithm.In this paper,we give a very comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they employ.We also propose three new data distribution strategies namely Parameterized Dimensional Split for parallel density-based clustering algorithms like DBSCAN and OPTICS,Cell-Based Dimensional Split for dGridSLINK,which is a grid-based hierarchical clustering algorithm that exhibits efficiency for disjoint spatial distribution,and Projection-Based Split,which is a generic distribution strategy.All of these preserve spatial locality,achieve disjoint partitioning,and ensure good data load balancing.The experimental analysis shows the benefits of using the proposed data distribution strategies for algorithms they are designed for,based on which we give appropriate recommendations for their usage.展开更多
基金Thiswork was supported by the National Natural Science Foundation of China[grant number 41471313],[grant num-ber 41101356],[grant number 41671391]the Fundamental Research Funds for the Central Universities[grant num-ber 2016XZZX004-02]+1 种基金the Science and Technology Project of Zhejiang Province[grant number 2015C33021],[grant number 2013C33051]Major Program of China High Resolution Earth Observation System[grant number 07-Y30B10-9001].
文摘Symmetry is a common feature in the real world.It may be used to improve a classification by using the point symmetry-based distance as a measure of clustering.However,it is time consuming to calculate the point symmetry-based distance.Although an efficient parallel point symmetry-based K-means algorithm(ParSym)has been propsed to overcome this limitation,ParSym may get stuck in sub-optimal solutions due to the K-means technique it used.In this study,we proposed a novel parallel point symmetry-based genetic clustering(ParSymG)algorithm for unsupervised classification.The genetic algorithm was introduced to overcome the sub-optimization problem caused by inappropriate selection of initial centroids in ParSym.A message passing interface(MPI)was used to implement the distributed master–slave paradigm.To make the algorithm more time-efficient,a three-phase speedup strategy was adopted for population initialization,image partition,and kd-tree structure-based nearest neighbor searching.The advantages of ParSymG over existing ParSym and parallel K-means(PKM)alogithms were demonstrated through case studies using three different types of remotely sensed images.Results in speedup and time gain proved the excellent scalability of the ParSymG algorithm.
基金supported in part by National High-tech R&D Program of China under Grants No.2012AA012600,2011AA010702,2012AA01A401,2012AA01A402National Natural Science Foundation of China under Grant No.60933005+1 种基金National Science and Technology Ministry of China under Grant No.2012BAH38B04National 242 Information Security of China under Grant No.2011A010
文摘The similarity search is one of the fundamental components in time series data mining,e.g.clustering,classification,association rules mining.Many methods have been proposed to measure the similarity between time series,including Euclidean distance,Manhattan distance,and dynamic time warping(DTW).In contrast,DTW has been suggested to allow more robust similarity measure and be able to find the optimal alignment in time series.However,due to its quadratic time and space complexity,DTW is not suitable for large time series datasets.Many improving algorithms have been proposed for DTW search in large databases,such as approximate search or exact indexed search.Unlike the previous modified algorithm,this paper presents a novel parallel scheme for fast similarity search based on DTW,which is called MRDTW(MapRedcuebased DTW).The experimental results show that our approach not only retained the original accuracy as DTW,but also greatly improved the efficiency of similarity measure in large time series.
文摘The combined finiteediscrete element method (FDEM) belongs to a family of methods of computationalmechanics of discontinua. The method is suitable for problems of discontinua, where particles aredeformable and can fracture or fragment. The applications of FDEM have spread over a number of disciplinesincluding rock mechanics, where problems like mining, mineral processing or rock blasting canbe solved by employing FDEM. In this work, a novel approach for the parallelization of two-dimensional(2D) FDEM aiming at clusters and desktop computers is developed. Dynamic domain decompositionbased parallelization solvers covering all aspects of FDEM have been developed. These have beenimplemented into the open source Y2D software package and have been tested on a PC cluster. Theoverall performance and scalability of the parallel code have been studied using numerical examples. Theresults obtained confirm the suitability of the parallel implementation for solving large scale problems. 2014 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting byElsevier B.V. All rights reserved.
文摘The advent of Big Data has led to the rapid growth in the usage of parallel clustering algorithms that work over distributed computing frameworks such as MPI,MapReduce,and Spark.An important step for any parallel clustering algorithm is the distribution of data amongst the cluster nodes.This step governs the methodology and performance of the entire algorithm.Researchers typically use random,or a spatial/geometric distribution strategy like kd-tree based partitioning and grid-based partitioning,as per the requirements of the algorithm.However,these strategies are generic and are not tailor-made for any specific parallel clustering algorithm.In this paper,we give a very comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they employ.We also propose three new data distribution strategies namely Parameterized Dimensional Split for parallel density-based clustering algorithms like DBSCAN and OPTICS,Cell-Based Dimensional Split for dGridSLINK,which is a grid-based hierarchical clustering algorithm that exhibits efficiency for disjoint spatial distribution,and Projection-Based Split,which is a generic distribution strategy.All of these preserve spatial locality,achieve disjoint partitioning,and ensure good data load balancing.The experimental analysis shows the benefits of using the proposed data distribution strategies for algorithms they are designed for,based on which we give appropriate recommendations for their usage.