Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven, and their objective function minimization is based on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that considers not only the relationships between data but also the compatibility with the knowledge hints. However, these algorithms cannot determine the optimal number of clusters on their own; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some cluster centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM Clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data density calculation function is proposed, and the Density Knowledge Points Extraction (DKPE) method is established to filter high-density points out of the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by the high-density points to discover the natural data structure. Finally, the initial number of clusters is set to be greater than the true one, based on the number of knowledge hints, and the HP-PCM algorithm automatically determines the final number of clusters during the clustering process through a cluster elimination mechanism. Experimental studies, including comparative analyses, highlight the effectiveness of the proposed algorithm, such as an increased success rate in clustering, the ability to determine the optimal cluster number, and faster convergence.
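The abstract does not spell out the density function or the extraction rule, so the sketch below is only a minimal illustration of the high-density-point idea: it assumes a Gaussian-kernel density score and a greedy minimum-separation filter (the function name and the bandwidth and min_sep parameters are hypothetical, not taken from the paper).

```python
import numpy as np
from scipy.spatial.distance import cdist

def extract_high_density_points(X, n_hints, bandwidth=1.0, min_sep=1.0):
    """Illustrative stand-in for a DKPE-like step: score each sample with a
    Gaussian-kernel density estimate and greedily keep the densest points
    that are at least `min_sep` apart. The actual HP-PCM density function
    and separation rule are defined in the paper, not here."""
    D = cdist(X, X)                           # pairwise Euclidean distances
    density = np.exp(-(D / bandwidth) ** 2).sum(axis=1)
    order = np.argsort(-density)              # densest first
    hints = []
    for i in order:
        if all(D[i, j] >= min_sep for j in hints):
            hints.append(i)
        if len(hints) == n_hints:
            break
    return X[hints]

# Usage: the returned points would guide the PCM objective, starting from
# more clusters than the expected true number.
X = np.random.randn(300, 2)
hints = extract_high_density_points(X, n_hints=6, bandwidth=0.5, min_sep=0.8)
```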
To address the problem that it is hard to determine the number of clusters and the abnormal points using a clustering validity function, an effective clustering partition model based on the genetic algorithm is built in this paper. A candidate solution is formed by combining the clustering partition with the encoded samples, and the fitness function is defined by the distances among and within clusters. The number of clusters and the samples in each cluster are determined, and the abnormal points are distinguished, by applying a triple random crossover operator and mutation. Based on known sample data, the results of the new method and of the clustering validity function are compared. Numerical experiments show that the new method is more effective.
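As a rough illustration of the fitness idea described above (large between-cluster distances, small within-cluster distances), the snippet below scores a candidate partition; the paper's encoding scheme and triple random crossover operator are not reproduced here.

```python
import numpy as np

def fitness(X, labels):
    """Illustrative GA fitness for a partition: reward large between-cluster
    spread and small within-cluster spread. The paper's exact definition,
    chromosome encoding, and crossover/mutation operators differ in detail."""
    ks = np.unique(labels)
    centers = np.array([X[labels == k].mean(axis=0) for k in ks])
    within = sum(np.linalg.norm(X[labels == k] - c, axis=1).sum()
                 for k, c in zip(ks, centers))
    between = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1).sum()
    return between / (within + 1e-12)

# Example: score a random 3-group partition of toy data
X = np.random.randn(200, 2)
labels = np.random.randint(0, 3, size=200)
print(fitness(X, labels))
```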
Refined 3D modeling of mine slopes is pivotal for precise prediction of geological hazards. To address the inadequacy of existing single modeling methods in comprehensively representing the overall and localized characteristics of mining slopes, this study introduces a new method that fuses model data from unmanned aerial vehicle (UAV) tilt photogrammetry and 3D laser scanning through a data alignment algorithm based on control points. First, the mini-batch K-Medoids algorithm is used to cluster the point cloud data from ground 3D laser scanning. Then, the elbow rule is applied to determine the optimal cluster number (K0), and the feature points are extracted. Next, the nearest-neighbor point algorithm is employed to match the feature points obtained from UAV tilt photogrammetry, and the internal point coordinates are adjusted through a distance-weighted average to construct the 3D model. Finally, in an engineering case study, the K0 value is determined to be 8, with a matching accuracy between the two model datasets ranging from 0.0669 to 1.0373 mm. Compared with a modeling method using the K-Medoids clustering algorithm alone, the new method significantly improves computational efficiency, the accuracy of selecting the optimal number of feature points in 3D laser scanning, and the precision of the 3D model derived from UAV tilt photogrammetry. This method provides a research foundation for constructing mine slope models.
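A minimal sketch of the elbow-rule step for choosing K0, assuming plain K-means as a stand-in for the paper's mini-batch K-Medoids; the curvature-based elbow detection below is one common heuristic, not necessarily the one used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_k(X, k_range=range(2, 15)):
    """Illustrative elbow rule: fit a clustering for each K, record the
    within-cluster cost (inertia), and pick the K at the sharpest bend.
    KMeans stands in here for the paper's mini-batch K-Medoids step."""
    costs = np.asarray([
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in k_range
    ])
    # crude elbow: the point of maximum curvature of the cost curve
    second_diff = np.diff(costs, n=2)
    return list(k_range)[int(np.argmax(second_diff)) + 1]

# Usage on a point cloud reduced to an (n_points, 3) array of coordinates:
# k0 = elbow_k(point_cloud_xyz)
```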
Clustering data with varying densities and complicated structures is important, yet many existing clustering algorithms struggle with this problem because varying densities and complicated structures make a single algorithm perform poorly on different parts of the data. Assuming that denser parts of the data carry more information, an algorithm that clusters from the high-density parts outward is proposed. It begins with a tiny distance to find the highest density-connected partitions and form the corresponding super cores; the distance is then iteratively increased by a global heuristic method to cluster parts with different densities. The mean silhouette coefficient indicates the clustering performance, and a denoising function is implemented to eliminate the influence of noise and outliers. Many challenging experiments indicate that the algorithm performs well on data with widely varying densities and extremely complex structures. It determines the optimal number of clusters automatically, requires no background knowledge, is easy to tune, and is robust against noise and outliers.
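A toy version of the "grow the connection distance and keep the best mean silhouette" idea, assuming simple density connectivity through a distance threshold; the paper's super-core construction and global heuristic are more elaborate than this.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist
from sklearn.metrics import silhouette_score

def grow_clusters(X, eps_values):
    """Illustrative sketch: connect points closer than a growing distance
    threshold and keep the labelling with the best mean silhouette
    coefficient. Returns (best_score, best_labels)."""
    D = cdist(X, X)
    best = (-1.0, None)
    for eps in eps_values:
        n_comp, labels = connected_components(csr_matrix(D < eps), directed=False)
        if 1 < n_comp < len(X):            # silhouette needs 2..n-1 clusters
            score = silhouette_score(X, labels)
            if score > best[0]:
                best = (score, labels)
    return best

score, labels = grow_clusters(np.random.randn(300, 2), np.linspace(0.1, 1.5, 15))
```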
Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When applied to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in fisheries contexts. This study presents an intensive evaluation of three common clustering methods, hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performance of these three methods for different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to be inconsistent in the optimal number of clusters. EM performed relatively better in avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log-transformation, had substantial influences on the clustering results, especially for KM. Transformation also influenced clustering stability, with scaling tending to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size, and the effect generally leveled off above about 70 samples, most quickly for EM. We conclude that the best clustering method should be chosen depending on the aim of the study and the number of clusters; in general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps ensure the credibility of the application and interpretation of clustering methods.
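For readers who want to reproduce a comparison of the three method families, the sketch below fits HC, KM, and an EM-fitted Gaussian mixture on the same data; the pairwise adjusted Rand index is used here only as a simple agreement proxy, whereas the study itself relies on the APN index and resampling.

```python
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

def compare_methods(X, k):
    """Fit hierarchical clustering (HC), K-means (KM), and a Gaussian
    mixture via EM, then report pairwise agreement between their labels."""
    labels = {
        "HC": AgglomerativeClustering(n_clusters=k).fit_predict(X),
        "KM": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X),
        "EM": GaussianMixture(n_components=k, random_state=0).fit(X).predict(X),
    }
    names = list(labels)
    return {(a, b): adjusted_rand_score(labels[a], labels[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```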
Fuzzy C-means (FCM) is a clustering method that falls under unsupervised machine learning. The main issues plaguing this clustering algorithm are that the number of clusters in a particular dataset is unknown and that the result is sensitive to the initialization of the cluster centres. Artificial Bee Colony (ABC) is a swarm algorithm that iteratively improves the quality of its members' solutions using particular kinds of randomness. However, ABC has weaknesses, such as balancing exploration and exploitation. To improve the exploration process within ABC, the mean artificial bee colony (MeanABC) is used, with a modified search equation that depends on the mean of previous solutions and the global best. Furthermore, to solve the main issues of FCM, an automatic clustering algorithm based on the mean artificial bee colony, called AC-MeanABC, is proposed. It uses MeanABC's ability to balance exploration and exploitation, and its capacity to explore the positive and negative directions of the search space, to find the best number of clusters and centroid values. A few benchmark datasets and a set of natural images were used to evaluate the effectiveness of AC-MeanABC. The experimental findings are encouraging and indicate considerable improvements over other state-of-the-art approaches in the same domain.
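To make the two FCM weaknesses concrete, here is a bare-bones FCM loop: both the number of clusters k and the initial centres must be supplied up front, which is exactly what AC-MeanABC searches for automatically. This is standard FCM, not the proposed algorithm.

```python
import numpy as np

def fcm(X, k, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy C-means with fuzzifier m: alternate the membership and
    centre updates. k and the initial centres are fixed choices here."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # initialization sensitivity
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        u = 1.0 / (d ** (2.0 / (m - 1.0)))
        u /= u.sum(axis=1, keepdims=True)               # fuzzy memberships
        centers = (u.T ** m @ X) / (u.T ** m).sum(axis=1, keepdims=True)
    return centers, u
```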
We propose a novel scheme based on clustering analysis in color space to solve text segmentation in complex color images. Text segmentation includes automatic clustering of the color space and foreground image generation. Two methods are proposed for automatic clustering: the first determines the optimal number of clusters, and the second is a fuzzy competitive clustering method based on competitive learning techniques. Essential foreground images obtained from the color clusters are combined into the final foreground images. Further performance analysis reveals the advantages of the proposed methods.
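A minimal illustration of the colour-space clustering step, assuming plain K-means with a fixed k; the proposed scheme instead determines the number of clusters automatically and uses fuzzy competitive clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def color_cluster_masks(image, k):
    """Cluster pixels in RGB space and return one binary mask per colour
    cluster; masks judged to contain text would then be combined into the
    foreground image."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    return [(labels == c).reshape(h, w) for c in range(k)]
```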
One of the most important problems in clustering is defining the number of classes. In fact, it is not easy to find an appropriate method to measure whether a cluster configuration is acceptable or not. In this paper we propose a possible, non-automatic solution that considers different clustering criteria and compares their results. In this way, robust structures of an analyzed dataset can often be caught (or established), and an optimal cluster configuration, which presents a meaningful association, may be defined. In particular, we also focus on the variables that may be used in cluster analysis: variables that contain little clustering information can lead to misleading and non-robust results. Three algorithms are employed in this study: the K-means partitioning method, Partitioning Around Medoids (PAM), and the Heuristic Identification of Noisy Variables (HINoV). The results are compared with those of robust methods.
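The point about noisy variables can be demonstrated with a small experiment: appending uninformative columns to clean blob data visibly degrades a K-means partition. This illustrates the motivation only; it is not an implementation of HINoV.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated blobs, then the same data with five noise columns added.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)
noise = np.random.default_rng(0).normal(scale=10.0, size=(300, 5))

for data, name in [(X, "informative only"), (np.hstack([X, noise]), "with noisy variables")]:
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
    print(name, adjusted_rand_score(y, labels))   # agreement with the true partition drops
```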
The upper bound on the optimal number of clusters in a clustering algorithm is studied in this paper, and a new method is proposed to address this issue. The method shows that the rule c_max ≤ N^(1/2), which is popular in the current literature, is reasonable in some sense. The conclusion is tested and analyzed on some typical examples from the literature, which demonstrates the validity of the new method.
The cavitation cloud of different internal structures results in different collapse pressures owing to the interaction among bubbles, so the internal structure of the cloud is required to accurately predict the collapse pressure. A cavitation model was developed through dimensional analysis and direct numerical simulation of the collapse of a bubble cluster. The bubble number density was included in the proposed model to characterize the internal structure of the bubble cloud. Applied to flows over a projectile, the proposed model predicts a higher collapse pressure than the Singhal model. The results indicate that the collapse pressure of a detached cavitation cloud is affected by the bubble number density.
Percolation theory deals with the numbers and properties of the clusters formed at different occupation probabilities. In this paper, we study methods for calculating small clusters. We calculated the density of small clusters of size 1, 2, and 3 in the percolation model with both an exact method and a numerical method. The results of the two methods are very close and can be verified against each other. We find that the density of all three kinds of small clusters reaches its highest value when the occupation probability is between 0.1 and 0.2. It is very difficult to obtain the analytical formula of the exact method when the cluster area is relatively large (for example, more than 50), so in that case we obtain the cluster density by the numerical method. We also find that the time required to calculate the cluster density is proportional to the percolation area and is independent of the cluster size and the occupation probability.
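A small numerical sketch in the spirit of the method described above: estimate the density of clusters of size 1 to 3 in site percolation on a square lattice by direct labelling. The lattice size, 4-connectivity, and seed are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import label

def small_cluster_density(L, p, max_size=3, seed=0):
    """Density (count per lattice site) of clusters of size 1..max_size in
    site percolation on an L x L square lattice at occupation probability p."""
    rng = np.random.default_rng(seed)
    grid = rng.random((L, L)) < p              # occupied sites
    labels, n = label(grid)                    # 4-connected clusters
    sizes = np.bincount(labels.ravel())[1:]    # drop the background count
    return {s: np.count_nonzero(sizes == s) / L**2 for s in range(1, max_size + 1)}

# Example: cluster densities for sizes 1-3 near p = 0.15
print(small_cluster_density(L=500, p=0.15))
```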
Despite marked improvements in tropical cyclone (TC) track ensemble forecasting, forecasters still have difficulty making quick decisions when facing multiple potential predictions, so there is a need for post-processing techniques that reduce the uncertainty in TC track forecasts; cluster-based methods are one such technique. To improve the effectiveness and efficiency of previous cluster-based methods, this study adopts recombination clustering (RC), which optimizes the use of limited TC variables and constructs better features that accurately capture the good TC track forecasts from the ensemble prediction system (EPS) of the China Meteorological Administration Tropical Regional Atmosphere Model for the South China Sea (CMA-TRAMS). The RC technique is further optimized by constraining the number of clusters using the absolute track bias between the ensemble mean (EM) and the ensemble spread (ES). Finally, the RC-based deterministic and weighted probabilistic forecasts are compared with TC track forecasts from traditional methods. It is found that (1) for deterministic TC track forecasts, the RC-based forecasts outperform all other methods at 12–72-h lead times; compared with the skillful EM (118.6 km), the improvements introduced by RC reach up to 10.8% (8.1 km), 10.2% (13.7 km), and 8.7% (20.5 km) at forecast times of 24, 48, and 72 h, respectively. (2) For probabilistic TC track forecasts, RC yields significantly more accurate and discriminative forecasts than traditional equal-weight track forecasts by increasing the weight of the best cluster, with a decrease of 4.1% in Brier score (BS) and an increase of 1.4% in the area under the relative operating characteristic curve (AUC). (3) In particular, for cases with recurved tracks, such as typhoons Saudel (2017) and Bavi (2008), RC significantly reduces track errors relative to EM, by 56.0% (125.5 km) and 77.7% (192.2 km), respectively. These results demonstrate that the RC technique not only improves TC track forecasts but also helps to identify skillful ensemble members, and is likely useful for feature construction in machine learning.
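As a heavily simplified illustration of cluster-based track post-processing, the sketch below flattens ensemble tracks, clusters the members, and forms a weighted mean track. RC's feature construction, cluster-number constraint, and skill-based weighting are not reproduced here; the function and its parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_weighted_mean_track(tracks, n_clusters, weights=None):
    """tracks: array of shape (n_members, n_times, 2) holding [lat, lon].
    Cluster the members in flattened track space, average each cluster,
    and combine the cluster-mean tracks with the given weights
    (cluster-size weights by default)."""
    n_members = tracks.shape[0]
    flat = tracks.reshape(n_members, -1)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(flat)
    means = np.stack([tracks[labels == c].mean(axis=0) for c in range(n_clusters)])
    if weights is None:
        weights = np.bincount(labels, minlength=n_clusters) / n_members
    return np.tensordot(weights, means, axes=1)   # weighted mean track, (n_times, 2)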
This paper exposes some intrinsic characteristics of the spectral clustering method using tools from matrix perturbation theory. We construct a weight matrix of a graph and study its eigenvalues and eigenvectors. It is shown that the number of clusters is equal to the number of eigenvalues that are larger than 1, and that the number of points in each cluster can be approximated by the associated eigenvalue. It is also shown that the eigenvectors of the weight matrix can be used directly to perform clustering; that is, the directional angle between two row vectors of the matrix derived from the eigenvectors is a suitable distance measure for clustering. As a result, an unsupervised spectral clustering algorithm based on the weight matrix (USCAWM) is developed. Experimental results on a number of artificial and real-world data sets confirm the correctness of the theoretical analysis.
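A sketch of the property described above, assuming a Gaussian weight matrix (the paper's weight-matrix construction may differ): count the eigenvalues larger than 1 as the cluster-number estimate, and use the rows of the leading eigenvectors as the embedding on which angle-based grouping would operate.

```python
import numpy as np

def uscawm_sketch(X, sigma=1.0):
    """Not the authors' USCAWM code: build a Gaussian weight matrix,
    estimate the number of clusters as the count of eigenvalues > 1, and
    return the leading-eigenvector embedding whose row angles serve as a
    distance measure for the final grouping."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.exp(-(D ** 2) / (2 * sigma ** 2))
    eigvals, eigvecs = np.linalg.eigh(W)             # ascending eigenvalues
    k = max(1, int(np.sum(eigvals > 1)))             # estimated cluster count
    embedding = eigvecs[:, -k:]                      # rows used for angle-based grouping
    return k, embedding
```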
A dilute suspension of particles with the same density and size develops clusters when settling at high Reynolds numbers (≥250), owing to the entrapment of particles in the wakes produced by upstream particles. In this work, this phenomenon is studied by numerical simulation for suspensions containing particles of different densities. The particle-fluid interactions are modelled using the immersed boundary method, and inter-particle collisions are modelled using the discrete element method. In the simulations, the settling Reynolds number is always kept above 250 and the solid volume fraction of the suspension is approximately 0.1 percent. Two particle density ratios (the density of the heavy particles to that of the lighter particles) of 4:1 and 2:1, as well as particles of the same density, are studied. For each density ratio, the fraction of particles of each density is varied from roughly 0.8 to 0.2. Settling characteristics such as the microstructure of the settling particles, the average settling velocity, and the velocity fluctuations of the settling particles are studied. The simulations show that for particles of different densities, the settling characteristics of the suspension are largely dominated by the heavy particles. At the end of the paper, the underlying physics is explained for the anomalies observed in the simulations.