Cluster analysis is a crucial technique in unsupervised machine learning, pattern recognition, and data analysis. However, current clustering algorithms suffer from the need for manual determination of parameter values, low accuracy, and inconsistent performance across data sizes and structures. To address these challenges, a novel clustering algorithm called the fully automated density-based clustering method (FADBC) is proposed. The FADBC method consists of two stages: parameter selection and cluster extraction. In the first stage, a proposed method extracts optimal parameters for the dataset, namely the epsilon size and a minimum-number-of-points threshold. These parameters are then used in a density-based technique that scans each point in the dataset and evaluates neighborhood densities to find clusters. The proposed method was evaluated on different benchmark datasets and metrics, and the experimental results demonstrate its competitive performance without requiring manual inputs. The results show that FADBC outperforms well-known clustering methods such as agglomerative hierarchical clustering, k-means, spectral clustering, DBSCAN, FCDCSD, Gaussian mixtures, and density-based spatial clustering. It can handle any kind of dataset well and performs excellently.
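The abstract above does not detail how the epsilon and minimum-points values are extracted, so the sketch below is only a hedged illustration of the general idea of automated parameter selection, not the authors' FADBC procedure: epsilon is taken at the knee of the sorted k-nearest-neighbor distance curve and passed to a standard density-based scan. The knee heuristic, the function name, and the use of scikit-learn's DBSCAN are all our assumptions.

```python
# Illustrative sketch only: automatic epsilon selection from the k-distance
# curve, followed by a density-based scan (sklearn's DBSCAN stands in for
# the paper's cluster-extraction stage).
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def auto_density_clustering(X, min_pts=4):
    # Distance from each point to its min_pts-th nearest neighbor.
    nn = NearestNeighbors(n_neighbors=min_pts).fit(X)
    k_dist = np.sort(nn.kneighbors(X)[0][:, -1])
    # Assumed heuristic: epsilon at the point of largest curvature
    # (largest second difference) of the sorted k-distance curve.
    knee = int(np.argmax(np.diff(k_dist, 2))) + 1 if len(k_dist) > 2 else len(k_dist) // 2
    eps = k_dist[knee]
    return DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
```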
Finding clusters based on density represents a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping densely connected objects into one group and discarding noise objects. It requires two input parameters: epsilon (a fixed neighborhood radius) and MinPts (the lowest number of objects within epsilon). However, it cannot handle clusters of various densities since it uses a global value for epsilon. This article proposes an adaptation of the DBSCAN method so that it can discover clusters of varied densities while reducing the required number of input parameters to one: the only user input is MinPts, and epsilon is computed automatically from statistical information about the dataset. The proposed method finds the core distance for each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters satisfying this density level. The remaining unclustered objects are then clustered using a new value of epsilon that equals the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects number less than 0.006 of the dataset's size. Benchmark datasets were used to evaluate the effectiveness of the proposed method, which produced promising results. Practical experiments demonstrate the outstanding ability of the proposed method to detect clusters of different densities even when there is no separation between them. The accuracy of the method ranges from 92% to 100% on the experimented datasets.
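The iterative scheme described above is concrete enough to sketch directly. The following is a minimal, hedged rendering of it (function and variable names are ours, and scikit-learn's DBSCAN stands in for the clustering step): epsilon is set to the mean core distance of the still-unclustered objects, clusters are extracted at that density level, and the loop repeats until fewer than 0.006 of the points remain unclustered.

```python
# Minimal sketch of the multi-density scheme: epsilon per round is the mean
# core distance of the points that are still unclustered.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def multi_density_dbscan(X, min_pts=4, stop_frac=0.006):
    labels = np.full(len(X), -1)              # -1 = not yet clustered
    next_label = 0
    remaining = np.arange(len(X))
    while len(remaining) > max(stop_frac * len(X), min_pts):
        sub = X[remaining]
        # Core distance: distance to the min_pts-th nearest neighbor.
        nn = NearestNeighbors(n_neighbors=min_pts).fit(sub)
        core_dist = nn.kneighbors(sub)[0][:, -1]
        eps = core_dist.mean()                # epsilon for this density level
        sub_labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(sub)
        clustered = sub_labels >= 0
        if not clustered.any():
            break                             # no denser structure left
        labels[remaining[clustered]] = sub_labels[clustered] + next_label
        next_label = labels.max() + 1
        remaining = remaining[~clustered]
    return labels
```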
Ball milling is widely used in industry to mill particulate material. The primary purpose of this process is to attain an appropriate product size with the least possible energy consumption. The process is also extensively utilised in pharmaceuticals for the comminution of excipients or drugs. Surprisingly, for ball mills, little is known concerning the mechanism of size reduction. Traditional prediction approaches are not deemed useful for providing significant insights into the operation or facilitating radical step changes in performance. Therefore, the discrete element method (DEM) has been used in this paper as a computational modelling approach. In previous research, DEM has been applied to simulate breakage behaviour through the impact energy of all ball collisions as the driving force for fracturing. However, the nature of pharmaceutical material fragmentation during ball milling is more complex. Suitable functional equations linking broken media and applied energy do not consider the collision of particulate media of different shapes, or collisions of particulate media (such as granules) with balls and the rotating mill drum. This could have a significant impact on fragmentation. Therefore, this paper aimed to investigate the fragmentation of bounded particles into DEM granules of different shape and size during the ball milling process. A systematic study was undertaken to explore the effect of milling speed on breakage behaviour. In this study, a combination of a density-based clustering method and the discrete element method was employed to numerically investigate the number and size of the fragments generated during the ball milling process over time. It was found that ball collisions increased proportionally with rotation speed until the critical rotation speed was reached; correspondingly, the mill power increased with rotation speed. The cataracting motion of mill material together with balls was identified as the most effective regime for fragmentation, with fewer breakage events occurring under centrifugal motion. Higher quantities of fines were produced in each batch with increased milling speed, along with smaller quantities of grain fragments. Moreover, the relationship between the number of produced fragments and milling speed at the end of the process exhibited a linear tendency.
South America’s climatic diversity is a product of its vast geographical expanse, encompassing tropical to subtropical latitudes. The variations in precipitation and temperature across the region stem from the influence of distinct atmospheric systems. While some studies have characterized the prevailing systems over South America, they often lacked statistical techniques for homogenization. Other research has employed multivariate statistical methods to identify homogeneous regions regarding temperature and precipitation, but its focus has been limited to specific areas, such as the south, southeast, and northeast. Surprisingly, no work has compared various multivariate statistical techniques to determine homogeneous regions across the entirety of South America concerning temperature and precipitation. This paper aims to address this gap by comparing three such techniques: Cluster Analysis (K-means and Ward) and Self-Organizing Maps, using data from different sources for temperature (ERA5, ERA5-Land, and CRU) and precipitation (ERA5, ERA5-Land, and CPC). Spatial patterns and time series were generated for each region over the period 1981-2010. The resulting spatially homogeneous regions for temperature and precipitation have the potential to significantly benefit climate analysis and forecasts. Moreover, they can offer valuable insights for various climatological studies, guiding decision-making processes in fields that rely on climate information, such as agriculture, disaster management, and water resources planning.
We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters; at the same time, it removes noise and outliers. In the second phase, CDC employs the concept of the K-means algorithm to select a larger cluster as the center. The center cluster then merges smaller clusters that satisfy certain constraint rules. Because smaller clusters are merged around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.
Clustering evolving data streams must be performed in limited time and with reasonable quality. Existing micro-clustering-based methods do not consider the distribution of data points inside a micro-cluster. We propose LeaDen-Stream (Leader Density-based clustering algorithm over evolving data Stream), a density-based clustering algorithm using leader clustering. The algorithm is based on two-phase clustering. The online phase selects the proper mini-micro or micro-cluster leaders based on the distribution of data points in the micro-clusters. The leader centers are then sent to the offline phase to form the final clusters. In LeaDen-Stream, by carefully choosing between two kinds of micro leaders, we decrease the time complexity of the clustering while maintaining cluster quality. A pruning strategy is also used to separate real data from noise by introducing dense and sparse mini-micro and micro-cluster leaders. Our performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method.
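As a reference point for the leader-based design above, here is a minimal sketch of plain leader clustering, the classic building block behind micro-cluster leaders; the mini-micro/micro distinction, the pruning strategy, and the online/offline split of LeaDen-Stream are not reproduced, and the names are ours.

```python
# One-pass leader clustering: each arriving point is absorbed by the
# nearest leader within a threshold, otherwise it opens a new leader.
import numpy as np

def leader_cluster(stream, threshold):
    leaders, counts = [], []
    for x in stream:
        if leaders:
            d = np.linalg.norm(np.array(leaders) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= threshold:                 # absorb into nearest leader
                counts[j] += 1
                continue
        leaders.append(np.asarray(x, dtype=float))  # open a new leader
        counts.append(1)
    return np.array(leaders), np.array(counts)
```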
High-fidelity analysis models, which are beneficial to improving design quality, have been more and more widely utilized in modern engineering design optimization problems. However, high-fidelity analysis models are so computationally expensive that the time required for design optimization is usually unacceptable. To improve the efficiency of optimization involving such models, surrogates can be applied to approximate the computationally expensive models, which greatly reduces the computation time. An efficient heuristic global optimization method using adaptive radial basis functions (RBF) based on fuzzy clustering (ARFC) is proposed. In this method, a novel algorithm for maximin Latin hypercube design using successive local enumeration (SLE) is employed to obtain sample points with good space-filling and projective uniformity properties, which greatly benefits metamodel accuracy. The RBF method is adopted for constructing the metamodels, and as the number of sample points increases, the approximation accuracy of the RBF is gradually enhanced. The fuzzy c-means clustering method is applied to identify reduced attractive regions in the original design space. Numerical benchmark examples are used to validate the performance of ARFC. The results demonstrate that for most application examples the global optima are effectively obtained, and comparison with the adaptive response surface method (ARSM) proves that the proposed method can intuitively capture promising design regions and can efficiently identify the global or near-global design optimum. This method improves the efficiency and global convergence of the optimization problems, and gives a new optimization strategy for engineering design optimization problems involving computationally expensive models.
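To make the surrogate loop concrete, the following is a hedged sketch under simplifying assumptions: Latin hypercube sampling via SciPy's qmc module stands in for the paper's SLE design, an RBF interpolant serves as the metamodel, and the fuzzy-c-means region reduction is omitted. It illustrates the general strategy, not the published ARFC code.

```python
# Hedged sketch of a surrogate-assisted loop: LHS sampling, RBF metamodel,
# minimize the metamodel, evaluate the expensive function once per cycle.
import numpy as np
from scipy.stats import qmc
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def rbf_surrogate_optimize(f, bounds, n_init=20, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X = qmc.scale(qmc.LatinHypercube(d=len(lo), seed=seed).random(n_init), lo, hi)
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        surrogate = RBFInterpolator(X, y)                     # cheap metamodel
        res = minimize(lambda x: float(surrogate(x[None])[0]),
                       X[np.argmin(y)], bounds=bounds)        # search the metamodel
        x_new = res.x
        if np.min(np.linalg.norm(X - x_new, axis=1)) < 1e-9:  # keep samples unique
            x_new = np.clip(x_new + 1e-3 * (hi - lo) *
                            rng.uniform(-1, 1, size=x_new.shape), lo, hi)
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))                            # one expensive evaluation
    return X[np.argmin(y)], y.min()
```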
The Tarq geochemical 1:100,000 sheet is located in Isfahan province and has been investigated by Iran's Geological and Explorations Organization using stream sediment analyses. The area has a stratigraphy of Precambrian to Quaternary rocks and is located in the Central Iran zone. Given the signs of gold mineralization in this area, it is necessary to identify its important mineral zones. Information is therefore needed on the relationships among gold, arsenic, and antimony, and on monitoring these elements relative to each other, in order to determine the extent of geochemical halos and to estimate grades. For this purpose, the well-known K-means method is used in the present study; it is a clustering method based on minimizing the total Euclidean distance of each sample from the center of the class to which it is assigned. In this research, the clustering quality function and the per-sample quality measure within its assigned cluster, S(i), have been used to determine the optimum number of clusters. Finally, using the cluster centers and these results, equations were derived to predict the gold content from four parameters: the arsenic and antimony grades and the length and width coordinates of the sampling points.
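Assuming S(i) denotes the usual per-sample silhouette coefficient (the abstract does not define it precisely), the cluster-number selection can be sketched as follows: run K-means over a range of k and keep the k with the highest mean silhouette. The function name and the scikit-learn calls are our choices.

```python
# Sketch: pick the K-means cluster count by mean silhouette, assumed to
# play the role of the S(i) quality measure described above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(X, k_range=range(2, 11), seed=0):
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)   # mean S(i) over all samples
    k_opt = max(scores, key=scores.get)
    return k_opt, KMeans(n_clusters=k_opt, n_init=10, random_state=seed).fit(X)
```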
Single-molecule force spectroscopy (SMFS) measurements of the dynamics of biomolecules typically require identifying massive numbers of events and states from large data sets, such as extracting rupture forces from force-extension curves (FECs) in pulling experiments and identifying states from extension-time trajectories (ETTs) in force-clamp experiments. The former is often accomplished manually and hence is time-consuming and laborious, while the latter is often impeded by the presence of baseline drift. In this study, we attempt to identify the events and states from SMFS experiments accurately and automatically with a machine learning approach that combines clustering and classification for event identification in SMFS (ACCESS). As demonstrated by analysis of a series of data sets, ACCESS can extract the rupture forces from FECs containing multiple unfolding steps and classify the rupture forces into the corresponding conformational transitions. Moreover, ACCESS successfully identifies the unfolded and folded states even when the ETTs display severe nonmonotonic baseline drift. Besides, ACCESS is straightforward to use, as it requires only three easy-to-interpret parameters. As such, we anticipate that ACCESS will be a useful, easy-to-implement, and high-performance tool for event and state identification across a range of single-molecule experiments.
The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to effectively perform cluster analysis on diversiform structures, such as non-hyperspherical data, data with noise, data with mixtures of heterogeneous cluster prototypes, asymmetric data, etc. Based on the Mercer kernel, the FKCM clustering algorithm is derived from the FCM algorithm united with the kernel method. Experiments with synthetic and real data show that the FKCM clustering algorithm is universal and can effectively perform unsupervised analysis of datasets with variform structures, in contrast to the FCM algorithm. Kernel-based clustering can thus be expected to be an important research direction in fuzzy clustering analysis.
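One common Gaussian-kernel formulation of kernel fuzzy C-means can be sketched as follows (the paper's exact Mercer-kernel variant may differ); here the kernel-induced distance is proportional to 1 − K(x, v), prototypes remain in input space, and all names and defaults are our assumptions.

```python
# Minimal kernel fuzzy C-means sketch with a Gaussian kernel.
import numpy as np

def kernel_fcm(X, c=3, m=2.0, sigma=1.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]     # initial prototypes
    for _ in range(n_iter):
        # Gaussian kernel between every point and every prototype.
        K = np.exp(-((X[:, None] - V[None]) ** 2).sum(-1) / (2 * sigma**2))
        d2 = np.maximum(1.0 - K, 1e-12)                  # kernel-induced distance
        U = d2 ** (-1.0 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)                # fuzzy memberships
        W = (U ** m) * K                                 # kernel-weighted update
        V = (W.T @ X) / W.sum(axis=0)[:, None]           # new prototypes
    return U.argmax(axis=1), U, V
```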
The selection of refracturing candidates is one of the most important jobs faced by oilfield engineers. However, due to the complicated multi-parameter relationships and their combined influence, the selection of refracturing candidates is often very difficult. In this paper, a novel approach combining data analysis techniques and fuzzy clustering is proposed to select refracturing candidates. First, the analysis techniques are used to quantitatively calculate the weight coefficients and determine the key factors. Then, an idealized refracturing well is established by considering the main factors, and fuzzy clustering is applied to evaluate refracturing potential. Finally, reservoir numerical simulation is used to further evaluate the reservoir energy and material basis of the optimum refracturing candidates. The hybrid method has been successfully applied to a tight oil reservoir in China. The average steady production was 15.8 t/d after refracturing treatment, a significant increase compared with the previous status. The research results can effectively guide the development of tight oil and gas reservoirs.
In order to improve the accuracy and efficiency of 3D model retrieval, a method based on the affinity propagation clustering algorithm is proposed. First, a projection-ray-based method is proposed to improve the feature extraction efficiency of 3D models. Based on the relationship between a model and its projection, intersection in 3D space is transformed into intersection in 2D space, which reduces the number of intersections and improves the efficiency of the extraction algorithm. For feature extraction, the multi-layer spheres method is analyzed; the two-layer spheres method makes the feature vector more accurate and improves retrieval precision. Second, Semi-supervised Affinity Propagation (S-AP) clustering is utilized because it can be applied to different cluster structures. The S-AP algorithm is adopted to find the center models, from which a center model collection is built. During retrieval, the collection is used to classify the query model into the corresponding model base, and the most similar model is then retrieved from that model base. Finally, 75 sample models from the Princeton library are selected for the experiment, and 36 models are used for the retrieval test. The results validate that the proposed method outperforms the original method, with the retrieval precision and recall ratios improved effectively.
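As an illustration of the exemplar-based retrieval idea, the sketch below uses scikit-learn's unsupervised AffinityPropagation in place of the paper's semi-supervised S-AP variant, and assumes the feature vectors have already been extracted: a query is first routed to the nearest exemplar (center model) and then matched within that model base.

```python
# Exemplar-based retrieval sketch: cluster features, index by exemplars,
# then answer queries via nearest exemplar -> nearest member.
import numpy as np
from sklearn.cluster import AffinityPropagation

def build_index(features):
    # Exemplars found by affinity propagation play the role of center models.
    return AffinityPropagation(random_state=0).fit(features)

def retrieve(ap, features, query_vec):
    centers = ap.cluster_centers_
    cluster = np.argmin(((centers - query_vec) ** 2).sum(1))  # nearest center model
    members = np.where(ap.labels_ == cluster)[0]              # its model base
    best = members[np.argmin(((features[members] - query_vec) ** 2).sum(1))]
    return best                                               # most similar model index
```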
Motif-based graph local clustering (MGLC) algorithms are generally designed within a two-phase framework, which first computes the motif weight for each edge and then runs a local clustering algorithm on the weighted graph to output the result. Despite its correctness, this framework has both practical and theoretical limitations and is less applicable in real interactive situations. This research develops a purely local and index-adaptive method, Index-adaptive Triangle-based Graph Local Clustering (TGLC+), to solve the MGLC problem with respect to triangles. TGLC+ adaptively combines an approximate Monte-Carlo method, Triangle-based Random Walk (TRW), and a deterministic brute-force method, Triangle-based Forward Push (TFP), to estimate the Personalized PageRank (PPR) vector without calculating the exact triangle-weighted transition probabilities, and then outputs the clustering result by conducting the standard sweep procedure. This paper establishes the efficiency of TGLC+ through theoretical analysis and demonstrates its effectiveness through extensive experiments. To our knowledge, TGLC+ is the first method to solve the MGLC problem without computing the motif weights beforehand, thus achieving better efficiency with comparable effectiveness. TGLC+ is suitable for large-scale and interactive graph analysis tasks, including visualization, system optimization, and decision-making.
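The standard sweep procedure that TGLC+ ends with is well defined and can be sketched independently of the PPR estimation (TRW and TFP are not reproduced here): vertices are sorted by degree-normalized PPR score, and the prefix with minimum conductance is returned.

```python
# Standard sweep cut over a precomputed (approximate) PPR vector.
def sweep_cut(adj, ppr):
    """adj: dict node -> set of neighbors; ppr: dict node -> PPR mass
    (keys of ppr are assumed to appear in adj)."""
    deg = {u: len(adj[u]) for u in adj}
    order = sorted((u for u in ppr if ppr[u] > 0),
                   key=lambda u: ppr[u] / deg[u], reverse=True)
    vol_total = sum(deg.values())
    in_set, vol, cut = set(), 0, 0
    best, best_phi = set(), float("inf")
    for u in order:
        # Adding u creates deg[u]-inside boundary edges and absorbs `inside`.
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += deg[u] - 2 * inside
        vol += deg[u]
        in_set.add(u)
        denom = min(vol, vol_total - vol)
        phi = cut / denom if denom > 0 else float("inf")  # conductance
        if phi < best_phi:
            best_phi, best = phi, set(in_set)
    return best, best_phi
```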
Knowledge of bubble profiles in gas-liquid two-phase flows is crucial for analyzing kinetic processes such as heat and mass transfer, and this knowledge is contained in field data obtained by surface-resolved computational fluid dynamics (CFD) simulations. To obtain this information, an efficient bubble profile reconstruction method based on an improved agglomerative hierarchical clustering (AHC) algorithm is proposed in this paper. The reconstruction method features a binary space division preprocessing step, which reduces the computational complexity; an adaptive linkage criterion, which guarantees the applicability of the AHC algorithm to datasets involving non-uniform or distorted grids; and a stepwise execution strategy, which enables the separation of attached bubbles. To illustrate and verify the method, it was applied to three datasets, two of them with pre-specified spherical bubbles and the other obtained by a surface-resolved CFD simulation. Application results indicate that the proposed method is effective even when the data include non-uniform and distorted grids.
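Without the paper's space-division preprocessing and adaptive linkage, the core grouping step can be illustrated with off-the-shelf single-linkage AHC: interface cells closer than a cutoff tied to the grid spacing are assumed to belong to the same bubble. The cutoff factor and names are our assumptions, and the O(n²) linkage here is for illustration only.

```python
# Group gas-liquid interface cells into bubbles via single-linkage AHC
# with a distance cutoff proportional to the grid spacing.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def label_bubbles(points, grid_spacing, factor=1.5):
    """points: (n, 3) coordinates of interface cells."""
    Z = linkage(points, method="single")           # merge nearest fragments first
    # Cells within ~1.5 grid cells are assumed to belong to one bubble.
    return fcluster(Z, t=factor * grid_spacing, criterion="distance")
```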
To make the quantitative results of nuclear magnetic resonance (NMR) transverse relaxation (T2) spectra reflect the reservoir type and pore structure more directly, an unsupervised clustering method was developed to obtain quantitative pore structure information from the NMR T2 spectra based on the Gaussian mixture model (GMM). First, we conducted principal component analysis on the T2 spectra in order to reduce the data dimension and the dependence among the original variables. Second, the dimension-reduced data were fitted using the GMM probability density function, and the model parameters and optimal number of clusters were obtained according to the expectation-maximization algorithm and the change in the Akaike information criterion. Finally, the T2 spectrum features and pore structure types of the different clustering groups were analyzed and compared with the T2 geometric mean and the T2 arithmetic mean. The effectiveness of the algorithm was verified using numerical simulation and field NMR logging data. The research shows that the clustering results based on the GMM method correlate well with the shape and distribution of the T2 spectrum, pore structure, and petroleum productivity, providing a new means for quantitative identification of pore structure, reservoir grading, and oil and gas productivity evaluation.
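The PCA-GMM-AIC pipeline described above maps directly onto standard library calls; the sketch below is our rendering (the component counts and variable names are assumptions, and T2 denotes the transverse relaxation time).

```python
# Sketch of the pipeline: PCA dimension reduction of T2 amplitude spectra,
# then GMM fits over a range of cluster counts, selected by minimum AIC.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def cluster_t2_spectra(spectra, n_components=3, max_clusters=8, seed=0):
    """spectra: (n_samples, n_bins) T2 amplitude distributions."""
    Z = PCA(n_components=n_components).fit_transform(spectra)
    models = [GaussianMixture(k, random_state=seed).fit(Z)
              for k in range(1, max_clusters + 1)]
    best = min(models, key=lambda g: g.aic(Z))     # AIC picks the cluster count
    return best.predict(Z), best.n_components
```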
Designing a sparse array with reduced transmit/receive modules (TRMs) is vital for applications where the antenna system's size, weight, allowed operating space, and cost are limited. Sparse arrays exhibit distinct architectures, roughly classified into three categories: thinned arrays, nonuniformly spaced arrays, and clustered arrays. While numerous advanced synthesis methods have been presented for the three types of sparse arrays in recent years, a comprehensive review of the latest developments in sparse array synthesis has been lacking. This work aims to fill this gap by thoroughly summarizing these techniques. The study includes synthesis examples to facilitate a comparative analysis of different techniques in terms of both accuracy and efficiency. Thus, this review is intended to assist researchers and engineers in related fields, offering a clear understanding of the development of and distinctions among sparse array synthesis techniques.
Refined 3D modeling of mine slopes is pivotal for precise prediction of geological hazards. To address the inadequacy of existing single modeling methods in comprehensively representing both the overall and the localized characteristics of mining slopes, this study introduces a new method that fuses model data from unmanned aerial vehicle (UAV) tilt photogrammetry and 3D laser scanning through a data alignment algorithm based on control points. First, the mini-batch K-Medoids algorithm is utilized to cluster the point cloud data from ground 3D laser scanning. Then, the elbow rule is applied to determine the optimal cluster number (K0), and the feature points are extracted. Next, the nearest-neighbor point algorithm is employed to match the feature points obtained from UAV tilt photogrammetry, and the internal point coordinates are adjusted through distance-weighted averaging to construct a 3D model. Finally, in an engineering case study, the K0 value is determined to be 8, with a matching accuracy between the two model datasets ranging from 0.0669 to 1.0373 mm. Compared with a modeling method using the K-Medoids clustering algorithm, the new method significantly enhances the computational efficiency, the accuracy of selecting the optimal number of feature points in 3D laser scanning, and the precision of the 3D model derived from UAV tilt photogrammetry. This method provides a research foundation for constructing mine slope models.
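As a hedged sketch of the cluster-number selection, the code below pairs a small alternating k-medoids (not the mini-batch variant used in the paper) with the elbow rule, here approximated as the largest second difference of the within-cluster cost curve; all names and heuristics are ours.

```python
# Small k-medoids plus an elbow-rule heuristic for choosing K0.
import numpy as np

def k_medoids(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)      # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)          # nearest medoid
        # New medoid of each cluster: member minimizing total distance.
        new = np.array([np.flatnonzero(labels == j)[
            np.argmin(D[np.ix_(labels == j, labels == j)].sum(0))]
            for j in range(k)])
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    cost = D[np.arange(len(X)), medoids[labels]].sum()     # within-cluster cost
    return labels, cost

def elbow_k(X, k_max=12):
    costs = np.array([k_medoids(X, k)[1] for k in range(1, k_max + 1)])
    # Elbow: where the cost curve bends most (largest second difference).
    return int(np.argmax(np.diff(costs, 2))) + 2
```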
The idea of modified water masses is introduced, and a cluster analysis is used to determine the boundaries of the modified water masses and their variation in the shallow water area of the Huanghai Sea (Yellow Sea) and the East China Sea. Clustering according to the specified standards, we have determined the number and boundaries of the water masses and the mixed zones. The results obtained by the cluster method show that there are eight modified water masses in this area. According to the relative indices of temperature and salinity, the modified water masses are divided into nine different characteristic parts. The water masses may also be divided into three salinity types. On the T-S diagram, the temperature-salinity points of the different modified water masses are distributed around a curve, which embodies the characteristics of gradual modification. The variation ranges of the different modified water masses are all large, reflecting the intensive modification of water masses in this area.
With the development of green data centers, a large number of Uninterruptible Power Supply (UPS) resources in Internet Data Centers (IDCs) are becoming idle assets owing to their low utilization rate. The revitalization of these idle UPS resources is an urgent problem that must be addressed. Based on the energy storage type of UPS (EUPS) and the use of renewable sources, a solution for IDCs is proposed in this study. Subsequently, an EUPS cluster classification method based on the concept of shared mechanism niche (CSMN) is proposed to effectively solve the EUPS control problem. Accordingly, the classified EUPS aggregation unit is used to determine the optimal operation of the IDC. An IDC cost minimization optimization model is established, and the Quantum Particle Swarm Optimization (QPSO) algorithm is adopted. Finally, the economy and effectiveness of the three-tier optimization framework and model are verified through three case studies.
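For reference, a generic QPSO sketch follows; it optimizes an arbitrary cost function, not the paper's IDC cost model. Each particle is redrawn around an attractor blended from its personal best and the global best, with the contraction-expansion coefficient beta shrinking over the iterations; all parameter choices are our assumptions.

```python
# Generic Quantum Particle Swarm Optimization (QPSO) sketch.
import numpy as np

def qpso(f, bounds, n_particles=30, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.uniform(lo, hi, (n_particles, len(lo)))
    pbest, pcost = X.copy(), np.array([f(x) for x in X])
    for t in range(n_iter):
        g = pbest[np.argmin(pcost)]                     # global best
        beta = 1.0 - 0.5 * t / n_iter                   # contraction-expansion
        phi = rng.uniform(size=X.shape)
        p = phi * pbest + (1 - phi) * g                 # local attractors
        mbest = pbest.mean(axis=0)                      # mean best position
        u = rng.uniform(size=X.shape)
        sign = np.where(rng.uniform(size=X.shape) < 0.5, 1.0, -1.0)
        X = np.clip(p + sign * beta * np.abs(mbest - X) * np.log(1 / u), lo, hi)
        cost = np.array([f(x) for x in X])
        better = cost < pcost
        pbest[better], pcost[better] = X[better], cost[better]
    return pbest[np.argmin(pcost)], pcost.min()
```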
[Objectives] This paper aimed to screen new varieties of long cowpea that are suitable for autumn cultivation in Hunan, and to develop a comprehensive evaluation method to assess their adaptability and performance. [Methods] A total of 48 long cowpea varieties were introduced, and a range of comprehensive evaluation methods was employed to assess these varieties through the collection and analysis of field data. [Results] Cutting at a squared Euclidean distance of 14 classified all varieties into eight distinct groups. Groups II, III, and V belong to the autumn-dominant group within this region, while groups I and VIII belong to the intermediate group. Groups IV, VI, and VII belong to the autumn-inferior group in this area. A comparative analysis of the various comprehensive evaluation methods determined that the common-factor comprehensive evaluation, the grey correlation method, and the fuzzy evaluation method are appropriate for selecting long cowpea varieties. Furthermore, the evaluation outcomes were largely consistent with the cluster pedigree diagram. [Conclusions] Through the comprehensive index method, ten varieties demonstrating superior performance in autumn cultivation were identified: C20, C42, C29, C40, C3, C14, C18, C25, C15, and C47. The selected varieties exhibit several advantageous traits, including a shorter growth duration, a lower position of the first flower node, fewer branches, predominantly green young pods, longer pods, thicker pods, more pods per plant, and higher overall yields. These characteristics make them particularly valuable for extensive cultivation.