In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared dista...In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call enhanced k-means algorithm. This algorithm is easy to implement, requiring a simple data structure to keep some information in each iteration to be used in the next iteration. Our experimental results demonstrated that our scheme can improve the computational speed of the k-means algorithm by the magnitude in the total number of distance calculations and the overall time of computation.展开更多
A convective and stratiform cloud classification method for weather radar is proposed based on the density-based spatial clustering of applications with noise(DBSCAN)algorithm.To identify convective and stratiform clo...A convective and stratiform cloud classification method for weather radar is proposed based on the density-based spatial clustering of applications with noise(DBSCAN)algorithm.To identify convective and stratiform clouds in different developmental phases,two-dimensional(2D)and three-dimensional(3D)models are proposed by applying reflectivity factors at 0.5°and at 0.5°,1.5°,and 2.4°elevation angles,respectively.According to the thresholds of the algorithm,which include echo intensity,the echo top height of 35 dBZ(ET),density threshold,andεneighborhood,cloud clusters can be marked into four types:deep-convective cloud(DCC),shallow-convective cloud(SCC),hybrid convective-stratiform cloud(HCS),and stratiform cloud(SFC)types.Each cloud cluster type is further identified as a core area and boundary area,which can provide more abundant cloud structure information.The algorithm is verified using the volume scan data observed with new-generation S-band weather radars in Nanjing,Xuzhou,and Qingdao.The results show that cloud clusters can be intuitively identified as core and boundary points,which change in area continuously during the process of convective evolution,by the improved DBSCAN algorithm.Therefore,the occurrence and disappearance of convective weather can be estimated in advance by observing the changes of the classification.Because density thresholds are different and multiple elevations are utilized in the 3D model,the identified echo types and areas are dissimilar between the 2D and 3D models.The 3D model identifies larger convective and stratiform clouds than the 2D model.However,the developing convective clouds of small areas at lower heights cannot be identified with the 3D model because they are covered by thick stratiform clouds.In addition,the 3D model can avoid the influence of the melting layer and better suggest convective clouds in the developmental stage.展开更多
To develop a better approach for spatial evaluation of drinking water quality, an intelligent evaluation method integrating a geographical information system(GIS) and an ant colony clustering algorithm(ACCA) was used....To develop a better approach for spatial evaluation of drinking water quality, an intelligent evaluation method integrating a geographical information system(GIS) and an ant colony clustering algorithm(ACCA) was used. Drinking water samples from 29 wells in Zhenping County, China, were collected and analyzed. 35 parameters on water quality were selected, such as chloride concentration, sulphate concentration, total hardness, nitrate concentration, fluoride concentration, turbidity, pH, chromium concentration, COD, bacterium amount, total coliforms and color. The best spatial interpolation methods for the 35 parameters were found and selected from all types of interpolation methods in GIS environment according to the minimum cross-validation errors. The ACCA was improved through three strategies, namely mixed distance function, average similitude degree and probability conversion functions. Then, the ACCA was carried out to obtain different water quality grades in the GIS environment. In the end, the result from the ACCA was compared with those from the competitive Hopfield neural network(CHNN) to validate the feasibility and effectiveness of the ACCA according to three evaluation indexes, which are stochastic sampling method, pixel amount and convergence speed. It is shown that the spatial water quality grades obtained from the ACCA were more effective, accurate and intelligent than those obtained from the CHNN.展开更多
A quick and accurate extraction of dominant colors of background images is the basis of adaptive camouflage design.This paper proposes a Color Image Quick Fuzzy C-Means(CIQFCM)clustering algorithm based on clustering ...A quick and accurate extraction of dominant colors of background images is the basis of adaptive camouflage design.This paper proposes a Color Image Quick Fuzzy C-Means(CIQFCM)clustering algorithm based on clustering spatial mapping.First,the clustering sample space was mapped from the image pixels to the quantized color space,and several methods were adopted to compress the amount of clustering samples.Then,an improved pedigree clustering algorithm was applied to obtain the initial class centers.Finally,CIQFCM clustering algorithm was used for quick extraction of dominant colors of background image.After theoretical analysis of the effect and efficiency of the CIQFCM algorithm,several experiments were carried out to discuss the selection of proper quantization intervals and to verify the effect and efficiency of the CIQFCM algorithm.The results indicated that the value of quantization intervals should be set to 4,and the proposed algorithm could improve the clustering efficiency while maintaining the clustering effect.In addition,as the image size increased from 128×128 to 1024×1024,the efficiency improvement of CIQFCM algorithm was increased from 6.44 times to 36.42 times,which demonstrated the significant advantage of CIQFCM algorithm in dominant colors extraction of large-size images.展开更多
DNS(domain name system) query log analysis has been a popular research topic in recent years. CLOPE, the represented transactional clustering algorithm, could be readily used for DNS query log mining. However, the alg...DNS(domain name system) query log analysis has been a popular research topic in recent years. CLOPE, the represented transactional clustering algorithm, could be readily used for DNS query log mining. However, the algorithm is inefficient when processing large scale data. The MR-CLOPE algorithm is proposed, which is an extension and improvement on CLOPE based on Map Reduce. Different from the previous parallel clustering method, a two-stage Map Reduce implementation framework is proposed. Each of the stage is implemented by one kind Map Reduce task. In the first stage, the DNS query logs are divided into multiple splits and the CLOPE algorithm is executed on each split. The second stage usually tends to iterate many times to merge the small clusters into bigger satisfactory ones. In these two stages, a novel partition process is designed to randomly spread out original sub clusters, which will be moved and merged in the map phrase of the second phase according to the defined merge criteria. In such way, the advantage of the original CLOPE algorithm is kept and its disadvantages are dealt with in the proposed framework to achieve more excellent clustering performance. The experiment results show that MR-CLOPE is not only faster but also has better clustering quality on DNS query logs compared with CLOPE.展开更多
In order to solve security problem of clustering algorithm, we proposed amethod to enhance the security of the well-known lowest-ID clustering algorithm. This method isbased on the idea of the secret sharing and the (...In order to solve security problem of clustering algorithm, we proposed amethod to enhance the security of the well-known lowest-ID clustering algorithm. This method isbased on the idea of the secret sharing and the (k, n) threshold cryptography, Each node, whetherclusterhead or ordinary member, holds a share of the global certificate, and any k nodes cancommunicate securely. There is no need for any clusterhead to execute extra functions more thanrouting. Our scheme needs some prior configuration before deployment, and can be used in criticalenvironment with small scale. The security-enhancement for Lowest-ID algorithm can also be appliedinto other clustering approaches with minor modification. The feasibility of this method wasverified bythe simulation results.展开更多
This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables which characterize the“Higgs machine learning challenge 2014”data set.This unsupervised ...This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables which characterize the“Higgs machine learning challenge 2014”data set.This unsupervised detection goes in this paper analysis through 4 steps:(1)selection of the most informative features from the considered data;(2)definition of the number of clusters based on the elbow criterion.The experimental results showed that the optimal number of clusters that group the considered data in an unsupervised manner corresponds to 2 clusters;(3)proposition of a new approach for hybridization of both hard and fuzzy clustering tuned with Ant Lion Optimization(ALO);(4)comparison with some existing metaheuristic optimizations such as Genetic Algorithm(GA)and Particle Swarm Optimization(PSO).By employing a multi-angle analysis based on the cluster validation indices,the confusion matrix,the efficiencies and purities rates,the average cost variation,the computational time and the Sammon mapping visualization,the results highlight the effectiveness of the improved Gustafson-Kessel algorithm optimized withALO(ALOGK)to validate the proposed approach.Even if the paper gives a complete clustering analysis,its novel contribution concerns only the Steps(1)and(3)considered above.The first contribution lies in the method used for Step(1)to select the most informative features and variables.We used the t-Statistic technique to rank them.Afterwards,a feature mapping is applied using Self-Organizing Map(SOM)to identify the level of correlation between them.Then,Particle Swarm Optimization(PSO),a metaheuristic optimization technique,is used to reduce the data set dimension.The second contribution of thiswork concern the third step,where each one of the clustering algorithms as K-means(KM),Global K-means(GlobalKM),Partitioning AroundMedoids(PAM),Fuzzy C-means(FCM),Gustafson-Kessel(GK)and Gath-Geva(GG)is optimized and tuned with ALO.展开更多
Classification systems such as Slope Mass Rating(SMR) are currently being used to undertake slope stability analysis. In SMR classification system, data is allocated to certain classes based on linguistic and experien...Classification systems such as Slope Mass Rating(SMR) are currently being used to undertake slope stability analysis. In SMR classification system, data is allocated to certain classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulted from experience-based judgments and account for uncertainties in determining class boundaries developed by SMR system,the system classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means(FCM), for the ratings obtained via continuous and discrete functions. By applying clustering algorithms in SMR classification system, no in-advance experience-based judgment was made on the number of extracted classes in this system, and it was only after all steps of the clustering algorithms were accomplished that new classification scheme was proposed for SMR system under different failure modes based on the ratings obtained via continuous and discrete functions. The results of this study showed that, engineers can achieve more reliable and objective evaluations over slope stability by using SMR system based on the ratings calculated via continuous and discrete functions.展开更多
A novel approach for constructing robust Mamdani fuzzy system was proposed, which consisted of an efficiency robust estimator(partial robust M-regression, PRM) in the parameter learning phase of the initial fuzzy syst...A novel approach for constructing robust Mamdani fuzzy system was proposed, which consisted of an efficiency robust estimator(partial robust M-regression, PRM) in the parameter learning phase of the initial fuzzy system, and an improved subtractive clustering algorithm in the fuzzy-rule-selecting phase. The weights obtained in PRM, which gives protection against noise and outliers, were incorporated into the potential measure of the subtractive cluster algorithm to enhance the robustness of the fuzzy rule cluster process, and a compact Mamdani-type fuzzy system was established after the parameters in the consequent parts of rules were re-estimated by partial least squares(PLS). The main characteristics of the new approach were its simplicity and ability to construct fuzzy system fast and robustly. Simulation and experiment results show that the proposed approach can achieve satisfactory results in various kinds of data domains with noise and outliers. Compared with D-SVD and ARRBFN, the proposed approach yields much fewer rules and less RMSE values.展开更多
Big data analysis has penetrated into all fields of society and has brought about profound changes.However,there is relatively little research on big data supporting student management regarding college and university...Big data analysis has penetrated into all fields of society and has brought about profound changes.However,there is relatively little research on big data supporting student management regarding college and university’s big data.Taking the student card information as the research sample,using spark big data mining technology and K-Means clustering algorithm,taking scholarship evaluation as an example,the big data is analyzed.Data includes analysis of students’daily behavior from multiple dimensions,and it can prevent the unreasonable scholarship evaluation caused by unfair factors such as plagiarism,votes of teachers and students,etc.At the same time,students’absenteeism,physical health and psychological status in advance can be predicted,which makes student management work more active,accurate and effective.展开更多
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions which are widely used in two traditional clustering algorithms k-means and hierarchical...Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions which are widely used in two traditional clustering algorithms k-means and hierarchical clustering were investigated. Both theoretical analysis and detailed experimental results were given. It is shown that a distance function greatly affects clustering results and can be used to detect the outlier of a cluster by the comparison of such different results and give the shape information of clusters. In practice situation, it is suggested to use different distance function separately, compare the clustering results and pick out the 搒wing points? And such points may leak out more information for data analysts.展开更多
Mobile commerce(m-commerce)contributes to increasing the popularity of electronic commerce(e-commerce),allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time.As demand for e-com...Mobile commerce(m-commerce)contributes to increasing the popularity of electronic commerce(e-commerce),allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time.As demand for e-commerce increases tremendously,the pressure on delivery companies increases to organise their transportation plans to achieve profits and customer satisfaction.One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem(MVPPDP),where a selected set of pickup and delivery customers need to be served within certain allowed trip time.In this paper,we proposed hybrid clustering algorithms with the greedy randomised adaptive search procedure(GRASP)to construct an initial solution for the MVPPDP.Our approaches first cluster the search space in order to reduce its dimensionality,then use GRASP to build routes for each cluster.We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem.Experimental results show that our proposed algorithms contribute to achieving excellent performance in terms of both quality of solutions and processing time.展开更多
Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure- function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timesc...Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure- function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets con- taining millions of snapshots describing proteins in motion. Therefore, clustering algorithms have been in high demand to be developed and applied to classify these MD snapshots and gain biological insights. There mainly exist two categories of clustering algorithms that aim to group protein conformations into clusters based on the similarity of their shape (geometric clustering) and kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, ag- glomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geomet- ric and kinetic clustering metrics will be discussed along with the performances of diflhrent clustering algorithms. We note that there does not exist a one-size-fits-all algorithm in the classification of MD datasets. For a specific application, the right choice of clustering algo- rithm should be based on the purpose of clustering, and the intrinsic properties of the MD conformational ensembles. Therefore, a main focus of our review is to describe the merits and limitations of each clustering algorithm. We expect that this review would be helpful to guide researchers to choose appropriate clustering algorithms for their own MD datasets.展开更多
Recently,the fundamental problem with Hybrid Mobile Ad-hoc Net-works(H-MANETs)is tofind a suitable and secure way of balancing the load through Internet gateways.Moreover,the selection of the gateway and overload of th...Recently,the fundamental problem with Hybrid Mobile Ad-hoc Net-works(H-MANETs)is tofind a suitable and secure way of balancing the load through Internet gateways.Moreover,the selection of the gateway and overload of the network results in packet loss and Delay(DL).For optimal performance,it is important to load balance between different gateways.As a result,a stable load balancing procedure is implemented,which selects gateways based on Fuzzy Logic(FL)and increases the efficiency of the network.In this case,since gate-ways are selected based on the number of nodes,the Energy Consumption(EC)was high.This paper presents a novel Node Quality-based Clustering Algo-rithm(NQCA)based on Fuzzy-Genetic for Cluster Head and Gateway Selection(FGCHGS).This algorithm combines NQCA with the Improved Weighted Clus-tering Algorithm(IWCA).The NQCA algorithm divides the network into clusters based upon node priority,transmission range,and neighbourfidelity.In addition,the simulation results tend to evaluate the performance effectiveness of the FFFCHGS algorithm in terms of EC,packet loss rate(PLR),etc.展开更多
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to treat with vast action data which are dynamically collected from web site. The algorithm cites the concept of C-V to denote ...In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to treat with vast action data which are dynamically collected from web site. The algorithm cites the concept of C-V to denote characteristic, synchronously it adopts two-value [0,1]input and self-definition vigilance parameter to design clustering-architecture. Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, which determines the magnitude of typical characteristic. Making use of stability analysis, the classifications are confirmed to have reliably hierarchical structure when vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between vigilance parameter and classification upper limit helps mining out representative classifications from net-users according to the actual web resource, then administering system can map them to web resource space to implement the intelligent configuration effectually and rapidly.展开更多
At present,the proportion of new energy in the power grid is increasing,and the random fluctuations in power output increase the risk of cascading failures in the power grid.In this paper,we propose a method for ident...At present,the proportion of new energy in the power grid is increasing,and the random fluctuations in power output increase the risk of cascading failures in the power grid.In this paper,we propose a method for identifying high-risk scenarios of interlocking faults in new energy power grids based on a deep embedding clustering(DEC)algorithm and apply it in a risk assessment of cascading failures in different operating scenarios for new energy power grids.First,considering the real-time operation status and system structure of new energy power grids,the scenario cascading failure risk indicator is established.Based on this indicator,the risk of cascading failure is calculated for the scenario set,the scenarios are clustered based on the DEC algorithm,and the scenarios with the highest indicators are selected as the significant risk scenario set.The results of simulations with an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.展开更多
The human error mechanism in coal mine safety is analyzed specifically from safety psychological and physiological factors, worker' s quality, safety management, safety education, mechanical equipment, and working en...The human error mechanism in coal mine safety is analyzed specifically from safety psychological and physiological factors, worker' s quality, safety management, safety education, mechanical equipment, and working environment, and also a human error' dominant factors classification model playing a great effect on the safety production of coal mine is established with the application of ant clustering algorithm. The experimental results show that management is the key in the human errors of coal mine.展开更多
In recent years,the urbanization process has brought modernity while also causing key issues,such as traffic congestion and parking conflicts.Therefore,cities need a more intelligent"brain"to form more intel...In recent years,the urbanization process has brought modernity while also causing key issues,such as traffic congestion and parking conflicts.Therefore,cities need a more intelligent"brain"to form more intelligent and efficient transportation systems.At present,as a type of machine learning,the traditional clustering algorithm still has limitations.K-means algorithm is widely used to solve traffic clustering problems,but it has limitations,such as sensitivity to initial points and poor robustness.Therefore,based on the hybrid architecture of Quantum Annealing(QA)and brain-inspired cognitive computing,this study proposes QA and Brain-Inspired Clustering Algorithm(QABICA)to solve the problem of urban taxi-stand locations.Based on the traffic trajectory data of Xi’an and Chengdu provided by Didi Chuxing,the clustering results of our algorithm and K-means algorithm are compared.We find that the average taxi-stand location bias of the final result based on QABICA is smaller than that based on K-means,and the bias of our algorithm can effectively reduce the tradition K-means bias by approximately 42%,up to approximately 83%,with higher robustness.QA algorithm is able to jump out of the local suboptimal solutions and approach the global optimum,and brain-inspired cognitive computing provides search feedback and direction.Thus,we will further consider applying our algorithm to analyze urban traffic flow,and solve traffic congestion and other key problems in intelligent transportation.展开更多
In order to mine production and security information from security supervising data and to ensure security and safety involved in production and decision-making,a clustering analysis algorithm for security supervising...In order to mine production and security information from security supervising data and to ensure security and safety involved in production and decision-making,a clustering analysis algorithm for security supervising data based on a semantic description in coal mines is studied.First,the semantic and numerical-based hybrid description method of security supervising data in coal mines is described.Secondly,the similarity measurement method of semantic and numerical data are separately given and a weight-based hybrid similarity measurement method for the security supervising data based on a semantic description in coal mines is presented.Thirdly,taking the hybrid similarity measurement method as the distance criteria and using a grid methodology for reference,an improved CURE clustering algorithm based on the grid is presented.Finally,the simulation results of a security supervising data set in coal mines validate the efficiency of the algorithm.展开更多
Color is an important element to consider when shaping urban characteristics.However,previous studies seldom included quantitative analyses of color relationships between urban agglomerations within proximal regions a...Color is an important element to consider when shaping urban characteristics.However,previous studies seldom included quantitative analyses of color relationships between urban agglomerations within proximal regions and with similar cultures to distinguish and shape individual urban personalities.This study focused on Xiamen,Zhangzhou,and Quanzhou metropolitan areas,which are influenced by Minnan culture,and collected natural and cultural landscape network images that collectively represent the urban landscape in China.Color extraction,computer vision processing technologies,and clustering algorithms,such as k-means partitioning,hierarchical methods,and co-occurrence frequency,were applied using image recognition.We then established an urban color database and quantified color attributes.Finally,we conducted a comparative analysis of dominant colors and color combination associations in Xiamen,Zhangzhou,and Quanzhou metropolitan areas to explore their similarities and differences and define their characteristics.We also considered other cities of the same type for comparison.展开更多
文摘In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call enhanced k-means algorithm. This algorithm is easy to implement, requiring a simple data structure to keep some information in each iteration to be used in the next iteration. Our experimental results demonstrated that our scheme can improve the computational speed of the k-means algorithm by the magnitude in the total number of distance calculations and the overall time of computation.
基金funded by the Key-Area Research and Development Program of Guangdong Province(Grant No.2020B1111200001)the Key project of monitoring,early warning and prevention of major natural disasters of China(Grant No.2019YFC1510304)+1 种基金the S&T Program of Hebei(Grant No.19275408D)the Scientific Research Projects of Weather Modification in Northwest China(Grant No.RYSY201905).
文摘A convective and stratiform cloud classification method for weather radar is proposed based on the density-based spatial clustering of applications with noise(DBSCAN)algorithm.To identify convective and stratiform clouds in different developmental phases,two-dimensional(2D)and three-dimensional(3D)models are proposed by applying reflectivity factors at 0.5°and at 0.5°,1.5°,and 2.4°elevation angles,respectively.According to the thresholds of the algorithm,which include echo intensity,the echo top height of 35 dBZ(ET),density threshold,andεneighborhood,cloud clusters can be marked into four types:deep-convective cloud(DCC),shallow-convective cloud(SCC),hybrid convective-stratiform cloud(HCS),and stratiform cloud(SFC)types.Each cloud cluster type is further identified as a core area and boundary area,which can provide more abundant cloud structure information.The algorithm is verified using the volume scan data observed with new-generation S-band weather radars in Nanjing,Xuzhou,and Qingdao.The results show that cloud clusters can be intuitively identified as core and boundary points,which change in area continuously during the process of convective evolution,by the improved DBSCAN algorithm.Therefore,the occurrence and disappearance of convective weather can be estimated in advance by observing the changes of the classification.Because density thresholds are different and multiple elevations are utilized in the 3D model,the identified echo types and areas are dissimilar between the 2D and 3D models.The 3D model identifies larger convective and stratiform clouds than the 2D model.However,the developing convective clouds of small areas at lower heights cannot be identified with the 3D model because they are covered by thick stratiform clouds.In addition,the 3D model can avoid the influence of the melting layer and better suggest convective clouds in the developmental stage.
基金Projects(41161020,41261026) supported by the National Natural Science Foundation of ChinaProject(BQD2012013) supported by the Research starting Funds for Imported Talents,Ningxia University,China+1 种基金Project(ZR1209) supported by the Natural Science Funds,Ningxia University,ChinaProject(NGY2013005) supported by the Key Science Project of Colleges and Universities in Ningxia,China
文摘To develop a better approach for spatial evaluation of drinking water quality, an intelligent evaluation method integrating a geographical information system(GIS) and an ant colony clustering algorithm(ACCA) was used. Drinking water samples from 29 wells in Zhenping County, China, were collected and analyzed. 35 parameters on water quality were selected, such as chloride concentration, sulphate concentration, total hardness, nitrate concentration, fluoride concentration, turbidity, pH, chromium concentration, COD, bacterium amount, total coliforms and color. The best spatial interpolation methods for the 35 parameters were found and selected from all types of interpolation methods in GIS environment according to the minimum cross-validation errors. The ACCA was improved through three strategies, namely mixed distance function, average similitude degree and probability conversion functions. Then, the ACCA was carried out to obtain different water quality grades in the GIS environment. In the end, the result from the ACCA was compared with those from the competitive Hopfield neural network(CHNN) to validate the feasibility and effectiveness of the ACCA according to three evaluation indexes, which are stochastic sampling method, pixel amount and convergence speed. It is shown that the spatial water quality grades obtained from the ACCA were more effective, accurate and intelligent than those obtained from the CHNN.
文摘A quick and accurate extraction of dominant colors of background images is the basis of adaptive camouflage design.This paper proposes a Color Image Quick Fuzzy C-Means(CIQFCM)clustering algorithm based on clustering spatial mapping.First,the clustering sample space was mapped from the image pixels to the quantized color space,and several methods were adopted to compress the amount of clustering samples.Then,an improved pedigree clustering algorithm was applied to obtain the initial class centers.Finally,CIQFCM clustering algorithm was used for quick extraction of dominant colors of background image.After theoretical analysis of the effect and efficiency of the CIQFCM algorithm,several experiments were carried out to discuss the selection of proper quantization intervals and to verify the effect and efficiency of the CIQFCM algorithm.The results indicated that the value of quantization intervals should be set to 4,and the proposed algorithm could improve the clustering efficiency while maintaining the clustering effect.In addition,as the image size increased from 128×128 to 1024×1024,the efficiency improvement of CIQFCM algorithm was increased from 6.44 times to 36.42 times,which demonstrated the significant advantage of CIQFCM algorithm in dominant colors extraction of large-size images.
基金Project(61103046) supported in part by the National Natural Science Foundation of ChinaProject(B201312) supported by DHU Distinguished Young Professor Program,China+1 种基金Project(LY14F020007) supported by Zhejiang Provincial Natural Science Funds of ChinaProject(2014A610072) supported by the Natural Science Foundation of Ningbo City,China
文摘DNS(domain name system) query log analysis has been a popular research topic in recent years. CLOPE, the represented transactional clustering algorithm, could be readily used for DNS query log mining. However, the algorithm is inefficient when processing large scale data. The MR-CLOPE algorithm is proposed, which is an extension and improvement on CLOPE based on Map Reduce. Different from the previous parallel clustering method, a two-stage Map Reduce implementation framework is proposed. Each of the stage is implemented by one kind Map Reduce task. In the first stage, the DNS query logs are divided into multiple splits and the CLOPE algorithm is executed on each split. The second stage usually tends to iterate many times to merge the small clusters into bigger satisfactory ones. In these two stages, a novel partition process is designed to randomly spread out original sub clusters, which will be moved and merged in the map phrase of the second phase according to the defined merge criteria. In such way, the advantage of the original CLOPE algorithm is kept and its disadvantages are dealt with in the proposed framework to achieve more excellent clustering performance. The experiment results show that MR-CLOPE is not only faster but also has better clustering quality on DNS query logs compared with CLOPE.
基金Supported by the National High Technology Re search and Development Program of China (2003AA142080)
文摘In order to solve security problem of clustering algorithm, we proposed amethod to enhance the security of the well-known lowest-ID clustering algorithm. This method isbased on the idea of the secret sharing and the (k, n) threshold cryptography, Each node, whetherclusterhead or ordinary member, holds a share of the global certificate, and any k nodes cancommunicate securely. There is no need for any clusterhead to execute extra functions more thanrouting. Our scheme needs some prior configuration before deployment, and can be used in criticalenvironment with small scale. The security-enhancement for Lowest-ID algorithm can also be appliedinto other clustering approaches with minor modification. The feasibility of this method wasverified bythe simulation results.
文摘This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables which characterize the“Higgs machine learning challenge 2014”data set.This unsupervised detection goes in this paper analysis through 4 steps:(1)selection of the most informative features from the considered data;(2)definition of the number of clusters based on the elbow criterion.The experimental results showed that the optimal number of clusters that group the considered data in an unsupervised manner corresponds to 2 clusters;(3)proposition of a new approach for hybridization of both hard and fuzzy clustering tuned with Ant Lion Optimization(ALO);(4)comparison with some existing metaheuristic optimizations such as Genetic Algorithm(GA)and Particle Swarm Optimization(PSO).By employing a multi-angle analysis based on the cluster validation indices,the confusion matrix,the efficiencies and purities rates,the average cost variation,the computational time and the Sammon mapping visualization,the results highlight the effectiveness of the improved Gustafson-Kessel algorithm optimized withALO(ALOGK)to validate the proposed approach.Even if the paper gives a complete clustering analysis,its novel contribution concerns only the Steps(1)and(3)considered above.The first contribution lies in the method used for Step(1)to select the most informative features and variables.We used the t-Statistic technique to rank them.Afterwards,a feature mapping is applied using Self-Organizing Map(SOM)to identify the level of correlation between them.Then,Particle Swarm Optimization(PSO),a metaheuristic optimization technique,is used to reduce the data set dimension.The second contribution of thiswork concern the third step,where each one of the clustering algorithms as K-means(KM),Global K-means(GlobalKM),Partitioning AroundMedoids(PAM),Fuzzy C-means(FCM),Gustafson-Kessel(GK)and Gath-Geva(GG)is optimized and tuned with ALO.
文摘Classification systems such as Slope Mass Rating(SMR) are currently being used to undertake slope stability analysis. In SMR classification system, data is allocated to certain classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulted from experience-based judgments and account for uncertainties in determining class boundaries developed by SMR system,the system classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means(FCM), for the ratings obtained via continuous and discrete functions. By applying clustering algorithms in SMR classification system, no in-advance experience-based judgment was made on the number of extracted classes in this system, and it was only after all steps of the clustering algorithms were accomplished that new classification scheme was proposed for SMR system under different failure modes based on the ratings obtained via continuous and discrete functions. The results of this study showed that, engineers can achieve more reliable and objective evaluations over slope stability by using SMR system based on the ratings calculated via continuous and discrete functions.
基金Project(61473298)supported by the National Natural Science Foundation of ChinaProject(2015QNA65)supported by Fundamental Research Funds for the Central Universities,China
文摘A novel approach for constructing robust Mamdani fuzzy system was proposed, which consisted of an efficiency robust estimator(partial robust M-regression, PRM) in the parameter learning phase of the initial fuzzy system, and an improved subtractive clustering algorithm in the fuzzy-rule-selecting phase. The weights obtained in PRM, which gives protection against noise and outliers, were incorporated into the potential measure of the subtractive cluster algorithm to enhance the robustness of the fuzzy rule cluster process, and a compact Mamdani-type fuzzy system was established after the parameters in the consequent parts of rules were re-estimated by partial least squares(PLS). The main characteristics of the new approach were its simplicity and ability to construct fuzzy system fast and robustly. Simulation and experiment results show that the proposed approach can achieve satisfactory results in various kinds of data domains with noise and outliers. Compared with D-SVD and ARRBFN, the proposed approach yields much fewer rules and less RMSE values.
基金Nanjing Key Laboratory of Intelligent Information Processing Open Fund Project(No.19AIP05)。
文摘Big data analysis has penetrated into all fields of society and has brought about profound changes.However,there is relatively little research on big data supporting student management regarding college and university’s big data.Taking the student card information as the research sample,using spark big data mining technology and K-Means clustering algorithm,taking scholarship evaluation as an example,the big data is analyzed.Data includes analysis of students’daily behavior from multiple dimensions,and it can prevent the unreasonable scholarship evaluation caused by unfair factors such as plagiarism,votes of teachers and students,etc.At the same time,students’absenteeism,physical health and psychological status in advance can be predicted,which makes student management work more active,accurate and effective.
文摘Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions which are widely used in two traditional clustering algorithms k-means and hierarchical clustering were investigated. Both theoretical analysis and detailed experimental results were given. It is shown that a distance function greatly affects clustering results and can be used to detect the outlier of a cluster by the comparison of such different results and give the shape information of clusters. In practice situation, it is suggested to use different distance function separately, compare the clustering results and pick out the 搒wing points? And such points may leak out more information for data analysts.
基金Deanship of scientific research for funding and supporting this research through the initiative of DSR Graduate Students Research Support(GSR).
文摘Mobile commerce(m-commerce)contributes to increasing the popularity of electronic commerce(e-commerce),allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time.As demand for e-commerce increases tremendously,the pressure on delivery companies increases to organise their transportation plans to achieve profits and customer satisfaction.One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem(MVPPDP),where a selected set of pickup and delivery customers need to be served within certain allowed trip time.In this paper,we proposed hybrid clustering algorithms with the greedy randomised adaptive search procedure(GRASP)to construct an initial solution for the MVPPDP.Our approaches first cluster the search space in order to reduce its dimensionality,then use GRASP to build routes for each cluster.We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem.Experimental results show that our proposed algorithms contribute to achieving excellent performance in terms of both quality of solutions and processing time.
基金supported by Shenzhen Science and Technology Innovation Committee(JCYJ20170413173837121)the Hong Kong Research Grant Council(HKUST C6009-15G,14203915,16302214,16304215,16318816,and AoE/P-705/16)+2 种基金King Abdullah University of Science and Technology(KAUST) Office of Sponsored Research(OSR)(OSR-2016-CRG5-3007)Guangzhou Science Technology and Innovation Commission(201704030116)Innovation and Technology Commission(ITCPD/17-9and ITC-CNERC14SC01)
文摘Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure- function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets con- taining millions of snapshots describing proteins in motion. Therefore, clustering algorithms have been in high demand to be developed and applied to classify these MD snapshots and gain biological insights. There mainly exist two categories of clustering algorithms that aim to group protein conformations into clusters based on the similarity of their shape (geometric clustering) and kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, ag- glomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geomet- ric and kinetic clustering metrics will be discussed along with the performances of diflhrent clustering algorithms. We note that there does not exist a one-size-fits-all algorithm in the classification of MD datasets. For a specific application, the right choice of clustering algo- rithm should be based on the purpose of clustering, and the intrinsic properties of the MD conformational ensembles. Therefore, a main focus of our review is to describe the merits and limitations of each clustering algorithm. We expect that this review would be helpful to guide researchers to choose appropriate clustering algorithms for their own MD datasets.
文摘Recently,the fundamental problem with Hybrid Mobile Ad-hoc Net-works(H-MANETs)is tofind a suitable and secure way of balancing the load through Internet gateways.Moreover,the selection of the gateway and overload of the network results in packet loss and Delay(DL).For optimal performance,it is important to load balance between different gateways.As a result,a stable load balancing procedure is implemented,which selects gateways based on Fuzzy Logic(FL)and increases the efficiency of the network.In this case,since gate-ways are selected based on the number of nodes,the Energy Consumption(EC)was high.This paper presents a novel Node Quality-based Clustering Algo-rithm(NQCA)based on Fuzzy-Genetic for Cluster Head and Gateway Selection(FGCHGS).This algorithm combines NQCA with the Improved Weighted Clus-tering Algorithm(IWCA).The NQCA algorithm divides the network into clusters based upon node priority,transmission range,and neighbourfidelity.In addition,the simulation results tend to evaluate the performance effectiveness of the FFFCHGS algorithm in terms of EC,packet loss rate(PLR),etc.
基金Supported by 973 National R&D Items(G1998030413)and Centurial Project of CAS
文摘In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to treat with vast action data which are dynamically collected from web site. The algorithm cites the concept of C-V to denote characteristic, synchronously it adopts two-value [0,1]input and self-definition vigilance parameter to design clustering-architecture. Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, which determines the magnitude of typical characteristic. Making use of stability analysis, the classifications are confirmed to have reliably hierarchical structure when vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between vigilance parameter and classification upper limit helps mining out representative classifications from net-users according to the actual web resource, then administering system can map them to web resource space to implement the intelligent configuration effectually and rapidly.
基金funded by the State Grid Limited Science and Technology Project of China,Grant Number SGSXDK00DJJS2200144.
文摘At present,the proportion of new energy in the power grid is increasing,and the random fluctuations in power output increase the risk of cascading failures in the power grid.In this paper,we propose a method for identifying high-risk scenarios of interlocking faults in new energy power grids based on a deep embedding clustering(DEC)algorithm and apply it in a risk assessment of cascading failures in different operating scenarios for new energy power grids.First,considering the real-time operation status and system structure of new energy power grids,the scenario cascading failure risk indicator is established.Based on this indicator,the risk of cascading failure is calculated for the scenario set,the scenarios are clustered based on the DEC algorithm,and the scenarios with the highest indicators are selected as the significant risk scenario set.The results of simulations with an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
文摘The human error mechanism in coal mine safety is analyzed specifically from safety psychological and physiological factors, worker' s quality, safety management, safety education, mechanical equipment, and working environment, and also a human error' dominant factors classification model playing a great effect on the safety production of coal mine is established with the application of ant clustering algorithm. The experimental results show that management is the key in the human errors of coal mine.
基金the Special Zone Project of National Defense Innovation,the National Natural Science Foundation of China(Nos.61572304 and 61272096)the Key Program of the National Natural Science Foundation of China(No.61332019)Open Research Fund of State Key Laboratory of Cryptology。
文摘In recent years,the urbanization process has brought modernity while also causing key issues,such as traffic congestion and parking conflicts.Therefore,cities need a more intelligent"brain"to form more intelligent and efficient transportation systems.At present,as a type of machine learning,the traditional clustering algorithm still has limitations.K-means algorithm is widely used to solve traffic clustering problems,but it has limitations,such as sensitivity to initial points and poor robustness.Therefore,based on the hybrid architecture of Quantum Annealing(QA)and brain-inspired cognitive computing,this study proposes QA and Brain-Inspired Clustering Algorithm(QABICA)to solve the problem of urban taxi-stand locations.Based on the traffic trajectory data of Xi’an and Chengdu provided by Didi Chuxing,the clustering results of our algorithm and K-means algorithm are compared.We find that the average taxi-stand location bias of the final result based on QABICA is smaller than that based on K-means,and the bias of our algorithm can effectively reduce the tradition K-means bias by approximately 42%,up to approximately 83%,with higher robustness.QA algorithm is able to jump out of the local suboptimal solutions and approach the global optimum,and brain-inspired cognitive computing provides search feedback and direction.Thus,we will further consider applying our algorithm to analyze urban traffic flow,and solve traffic congestion and other key problems in intelligent transportation.
基金The National Natural Science Foundation of China(No.50674086)Specialized Research Fund for the Doctoral Program of Higher Education(No.20060290508)the Postdoctoral Scientific Program of Jiangsu Province(No.0701045B)
文摘In order to mine production and security information from security supervising data and to ensure security and safety involved in production and decision-making,a clustering analysis algorithm for security supervising data based on a semantic description in coal mines is studied.First,the semantic and numerical-based hybrid description method of security supervising data in coal mines is described.Secondly,the similarity measurement method of semantic and numerical data are separately given and a weight-based hybrid similarity measurement method for the security supervising data based on a semantic description in coal mines is presented.Thirdly,taking the hybrid similarity measurement method as the distance criteria and using a grid methodology for reference,an improved CURE clustering algorithm based on the grid is presented.Finally,the simulation results of a security supervising data set in coal mines validate the efficiency of the algorithm.
基金This research was funded by the Xiamen University Tan Kah Kee College 2019 School-level Research Incubation Project(YY2019L04).
文摘Color is an important element to consider when shaping urban characteristics.However,previous studies seldom included quantitative analyses of color relationships between urban agglomerations within proximal regions and with similar cultures to distinguish and shape individual urban personalities.This study focused on Xiamen,Zhangzhou,and Quanzhou metropolitan areas,which are influenced by Minnan culture,and collected natural and cultural landscape network images that collectively represent the urban landscape in China.Color extraction,computer vision processing technologies,and clustering algorithms,such as k-means partitioning,hierarchical methods,and co-occurrence frequency,were applied using image recognition.We then established an urban color database and quantified color attributes.Finally,we conducted a comparative analysis of dominant colors and color combination associations in Xiamen,Zhangzhou,and Quanzhou metropolitan areas to explore their similarities and differences and define their characteristics.We also considered other cities of the same type for comparison.