The observation error model of an underwater acoustic positioning system is an important factor affecting the positioning accuracy of underwater targets. When both the position inconsistency error caused by treating the underwater target as a mass point and the observation system error are modeled, the traditional error model best estimation trajectory (EMBET) method, with little observed data and too many parameters, can make the parameter model ill-conditioned. In this paper, a multi-station fusion system error model based on an optimal polynomial constraint is constructed, and a corresponding observation system error identification method based on improved spectral clustering is designed. First, polynomial optimization achieves unified, reduced-parameter modeling of the underwater target position parameters and the system errors. Then, a multi-station undirected graph network is established to address the inaccurate identification of system errors. Moreover, the similarity matrix of spectral clustering is improved, and an iterative identification method for the system errors based on the improved spectral clustering is proposed. Finally, comprehensive measured data from long-baseline lake and sea tests show that the proposed method accurately identifies the system errors and improves the positioning accuracy for underwater targets.
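To show what reduced-parameter unified modeling of a trajectory and system errors can look like, here is a minimal sketch. It assumes a 1-D track, a low-order polynomial in time for the target position, and a constant bias per station as the system error, with station 0 as the bias reference. The function name and the reference-station convention are hypothetical. This is not the paper's EMBET or fusion model.

```python
import numpy as np

def fit_poly_with_station_bias(t, z, station, degree, n_stations):
    """Hypothetical sketch: jointly estimate polynomial trajectory coefficients
    and constant per-station biases by linear least squares.
    Station 0 is the reference, so its bias is fixed to 0 for identifiability."""
    t = np.asarray(t, float)
    z = np.asarray(z, float)
    station = np.asarray(station, int)
    poly_cols = np.vander(t, degree + 1, increasing=True)  # columns 1, t, t^2, ...
    bias_cols = np.zeros((t.size, n_stations - 1))
    for s in range(1, n_stations):
        bias_cols[station == s, s - 1] = 1.0              # indicator of station s
    A = np.hstack([poly_cols, bias_cols])
    params, *_ = np.linalg.lstsq(A, z, rcond=None)
    return params[: degree + 1], params[degree + 1:]
```

A degree-2 track seen by three stations needs only 3 + 2 unknowns, however many samples are collected. This is the kind of parameter reduction the abstract refers to.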
Traditional unsupervised seismic facies analysis techniques assume that seismic data follow a mixed Gaussian distribution. However, field seismic data may not meet this condition, which leads to misclassification when the technique is applied. This paper introduces a spectral clustering technique for unsupervised seismic facies analysis. The algorithm clusters the data using a graph. Its core idea is to regard seismic data as points in space, connect the points by edges, and thereby construct a graph. When the graph is partitioned, the weights of the edges between different subgraphs should be as low as possible, whereas the weights of the edges within each subgraph should be as high as possible. However, spectral clustering has high computational complexity and large memory consumption. To solve this problem, this paper introduces the idea of sparse representation into spectral clustering. By selecting a small number of local sparse representation points, the spectral clustering matrix of all sample points is approximated, which reduces the cost of the spectral clustering operation. Verification on a physical model and on field data shows that the proposed approach obtains more accurate seismic facies classification results without requiring the data to satisfy any distributional hypothesis. The new method is also more computationally efficient than conventional spectral clustering, and thereby meets the application needs of field seismic data.
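For reference, the dense graph-based spectral clustering that the sparse-representation variant approximates can be sketched as follows. This is the standard normalized (Ng-Jordan-Weiss-style) procedure: Gaussian edge weights, the normalized affinity D^-1/2 W D^-1/2, its top-k eigenvectors, and k-means on the row-normalized embedding. The Gaussian width `sigma` and the simple farthest-point-initialized k-means are assumptions. The paper's sparse representation points are not reproduced here.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Dense normalized spectral clustering (illustrative sketch)."""
    X = np.asarray(X, float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))       # Gaussian edge weights of the graph
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    U = vecs[:, -k:]                          # eigenvectors of the k largest
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # k-means in the spectral space, farthest-point initialisation
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([U[labels == j].mean(axis=0) for j in range(k)])
    return labels
```

The n x n matrix `W` and its eigendecomposition are what make the dense version costly. The sparse-representation idea replaces them with an approximation built from a few representative points.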
The similarity measure is crucial to the performance of spectral clustering. The Gaussian kernel function based on the Euclidean distance is usually adopted as the similarity measure. However, the Euclidean distance cannot fully reveal complex data distributions, and the result of spectral clustering is very sensitive to the scaling parameter. To solve these problems, a new manifold distance measure and a novel simulated annealing spectral clustering (SASC) algorithm based on it are proposed. The simulated annealing based on genetic algorithm (SAGA), characterized by rapid convergence to the global optimum, is used to cluster the sample points in the spectral mapping space. The proposed algorithm not only better reflects local and global consistency but also reduces the sensitivity of spectral clustering to the kernel parameter, which improves its clustering performance. To apply the algorithm efficiently to image segmentation, the Nyström method is used to reduce the computational complexity. Experimental results show that, compared with traditional clustering algorithms and popular spectral clustering algorithms, the proposed algorithm achieves better clustering performance on several synthetic datasets, texture images, and real images.
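A manifold distance can be sketched using one widely used density-sensitive form, assumed here; the SASC paper's new measure may differ. A direct segment between two points has length rho^dist - 1, and the manifold distance is the shortest path over these segments. Long jumps across sparse gaps are penalized exponentially, so paths through dense regions become shorter. The flexing factor `rho` is a hypothetical default.

```python
import numpy as np

def manifold_distance(X, rho=2.0):
    """Density-sensitive manifold distance (assumed form), via Floyd-Warshall."""
    X = np.asarray(X, float)
    euclid = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    D = rho ** euclid - 1.0                  # length of each direct segment
    for m in range(len(X)):                  # all-pairs shortest paths
        D = np.minimum(D, D[:, m:m + 1] + D[m:m + 1, :])
    return D
```

For three collinear points at 0, 1 and 2 with rho = 2, the direct segment 0 to 2 costs 3, while the path through the middle point costs 1 + 1 = 2. The manifold distance therefore prefers the route through the data.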
Defense techniques for machine learning are critical yet challenging, because the number and types of attacks on widely applied machine learning algorithms are increasing significantly. Among these attacks, the poisoning attack, which disturbs machine learning algorithms by injecting poisoning samples, poses the greatest threat. In this paper, we analyze the characteristics of poisoning samples and propose a novel sample evaluation method that defends against poisoning attacks by targeting these characteristics. To capture the intrinsic data characteristics from heterogeneous aspects, we first evaluate the training data by multiple criteria, each of which is reformulated from a spectral clustering. Then, we integrate the evaluation scores generated by the multiple criteria through the proposed multiple spectral clustering aggregation (MSCA) method. Finally, we use the unified score as the indicator of poisoning attack samples. Experimental results on intrusion detection datasets show that MSCA significantly outperforms K-means outlier detection in data legality evaluation and poisoning attack detection.
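The abstract does not specify how MSCA combines the criteria. The sketch below therefore shows only a plain score-fusion baseline, explicitly not MSCA. It assumes each criterion already yields a per-sample suspicion score (for example, from a separate spectral clustering). It min-max normalizes each criterion, averages the results, and flags the top fraction of samples. The function names and `top_fraction` are hypothetical.

```python
def aggregate_scores(score_lists):
    """Baseline fusion: mean of min-max-normalised scores across criteria."""
    n = len(score_lists[0])
    fused = [0.0] * n
    for scores in score_lists:
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for i, s in enumerate(scores):
            fused[i] += ((s - lo) / span if span > 0 else 0.0) / len(score_lists)
    return fused

def flag_suspects(fused, top_fraction=0.1):
    """Return indices of the highest fused scores (candidate poisoning samples)."""
    k = max(1, int(round(top_fraction * len(fused))))
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return sorted(order[:k])
```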
Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering, a popular modern clustering algorithm, has been shown to detect clusters more effectively than many traditional algorithms. Its applications range from computer vision and information retrieval to social science and biology. As database sizes soar, clustering algorithms face scaling problems in computational time and memory use. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on benchmark networks and on a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. The results show that the proposed implementation scales well, speeds up clustering without sacrificing quality, and processes massive datasets efficiently on clusters of commodity machines.
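The map/reduce split for building a sparse similarity matrix can be sketched in a single process. Python's built-in `map` and `functools.reduce` stand in for distributed workers. The map step emits the t-nearest-neighbour edges of one point. The reduce step merges them into a symmetric sparse matrix stored as a dict. The t-nearest-neighbour sparsification, `t` and `sigma` are assumptions, not details taken from the abstract.

```python
import math
from functools import reduce

def map_row(args):
    """Map step: emit (i, j, weight) for the t nearest neighbours of point i."""
    i, points, t, sigma = args
    dists = sorted(
        (math.dist(points[i], xj), j) for j, xj in enumerate(points) if j != i
    )
    return [(i, j, math.exp(-d * d / (2 * sigma * sigma))) for d, j in dists[:t]]

def reduce_pairs(acc, pairs):
    """Reduce step: merge emitted edges into a symmetric sparse matrix."""
    for i, j, w in pairs:
        acc[(i, j)] = max(acc.get((i, j), 0.0), w)
        acc[(j, i)] = max(acc.get((j, i), 0.0), w)
    return acc

def sparse_similarity(points, t=2, sigma=1.0):
    mapped = map(map_row, [(i, points, t, sigma) for i in range(len(points))])
    return reduce(reduce_pairs, mapped, {})
```

In a real MapReduce job, each `map_row` call could run on a different machine, and the reducer would be keyed by row index rather than folding everything into one dict.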
This paper proposes a novel phishing web image segmentation algorithm based on improved spectral clustering. First, we construct a point set from a given image, in which each point is composed of a pixel's spatial location and gray level. Second, the data are clustered in the spectral space of the similarity matrix of the point set. Conventional spectral clustering uses the K-means algorithm, which is sensitive to the initial cluster centroids and tends to converge to local optima. To avoid these drawbacks, we introduce a clone operator and Cauchy mutation to enlarge the scale of the cluster centers, and a quantum-inspired evolutionary algorithm to find the globally optimal cluster centroids. Compared with phishing web image segmentation based on K-means, experimental results show that our method substantially improves segmentation performance. Moreover, our method can converge to the globally optimal solution and achieves better phishing web segmentation accuracy.
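The first step (turning an image into a point set of location plus gray level) and a standard Cauchy mutation of a cluster centre can be sketched as below. The feature weights are hypothetical. The clone operator and the quantum-inspired evolutionary search are not reproduced. The mutation uses the textbook standard-Cauchy sample tan(pi * (u - 0.5)), whose heavy tails allow occasional long jumps out of local optima.

```python
import math
import random

def image_to_points(gray, weight_xy=1.0, weight_gray=1.0):
    """Turn a 2-D grey-level image (list of rows) into (row, col, grey) points."""
    points = []
    for r, row in enumerate(gray):
        for c, g in enumerate(row):
            points.append((weight_xy * r, weight_xy * c, weight_gray * g))
    return points

def cauchy_mutate(center, scale, rng):
    """Perturb each coordinate of a cluster centre with standard Cauchy noise."""
    return [x + scale * math.tan(math.pi * (rng.random() - 0.5)) for x in center]
```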
[Objective] The research aimed to study an assessment index system for rainstorm disasters in Fujian Province based on a spectral clustering model with grey correlation analysis. [Method] Using the meteorological disaster yearbook of Fujian Province, an evaluation index system for regional rainstorm disasters in Fujian was established by comprehensively considering disaster-inducing factors, the disaster-inducing environment, disaster-bearing bodies, and the regional disaster-prevention level. Using a spectral clustering model based on grey correlation analysis, risk zoning of rainstorm disasters was conducted for each area of Fujian. Finally, the effect and application of the clustering model were analyzed through a case study. [Result] To uncover the intrinsic connections among regional characteristics and improve the disaster-prevention linkage performance of the evaluation units, a spectral clustering model based on grey correlation analysis was used to conduct risk zoning of rainstorm disasters in Fujian Province. Moreover, combined weights were introduced to weight each evaluation index, so as to adjust the clustering model. The case study produced rainstorm disaster levels for 67 counties. The internal characteristics of each type were analyzed, and the main correlation factors of each type were extracted. The results were compared with statistical records of rainstorm disasters, which verified the validity and feasibility of the model. [Conclusion] The method was feasible, and its evaluation results showed better differentiation and decision accuracy.
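Grey correlation (grey relational) analysis can be sketched with Deng's standard formulation, assumed here. For a reference sequence and each comparison sequence, the relational coefficient at position k is (dmin + rho * dmax) / (delta_k + rho * dmax), and the grade is the mean coefficient. The inputs are assumed to be already-normalized index values, and rho = 0.5 is the conventional distinguishing coefficient. The abstract does not say how the grades and combined weights feed the spectral clustering. One possibility is to use grades between counties as affinities.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Deng-style grey relational grade of each candidate w.r.t. the reference."""
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:                      # every sequence identical to the reference
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades
```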
In clothing image research, segmenting the clothing quickly and accurately while retaining as much style detail as possible is the basis of subsequent image analysis. Spectral clustering is a common clothing image segmentation method in the clothing image extraction process. However, the traditional model requires high computing power and is easily affected by the initial cluster centers, so it often falls into local optima. To address these two problems, this paper proposes an improved spectral clustering clothing image segmentation algorithm. The Nyström approximation strategy is introduced into the spectral mapping process to reduce the computational complexity. In the clustering stage, the algorithm draws on the global optimization advantage of particle swarm optimization and selects the sparrow search algorithm to search for the optimal initial cluster points, which effectively avoids local optima. Finally, the effectiveness of the algorithm is verified on clothing images in various environments.
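The Nyström strategy can be sketched in its standard form, assumed here rather than taken from the paper. Compute kernel columns only for m landmark points, eigendecompose the small m x m block, and extend the eigenvectors to all n points. This replaces an O(n^3) eigendecomposition with roughly O(n m^2) work. The RBF kernel, `sigma` and the landmark choice are assumptions. Landmark selection by sparrow search or KNN sampling is not shown.

```python
import numpy as np

def rbf(A, B, sigma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def nystrom_eigs(X, landmarks, k, sigma=1.0):
    """Approximate top-k eigenpairs of the n x n RBF kernel from m landmarks."""
    X = np.asarray(X, float)
    C = rbf(X, X[landmarks], sigma)              # n x m kernel columns
    W = C[landmarks]                              # m x m landmark block
    vals, vecs = np.linalg.eigh(W)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]   # top-k, descending
    n, m = len(X), len(landmarks)
    approx_vals = (n / m) * vals
    approx_vecs = np.sqrt(m / n) * (C @ vecs) / vals    # Nystrom extension
    return approx_vals, approx_vecs
```

With every point used as a landmark, the approximation reproduces the exact eigenpairs. In practice one would pass a small random subset, for example `rng.choice(n, m, replace=False)`.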
A new fuzzy support vector machine algorithm with dual membership values based on spectral clustering is proposed to overcome a shortcoming of the standard support vector machine algorithm. In binary classification, the standard algorithm divides the training data into two mutually exclusive classes and ignores the possibility of an "overlapping" region between the two training classes. The proposed method handles sample "overlap" efficiently with spectral clustering, overcomes over-fitting well, and greatly improves data mining efficiency. Simulation results provide clear evidence for the new method.
This paper exposes some intrinsic characteristics of the spectral clustering method using tools from matrix perturbation theory. We construct the weight matrix of a graph and study its eigenvalues and eigenvectors. The analysis shows that the number of clusters equals the number of eigenvalues larger than 1, and that the number of points in each cluster can be approximated by the associated eigenvalue. It also shows that the eigenvectors of the weight matrix can be used directly to perform clustering: the directional angle between two row vectors of the matrix derived from the eigenvectors is a suitable distance measure for clustering. On this basis, an unsupervised spectral clustering algorithm based on the weight matrix (USCAWM) is developed. Experimental results on a number of artificial and real-world data sets confirm the correctness of the theoretical analysis.
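The eigenvalue claim is easy to check on an idealized, noise-free weight matrix. We assume weight 1 within a cluster (including self-similarity on the diagonal) and 0 between clusters. Each all-ones block of size s contributes one eigenvalue s, and all other eigenvalues are 0. Counting eigenvalues above 1 therefore recovers the number of clusters (of size at least 2), and the eigenvalues give the cluster sizes. Rows of the leading eigenvectors are parallel within a cluster and orthogonal across clusters, which is the angle criterion. With real, perturbed weights these relations hold only approximately, which is what the perturbation analysis addresses.

```python
import numpy as np

def count_clusters_from_weights(W, threshold=1.0):
    """Number of eigenvalues above `threshold`, plus all eigenvalues (descending)."""
    vals = np.linalg.eigvalsh(W)
    return int(np.sum(vals > threshold)), np.sort(vals)[::-1]

def row_angle_cos(V, i, j):
    """Cosine of the angle between rows i and j of the eigenvector matrix."""
    return float(V[i] @ V[j] / (np.linalg.norm(V[i]) * np.linalg.norm(V[j])))

# Idealised weight matrix: clusters of sizes 3 and 5
sizes = [3, 5]
W = np.zeros((sum(sizes), sum(sizes)))
start = 0
for s in sizes:
    W[start:start + s, start:start + s] = 1.0
    start += s

n_clusters, eigenvalues = count_clusters_from_weights(W)
V = np.linalg.eigh(W)[1][:, -n_clusters:]   # leading eigenvectors as columns
```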
Phasor measurement units (PMUs) can provide real-time measurement data for building the ubiquitous electric Internet of Things. However, because of complex on-site factors, PMU data can easily be compromised by interference or synchronization jitter. This causes PMU data quality issues of varying severity, which can directly affect PMU-based applications and even threaten the safety of power systems. To improve PMU data quality, this paper proposes a data-driven PMU bad data detection algorithm based on spectral clustering that uses data from a single PMU. The proposed algorithm does not require the system topology or parameters. First, a data identification method based on a decision tree is proposed to distinguish event data from bad data by using the slope feature of each data point. Then, a bad data detection method based on spectral clustering is developed. By analyzing the weighted relationships among all the data, this method can detect bad data with small deviations. Simulations and tests on field-recorded data show that this data-driven method identifies and detects bad data effectively. The technique can improve PMU data quality and thereby safeguard PMU applications in power systems.
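The slope feature can be illustrated with a deliberately simple, hand-written rule standing in for the trained decision tree. The rule and its threshold are hypothetical; a real tree would learn the thresholds from labelled data. The intuition is that bad data typically appears as a spike, a large slope immediately reversed by a large opposite slope. An event such as a step change produces a large slope that persists.

```python
def slopes(values, dt=1.0):
    """First differences of a measurement sequence divided by the sample interval."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def classify_anomaly(values, slope_threshold):
    """Hypothetical rule: reversed jump -> 'bad data', persistent jump -> 'event'."""
    s = slopes(values)
    for k, sk in enumerate(s):
        if abs(sk) > slope_threshold:
            nxt = s[k + 1] if k + 1 < len(s) else 0.0
            if abs(nxt) > slope_threshold and sk * nxt < 0:
                return "bad data"
            return "event"
    return "normal"
```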
This paper proposes a sampling-based hierarchical approach to address the computational demands of spectral clustering methods in image segmentation. The authors first define the distance between a pixel and a cluster, and then derive a new theorem that estimates the number of samples needed for clustering. Finally, by introducing a scale parameter into the similarity function, a novel spectral-clustering-based image segmentation method is developed. An important characteristic of the approach is that, during segmentation, the scale parameter must be tuned to merge small clusters or split large ones, and samples must also be drawn from the data set at different scales. This multiscale and stochastic nature makes the method feasible for very large grouping problems. It also lets the segmentation run in time linear in the size of the image. Experimental results on various synthetic and real-world images show the effectiveness of the approach.
An unsupervised learning algorithm named soft spectral clustering ensemble (SSCE) is proposed in this paper. Until now, many proposed ensemble algorithms could not be used on image data, because even images of a mere 256 × 256 pixels are too expensive in computational cost and storage. The proposed method is suitable for image segmentation and can, to some degree, solve some open problems of spectral clustering (SC). A random scaling parameter and the Nyström approximation are applied to generate the individual spectral clusterings for ensemble learning. We slightly modify the standard SC algorithm to obtain a soft partition and then map it via a centralized log-contrast transform. This relaxes the constraint of probability data, namely that each instance's memberships sum to one. All mapped data are concatenated to form the new features for each instance. Principal component analysis (PCA) is used to reduce the dimension of the new features. The final aggregated result is obtained by clustering the dimension-reduced data. Experimental results on UCI data and different image types show that the proposed algorithm is more efficient than some existing consensus functions.
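Here is a sketch of the transform-and-concatenate step. It assumes that the "centralized log-contrast transform" corresponds to Aitchison's centered log-ratio (clr): log p_i minus the mean of the log p. The clr output no longer has to sum to one (its components sum to zero instead), so it can be treated as ordinary features. The small `eps` guarding log(0) is an assumption, and the PCA and final clustering steps are omitted.

```python
import math

def clr(memberships, eps=1e-12):
    """Centred log-ratio transform of one soft-membership vector."""
    logs = [math.log(max(p, eps)) for p in memberships]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def concat_features(soft_partitions):
    """soft_partitions: one list per ensemble member, holding a membership vector
    per instance. Returns the concatenated clr features for each instance."""
    n = len(soft_partitions[0])
    return [sum((clr(member[i]) for member in soft_partitions), []) for i in range(n)]
```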
The spectral clustering method has notable advantages in segmentation, but its high computational complexity and long running time limit its application to large-scale, dense airborne Light Detection and Ranging (LiDAR) point cloud data. We proposed the Nyström-based spectral clustering (NSC) algorithm to reduce the computational burden. The NSC method proved accurate and fast for individual tree segmentation using point cloud data. K-nearest-neighbour-based sampling (KNNS) was proposed for the Nyström approximation of voxels to improve efficiency. The NSC algorithm performed well on 32 plots in China and Europe. The overall matching rate and extraction rate of the proposed algorithm reached 69% and 103%, respectively. For all trees located by Global Navigation Satellite System (GNSS)-calibrated tape measures, the tree height regression of the matching results showed a value of 0.88 and a relative root mean square error (RMSE) of 5.97%. For all trees located by GNSS-calibrated total-station measurements, the corresponding values were 0.89 and 4.49%. The method also performed well on a benchmark dataset, improving the average matching rate by 7%. The results demonstrate that the proposed NSC algorithm provides accurate individual tree segmentation and parameter estimation from airborne LiDAR point cloud data.
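Voxelization is what shrinks a dense point cloud to a manageable number of nodes before the Nyström step, and it can be sketched as follows. Points are grouped into cubic voxels by flooring their coordinates, and each voxel is represented by its centroid. The voxel size is a hypothetical parameter, and the paper's KNN-based landmark sampling is not reproduced.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group 3-D points into cubic voxels: {voxel_index: [point ids]}."""
    voxels = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels[key].append(idx)
    return dict(voxels)

def voxel_centroids(points, voxels):
    """Mean position of the points in each voxel."""
    return {key: tuple(sum(points[i][d] for i in ids) / len(ids) for d in range(3))
            for key, ids in voxels.items()}
```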
In this paper, we explore a novel ensemble method for spectral clustering. Traditional clustering ensemble methods combine all the obtained clustering results. In contrast, we propose an adaptive spectral clustering ensemble method to achieve a better clustering solution. This method can adaptively determine the number of component members, a capability that many other algorithms lack. The component clusterings of the ensemble system are generated by spectral clustering (SC), whose characteristics are well suited to producing diverse committees. The selection process evaluates the generated component spectral clusterings through a resampling technique and the population-based incremental learning algorithm (PBIL). Experimental results on UCI datasets demonstrate that the proposed algorithm achieves better results than traditional clustering ensemble methods, especially when the number of component clusterings is large.
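PBIL's role in choosing a subset of component clusterings can be sketched with the standard algorithm, assumed here. A probability vector over "include member j" bits is sampled into a population, and the vector is nudged toward the best sample of each generation: p <- (1 - lr) * p + lr * best. The fitness function below is a hypothetical stand-in for the paper's resampling-based evaluation.

```python
import random

def pbil(fitness, n_bits, pop_size=20, generations=50, lr=0.1, seed=0):
    """Standard PBIL over binary masks; returns best mask, its fitness, final probs."""
    rng = random.Random(seed)
    p = [0.5] * n_bits
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > best_fit:
            best, best_fit = gen_best, fitness(gen_best)
        p = [(1 - lr) * pi + lr * b for pi, b in zip(p, gen_best)]
    return best, best_fit, p

# Hypothetical usage: reward members whose quality score exceeds 0.5
quality = [0.9, 0.2, 0.8, 0.1]

def fitness(mask):
    return sum(q - 0.5 for q, m in zip(quality, mask) if m)

best, best_fit, probs = pbil(fitness, n_bits=len(quality))
```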
Spectral clustering is a well-regarded subspace clustering algorithm that performs outstandingly in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors. The commonly used eigenvectors not only fail to guarantee that detailed discriminative information is included, but also have high computational complexity. To address these challenges, we proposed an intuitive eigenvector selection method based on the coincidence degree of data distribution (CDES). First, the clustering result of an improved k-means, which reflects the spatial distribution of the various classes well, was used as the reference map. Then, the adjusted Rand index and adjusted mutual information were calculated to assess the consistency of data distribution between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees were selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those obtained with the commonly used top, high-entropy, and high-relevance eigenvectors, respectively, and that CDES saves more than 99% of the eigenvector selection time. In particular, owing to the unsupervised nature of k-means, CDES provides a novel solution for autonomous feature selection in hyperspectral images.
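The adjusted Rand index used to score each eigenvector against the reference map can be computed from the contingency counts of two labelings, shown here in the standard Hubert-Arabie form. It is 1 for identical partitions (regardless of label names) and around 0 for chance agreement. How each eigenvector is turned into a labeling (for example, by clustering it) is an assumption not stated in the abstract.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two labelings."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:          # degenerate case, e.g. all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```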
To better understand the water exchange between the Kuroshio and the East China Sea (ECS), we studied the variability of the Kuroshio in the ECS from 1991 to 2008 using a three-dimensional circulation model. We calculated the Kuroshio onshore volume transport in the ECS, which ranged from a minimum of 0.48 Sv (1 Sv = 10⁶ m³/s) in summer to a maximum of 1.69 Sv in winter. Based on WOA05 and NCEP data, the modeled results indicate that the Kuroshio transport east of Taiwan Island has decreased since 2000. Lateral movements tended to be stronger at the two ends of the Kuroshio in the ECS than in its middle segment. In addition, we applied a spectral mixture model (SMM) to determine the exchange zone between the Kuroshio and the shelf water of the ECS. The result reveals a significant negative correlation (coefficient of -0.78) between the area of the exchange zone and the Kuroshio onshore transport across the 200 m isobath in the ECS. This conclusion offers a new view of the water exchange between the Kuroshio and the East China Sea. In addition to annual and semi-annual signals, an intra-seasonal signal, probably of Pacific origin, may trigger Kuroshio intrusion and exchange events in the ECS.
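A spectral mixture model is commonly a linear one. Each observed signature is modeled as a fraction-weighted sum of end-member signatures, here Kuroshio water and shelf water, with fractions summing to one. The sketch below assumes that linear form and enforces the sum-to-one constraint softly via a heavily weighted extra row in a least-squares solve. The end-member values and the constraint weight are hypothetical, and the abstract does not state which observed quantities were unmixed.

```python
import numpy as np

def unmix(pixel, endmembers, sum_to_one_weight=1e3):
    """Least-squares abundance fractions with a soft sum-to-one constraint."""
    E = np.asarray(endmembers, float).T          # bands x n_endmembers
    y = np.asarray(pixel, float)
    A = np.vstack([E, sum_to_one_weight * np.ones(E.shape[1])])
    b = np.append(y, sum_to_one_weight)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fractions
```

Pixels whose fractions fall between the two pure end members would then mark the exchange zone.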
Webpage classification differs from traditional text classification because of its irregular words and phrases and its massive, unlabeled features, which make effective features harder to obtain. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and on term clustering. Both use multiple strategies to optimize a Vector Space Model (VSM) and thereby improve webpage classification. The results show that document clustering handles document content better than term clustering. In particular, better overall performance is obtained by combining spectral clustering with document clustering. Moreover, because images coexist with document content in the same webpage, the proposed method is also applied to extract meaningful terms for images, and experimental results show that this also improves webpage classification.
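For orientation, a baseline TF-IDF Vector Space Model with cosine similarity can be sketched as follows. Each document becomes a sparse term-weight vector, and similar pages have a high cosine. Whitespace tokenization is an assumption. The paper's extraction of meaningful strings through document or term clustering, which would replace the raw tokens here, is not reproduced.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for whitespace-tokenised documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```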
Clustering analysis plays a very important role in data mining, image segmentation, and pattern recognition. Here, cluster analysis is applied to NetEYun music data, and different types of music are clustered to find what music of the same kind has in common. A music-data-oriented clustering analysis method is proposed. First, the audio file data are read and the emotional features of the audio are extracted. Second, the audio beat period is calculated by Fourier transform. Finally, a clustering algorithm is designed to obtain the clustering results for the music data.
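The Fourier-transform step for the beat period can be sketched as follows. The sketch assumes an onset or energy envelope has already been computed from the audio, which is not shown. It removes the mean, takes the FFT magnitude, picks the strongest frequency inside a plausible tempo band, and returns its reciprocal as the beat period in seconds. The 0.5 to 4 Hz band (30 to 240 BPM) is a hypothetical choice.

```python
import numpy as np

def beat_period(envelope, sample_rate, min_hz=0.5, max_hz=4.0):
    """Dominant periodicity (seconds) of an onset/energy envelope via the FFT."""
    env = np.asarray(envelope, float)
    env = env - env.mean()                        # remove the DC component
    spectrum = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(env.size, d=1.0 / sample_rate)
    band = (freqs >= min_hz) & (freqs <= max_hz)  # plausible tempo range
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 1.0 / peak_hz
```

For a 10 s envelope sampled at 100 Hz that pulses twice per second, the function returns a period of 0.5 s, that is, 120 BPM.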
Dynamometer cards are commonly used to analyze the down-hole working conditions of pumping systems in actual oil production. Traditional supervised learning methods rely heavily on the classification accuracy of the training samples. To reduce the errors of manual classification, an automatic clustering algorithm is proposed and applied to diagnose the down-hole conditions of pumping systems. Spectral clustering (SC) is a newer clustering algorithm that is suitable for arbitrary data distributions. However, it is sensitive to the initial cluster centers and the scale parameter, and it requires the number of clusters to be predefined. To overcome these shortcomings, we propose an automatic clustering algorithm, fast black hole-spectral clustering (FBH-SC). The FBH algorithm replaces the K-means method in SC, and a CritC index function is used as the objective function to automatically choose the best scale parameter and number of clusters during clustering. Simulation experiments were designed to determine the relationships among the scale parameter, the number of clusters, the CritC index value, and clustering accuracy. Finally, an example is given to validate the effectiveness of the proposed algorithm.
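Here is a simplified sketch of the standard black hole (BH) optimizer that FBH builds on. The "fast" modifications and the CritC objective are not specified in the abstract and are not reproduced. Every star moves a random fraction of the way toward the best solution so far (the black hole), and a better star becomes the new black hole. Stars inside the event horizon, radius = f(black hole) / sum of all star costs, are replaced by random stars. A non-negative cost is assumed, and the black hole is kept as a separate best-so-far copy. In FBH-SC the decision vector would encode cluster centres. Here a toy sphere function stands in.

```python
import math
import random

def black_hole_minimize(f, dim, bounds, n_stars=20, iters=100, seed=0):
    """Simplified standard black hole optimisation (non-negative cost assumed)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def new_star():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    stars = [new_star() for _ in range(n_stars)]
    bh = min(stars, key=f)
    for _ in range(iters):
        for i, s in enumerate(stars):
            # move each star a random fraction of the way toward the black hole
            stars[i] = [x + rng.random() * (b - x) for x, b in zip(s, bh)]
            if f(stars[i]) < f(bh):
                bh = stars[i][:]          # a better star becomes the black hole
        total = sum(f(s) for s in stars) or 1e-12
        radius = f(bh) / total            # event horizon radius
        for i, s in enumerate(stars):
            if math.dist(s, bh) < radius:
                stars[i] = new_star()     # swallowed star is replaced at random
    return bh, f(bh)
```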
Funding (underwater acoustic positioning paper): This work was supported by the National Natural Science Foundation of China (61903086, 61903366, 62001115), the Natural Science Foundation of Hunan Province (2019JJ50745, 2020JJ4280, 2021JJ40133), and the Fundamentals and Basic of Applications Research Foundation of Guangdong Province (2019A1515110136).
Funding (seismic facies paper): This work was supported by the National Natural Science Foundation of China (Nos. U1562218, 41604107, and 41804126).
Funding (SASC paper): This work was supported by the National Natural Science Foundation of China (61272119).
文摘The similarity measure is crucial to the performance of spectral clustering. The Gaussian kernel function based on the Euclidean distance is usual y adopted as the similarity measure. However, the Euclidean distance measure cannot ful y reveal the complex distribution data, and the result of spectral clustering is very sensitive to the scaling parameter. To solve these problems, a new manifold distance measure and a novel simulated anneal-ing spectral clustering (SASC) algorithm based on the manifold distance measure are proposed. The simulated annealing based on genetic algorithm (SAGA), characterized by its rapid convergence to the global optimum, is used to cluster the sample points in the spectral mapping space. The proposed algorithm can not only reflect local and global consistency better, but also reduce the sensitivity of spectral clustering to the kernel parameter, which improves the algorithm’s clustering performance. To efficiently apply the algorithm to image segmentation, the Nystrom method is used to reduce the computation complexity. Experimental results show that compared with traditional clustering algorithms and those popular spectral clustering algorithms, the proposed algorithm can achieve better clustering performances on several synthetic datasets, texture images and real images.
Abstract: Defense techniques for machine learning are critical yet challenging, because the number and types of attacks on widely applied machine learning algorithms are increasing significantly. Among these attacks, the poisoning attack, which disturbs machine learning algorithms by injecting poisoning samples, poses the greatest threat. In this paper, we analyze the characteristics of poisoning samples and propose a novel sample evaluation method, tailored to those characteristics, to defend against poisoning attacks. To capture the intrinsic data characteristics from heterogeneous aspects, we first evaluate the training data by multiple criteria, each of which is reformulated from a spectral clustering. Then, we integrate the multiple evaluation scores generated by these criteria through the proposed multiple spectral clustering aggregation (MSCA) method. Finally, we use the unified score as the indicator of poisoning attack samples. Experimental results on intrusion detection datasets show that MSCA significantly outperforms K-means outlier detection in terms of data legality evaluation and poisoning attack detection.
Abstract: Clustering is one of the most widely used techniques for exploratory data analysis. The spectral clustering algorithm, a popular modern clustering algorithm, has been shown to detect clusters more effectively than many traditional algorithms. Its applications range from computer vision and information retrieval to social science and biology. As database sizes soar, clustering algorithms face rapidly growing computational time and memory use. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on both benchmark networks and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. The results show that the proposed implementation scales well, speeds up clustering without sacrificing quality, and processes massive datasets efficiently on commodity machine clusters.
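The core MapReduce pattern behind a distributed eigensolver is a sparse matrix-vector product: mappers emit partial products keyed by row, and reducers sum them. The pure-Python sketch below simulates that pattern in a single process (no Hadoop) and wraps it in power iteration to get the leading eigenvector of a symmetric weight matrix stored as an edge list. It is an illustration only; the function names are hypothetical, and a real spectral clustering job needs the top k eigenvectors, typically via a Lanczos-type solver built on the same distributed product.

```python
import math
from collections import defaultdict


def map_phase(edges, v):
    """Map: each edge (i, j, w) emits the partial product (i, w * v[j])."""
    for i, j, w in edges:
        yield i, w * v[j]


def reduce_phase(pairs, n):
    """Reduce: sum all partial products that share the same row key i."""
    acc = defaultdict(float)
    for i, value in pairs:
        acc[i] += value
    return [acc[i] for i in range(n)]


def power_iteration(edges, n, iters=100):
    """Leading eigenvector of a symmetric weight matrix given as an edge list."""
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = reduce_phase(map_phase(edges, v), n)   # one distributed mat-vec
        norm = math.sqrt(sum(x * x for x in y))
        v = [x / norm for x in y]
    return v
```

For the matrix [[2, 1], [1, 3]] the ratio of the two components converges to the golden ratio (1 + √5)/2, the eigenvector of the largest eigenvalue (5 + √5)/2. In a real cluster the dictionary in `reduce_phase` is replaced by the framework's shuffle, which groups the emitted pairs by key across machines.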
Funding: Supported by the Fundamental Research Funds for the Central Universities in North China Electric Power University (11MG13) and the Natural Science Foundation of Hebei Province (F2011502038)
Abstract: This paper proposes a novel phishing web image segmentation algorithm based on improved spectral clustering. First, we construct a set of points composed of the spatial locations and gray levels of the pixels of a given image. Second, the data are clustered in the spectral space of the similarity matrix of the point set. The K-means algorithm used in conventional spectral clustering is sensitive to the initial clustering centroids and may converge to a local optimum; to avoid these drawbacks, we introduce a clone operator and Cauchy mutation to enlarge the search scale of the clustering centers, and a quantum-inspired evolutionary algorithm to find the globally optimal clustering centroids. Compared with phishing web image segmentation based on K-means, experimental results show that our method substantially improves segmentation performance. Moreover, our method can converge to the global optimal solution and achieves better accuracy in phishing web image segmentation.
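Two ingredients of the method can be sketched in NumPy: building one (row, column, gray level) point per pixel, and cloning a candidate set of centroids before perturbing the clones with heavy-tailed Cauchy noise. The helper names, the mutation `scale`, and the number of clones are assumptions; the quantum-inspired evolutionary search that evaluates and selects the mutated clones is not shown.

```python
import numpy as np


def pixel_features(gray):
    """Build one (row, col, gray level) point per pixel of a 2-D image."""
    rows, cols = np.indices(gray.shape)
    return np.column_stack([rows.ravel(), cols.ravel(), gray.ravel()]).astype(float)


def clone_and_cauchy_mutate(centers, n_clones, scale, rng):
    """Clone a candidate centroid set and perturb each clone with Cauchy noise.

    centers has shape (k, d); the result has shape (n_clones, k, d).
    The heavy-tailed Cauchy distribution occasionally produces large jumps,
    which helps the search escape local optima.
    """
    clones = np.repeat(centers[None, :, :], n_clones, axis=0)
    return clones + scale * rng.standard_cauchy(size=clones.shape)
```

In the actual method the centroids live in the spectral embedding space, so `d` would be the embedding dimension rather than the three raw pixel features.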
Funding: Supported by the Special Item of Public Sector (Meteorological) Science Research (GYHY201106040)
Abstract: [Objective] The research aimed to study an assessment index system for rainstorm disasters in Fujian Province based on a spectral clustering model with grey correlation analysis. [Method] Based on the meteorological disaster yearbook of Fujian Province, and comprehensively considering disaster-inducing factors, the disaster-inducing environment, disaster-sustaining bodies, and the regional disaster-prevention level, an evaluation index system for regional rainstorm disasters in Fujian was established. Using a spectral clustering model based on grey correlation analysis, risk zoning of rainstorm disasters was conducted for each area of Fujian. Finally, the effect and application of the clustering model were analyzed through a case study. [Result] To uncover the intrinsic connections among regional characteristics and improve the disaster-prevention linkage performance of the evaluation units, a spectral clustering model based on grey correlation analysis was used to conduct risk zoning of rainstorm disasters in Fujian Province. Moreover, combined weights were introduced to weigh each evaluation index and thereby adjust the clustering model. In the case study, rainstorm disaster levels were obtained for 67 counties. The internal characteristics of each type were analyzed, and the main correlation factors of each type were extracted. Comparison with statistical records of rainstorm disasters verified the validity and feasibility of the model. [Conclusion] The method was feasible, and its evaluation results showed better differentiation and decision accuracy.
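Grey correlation (grey relational) analysis can supply the similarity matrix for spectral clustering. The sketch below uses Deng's grey relational coefficient with the customary distinguishing coefficient ρ = 0.5 and an optional weight vector standing in for the combined weights. Taking Δmin and Δmax over all pairs of evaluation units, and the function name, are assumptions rather than details from the abstract.

```python
import numpy as np


def grey_relational_similarity(X, rho=0.5, weights=None):
    """Pairwise grey relational grades between evaluation units (rows of X).

    X is assumed to be normalized already (e.g. min-max scaled per index).
    For units i and j, Delta_ij(k) = |X[i, k] - X[j, k]| and
        xi_ij(k) = (d_min + rho * d_max) / (Delta_ij(k) + rho * d_max),
    with d_min, d_max taken over all pairs and indices (requires d_max > 0).
    The grade is the weighted mean of xi_ij(k) over the indices k.
    """
    n, m = X.shape
    if weights is None:
        weights = np.full(m, 1.0 / m)        # equal weights by default
    delta = np.abs(X[:, None, :] - X[None, :, :])   # shape (n, n, m)
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff @ weights                   # shape (n, n)
```

If the largest difference anywhere in the normalized data is 1, rows (0, 0) and (1, 1) get coefficient 0.5/1.5 on each index, so their grade is 1/3, while a unit differing by only 0.1 on one index scores much higher. The resulting matrix is symmetric with ones on the diagonal and can serve directly as the affinity for spectral clustering.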
Abstract: In clothing image research, segmenting the clothing quickly and accurately while retaining as many clothing style details as possible is the basis of subsequent image analysis. Spectral clustering segmentation is a common method for clothing image extraction. However, the traditional model requires high computing power and is easily affected by the initial cluster centers, so it often falls into local optima. To address these two problems, this paper proposes an improved spectral clustering algorithm for clothing image segmentation. The Nyström approximation strategy is introduced into the spectral mapping process to reduce the computational complexity. In the clustering stage, the algorithm exploits the global optimization advantage of the particle swarm optimization algorithm and uses the sparrow search algorithm to search for the optimal initial cluster centers, effectively avoiding local optima. Finally, the effectiveness of the algorithm is verified on clothing images in various environments.
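The Nyström step can be sketched as follows: compute the Gaussian affinity only among m landmark pixels and between all pixels and those landmarks, eigendecompose the small m × m block, and extend its eigenvectors to every pixel. This NumPy sketch approximates the eigenvectors of the plain Gaussian affinity; normalized spectral clustering needs an additional degree-normalization step. The landmark choice, `sigma`, and the function names are assumptions.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian affinities between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def nystrom_eigenvectors(X, landmark_idx, k, sigma=1.0):
    """Approximate the top-k eigenvectors of the n x n Gaussian affinity.

    Only the m x m landmark block W and the n x m block C are computed.
    If W = U_m diag(lam) U_m^T, the eigenvectors are extended to all
    points as C U_m diag(1 / lam), then rescaled to unit length.
    """
    L = X[landmark_idx]
    W = gaussian_kernel(L, L, sigma)         # (m, m)
    C = gaussian_kernel(X, L, sigma)         # (n, m)
    lam, U_m = np.linalg.eigh(W)             # eigenvalues in ascending order
    lam, U_m = lam[::-1][:k], U_m[:, ::-1][:, :k]
    U = C @ U_m / lam                        # Nystrom extension
    return U / np.linalg.norm(U, axis=0)
```

When every point is used as a landmark the extension is exact, which makes a convenient sanity check; with every third point as a landmark, the eigendecomposition is done on a 10 × 10 matrix instead of a 30 × 30 one.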
Funding: Supported by the National Natural Science Foundation of China (70831001, 70821061)
Abstract: A new fuzzy support vector machine algorithm with dual membership values based on spectral clustering is proposed. It overcomes a shortcoming of the standard support vector machine, which in binary classification divides the training data into two mutually exclusive classes and ignores the possibility of an "overlapping" region between the two training classes. The proposed method handles sample "overlap" efficiently with spectral clustering, effectively overcomes over-fitting, and greatly improves data mining efficiency. Simulation results provide clear evidence for the effectiveness of the new method.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60375003) and the Aeronautical Science Foundation of China (Grant No. 03I53059)
Abstract: This paper exposes some intrinsic characteristics of the spectral clustering method by using tools from matrix perturbation theory. We construct a weight matrix of a graph and study its eigenvalues and eigenvectors. The analysis shows that the number of clusters is equal to the number of eigenvalues that are larger than 1, and that the number of points in each cluster can be approximated by the associated eigenvalue. It also shows that the eigenvectors of the weight matrix can be used directly to perform clustering; that is, the directional angle between two row vectors of the matrix derived from the eigenvectors is a suitable distance measure for clustering. As a result, an unsupervised spectral clustering algorithm based on the weight matrix (USCAWM) is developed. Experimental results on a number of artificial and real-world data sets confirm the correctness of the theoretical analysis.
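The two claims can be checked numerically on a Gaussian weight matrix with unit diagonal. The sketch below counts eigenvalues above 1 and compares the leading eigenvalues with the cluster sizes. For labeling it uses a simplification, assigning each point to the leading eigenvector in which its row has the largest magnitude, instead of the directional-angle measure described in the abstract; `sigma` and the function name are assumptions.

```python
import numpy as np


def estimate_clusters_from_weight_matrix(X, sigma=1.0):
    """Count eigenvalues of the Gaussian weight matrix that exceed 1.

    W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), so W[i, i] = 1. For
    compact, well-separated clusters W is close to block diagonal with
    near-all-ones blocks, whose leading eigenvalue is roughly the block size.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    lam, V = np.linalg.eigh(W)
    lam, V = lam[::-1], V[:, ::-1]           # sort eigenvalues descending
    k = int(np.sum(lam > 1.0))
    # Simplified stand-in for the directional-angle measure: each point
    # joins the leading eigenvector where its row carries the most weight.
    labels = np.argmax(np.abs(V[:, :k]), axis=1)
    return k, lam[:k], labels
```

For three compact, well-separated blobs of 20, 15 and 10 points, the weight matrix is nearly block diagonal, so exactly three eigenvalues exceed 1 and they come out close to 20, 15 and 10. Distinct blob sizes keep those eigenvalues apart, which is what lets each eigenvector localize on a single blob.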
Funding: Supported by the National Key R&D Program (No. 2017YFB0902901) and the National Natural Science Foundation of China (No. 51627811, No. 51725702, and No. 51707064).
Abstract: Phasor measurement units (PMUs) can provide real-time measurement data for building the ubiquitous power Internet of Things. However, owing to complex on-site factors, PMU data can easily be compromised by interference or synchronization jitter. This leads to PMU data quality issues of varying severity, which can directly affect PMU-based applications and even threaten the safety of power systems. To improve PMU data quality, this paper proposes a data-driven PMU bad data detection algorithm based on spectral clustering that uses the data of a single PMU. The proposed algorithm requires neither the system topology nor its parameters. First, a data identification method based on a decision tree is proposed to distinguish event data from bad data using the slope feature of each data point. Then, a bad data detection method based on spectral clustering is developed. By analyzing the weighted relationships among all the data, this method can detect bad data even when the deviation is small. Simulations and tests on field-recorded data show that this data-driven method achieves bad data identification and detection effectively. The technique can improve PMU data quality and thereby support PMU applications in power systems.
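The spectral clustering stage can be illustrated with a two-way spectral cut of a single PMU channel: Gaussian similarities between all samples, the unnormalized graph Laplacian, and a split by the sign of the Fiedler vector, with the smaller group treated as suspected bad data. This is a minimal sketch; the bandwidth `sigma`, the minority-group rule, and the function name are assumptions, and the decision-tree step that separates genuine events from bad data using the slope feature is not reproduced.

```python
import numpy as np


def spectral_bad_data_flags(series, sigma):
    """Flag the minority side of a spectral bipartition as suspected bad data.

    Similarities w_ij = exp(-(x_i - x_j)^2 / (2 sigma^2)) are built between
    all samples of one PMU channel. The sign of the Fiedler vector (the
    eigenvector of the second-smallest eigenvalue of L = D - W) splits the
    samples in two; indices of the smaller group are returned.
    """
    x = np.asarray(series, dtype=float)
    W = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma ** 2))
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    group = fiedler > 0
    minority = group if group.sum() <= (~group).sum() else ~group
    return np.flatnonzero(minority)
```

With a 50-sample magnitude series near 1.0 and three samples offset by 0.05, `sigma=0.01` leaves the offset samples almost disconnected from the rest, so the Fiedler vector isolates exactly those three indices.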
Funding: National Natural Science Foundation of China (Grant No. 60375003) and the Aeronautical Science Foundation of China (Grant No. 03I53059)
Abstract: This paper proposes a sampling-based hierarchical approach to reduce the computational demands of spectral clustering methods applied to image segmentation. The authors first define the distance between a pixel and a cluster and then derive a new theorem to estimate the number of samples needed for clustering. Finally, by introducing a scale parameter into the similarity function, a novel spectral-clustering-based image segmentation method is developed. An important characteristic of the approach is that, during segmentation, one needs not only to tune the scale parameter to merge small clusters or split large ones but also to sample the data set at different scales. This multiscale and stochastic nature makes the method feasible for very large grouping problems. In addition, it allows the segmentation to run in time linear in the size of the image. Experimental results on various synthetic and real-world images show the effectiveness of the approach.
Abstract: An unsupervised learning algorithm named soft spectral clustering ensemble (SSCE) is proposed in this paper. Many existing ensemble algorithms cannot be applied to image data, because even images of a mere 256 × 256 pixels are too expensive in computation and storage. The proposed method is suitable for image segmentation and can, to some degree, solve some open problems of spectral clustering (SC). A random scaling parameter and the Nyström approximation are applied to generate the individual spectral clusterings for ensemble learning. We slightly modify the standard SC algorithm to acquire a soft partition and then map it via a centralized log-contrast transform to relax the constraint that the probabilities sum to one. All mapped data are concatenated to form new features for each instance, and principal component analysis (PCA) is used to reduce the dimension of these features. The final aggregated result is obtained by clustering the dimension-reduced data. Experimental results on UCI data and different types of images show that the proposed algorithm is more efficient than some existing consensus functions.
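Assuming the "centralized log-contrast transform" is the centered log-ratio (clr) transform of compositional data analysis, the feature-building step can be sketched in NumPy: clr-map each soft partition (rows sum to 1), concatenate the results, and reduce them with PCA computed by SVD. The small `eps` guarding log(0), the helper names, and the use of plain SVD-based PCA are assumptions; the final clustering of the reduced features is not shown.

```python
import numpy as np


def clr(P, eps=1e-12):
    """Centered log-ratio transform of row-wise probability vectors."""
    logP = np.log(P + eps)
    return logP - logP.mean(axis=1, keepdims=True)


def ensemble_features(soft_partitions, n_components):
    """Concatenate clr-mapped soft partitions and reduce them with PCA.

    soft_partitions: list of (n, k_t) arrays whose rows sum to 1.
    Returns an (n, n_components) array of principal-component scores.
    """
    F = np.hstack([clr(P) for P in soft_partitions])
    F = F - F.mean(axis=0)                   # center columns before PCA
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :n_components] * s[:n_components]
```

Each clr-mapped row sums to zero, so the features leave the probability simplex and live in an ordinary Euclidean subspace where PCA and k-means are meaningful.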
Abstract: The spectral clustering method has notable advantages in segmentation, but its high computational complexity and long running time limit its application to large-scale, dense airborne Light Detection and Ranging (LiDAR) point cloud data. We propose a Nyström-based spectral clustering (NSC) algorithm to decrease the computational burden. The NSC method proved accurate and fast for individual tree segmentation using point cloud data. K-nearest neighbour-based sampling (KNNS) was proposed for the Nyström approximation of voxels to improve efficiency. The NSC algorithm performed well on 32 plots in China and Europe: the overall matching rate and extraction rate of the proposed algorithm reached 69% and 103%, respectively. For all trees located by Global Navigation Satellite System (GNSS)-calibrated tape measures, the tree height regression of the matching results showed an R² value of 0.88 and a relative root mean square error (RMSE) of 5.97%. For all trees located by GNSS-calibrated total-station measures, the values were 0.89 and 4.49%. The method also performed well on a benchmark dataset, improving the average matching rate by 7%. The results demonstrate that the proposed NSC algorithm provides accurate individual tree segmentation and parameter estimation using airborne LiDAR point cloud data.
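NSC operates on voxels rather than raw points. The abstract does not say how voxels are built, so the sketch below assumes plain cubic voxels of a chosen edge length, each represented by the centroid of its points; these centroids would be the items fed to the Nyström-approximated spectral clustering. The KNNS landmark sampling itself is not reproduced, and `voxelize` is a hypothetical helper.

```python
import numpy as np


def voxelize(points, voxel_size):
    """Group 3-D points into cubic voxels and return one centroid per voxel.

    Returns (centroids, voxel_of_point): centroids has shape (v, 3) and
    voxel_of_point[i] is the index of the voxel containing points[i].
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, voxel_of_point = np.unique(keys, axis=0, return_inverse=True)
    voxel_of_point = voxel_of_point.ravel()  # inverse shape varies by NumPy version
    counts = np.bincount(voxel_of_point)
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, voxel_of_point, points)   # sum points per voxel
    return centroids / counts[:, None], voxel_of_point
```

Returning the point-to-voxel index lets segment labels computed on voxels be copied back to every LiDAR point.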
Funding: Supported by the National Natural Science Foundation of China (60661003) and the Research Project of the Department of Education of Jiangxi Province (GJJ10566)
Abstract: In this paper, we explore a novel ensemble method for spectral clustering. In contrast to traditional clustering ensemble methods, which combine all the obtained clustering results, we propose an adaptive spectral clustering ensemble method to achieve a better clustering solution. The method can adaptively determine the number of component members, a capability that many other algorithms lack. The component clusterings of the ensemble are generated by spectral clustering (SC), which has properties well suited to producing diverse committees. The selection process evaluates the generated component spectral clusterings through a resampling technique and the population-based incremental learning algorithm (PBIL). Experimental results on UCI datasets demonstrate that the proposed algorithm achieves better results than traditional clustering ensemble methods, especially when the number of component clusterings is large.
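PBIL searches over binary strings by maintaining a probability vector, sampling a population from it, and nudging it toward the best sample. In the ensemble setting a bit can mark whether a component clustering is kept, so the number of members is chosen implicitly. The sketch below is a generic PBIL; `fitness` is a placeholder (the resampling-based quality measure used in the paper is not reproduced), and the population size, learning rate and generation count are assumptions.

```python
import numpy as np


def pbil_select(fitness, n_members, pop_size=20, lr=0.1, generations=100, seed=0):
    """Population-based incremental learning over binary selection masks.

    Bit t of a mask says whether component clustering t enters the ensemble.
    `fitness` maps a boolean mask to a score to maximize (a placeholder for
    the real ensemble-quality measure).
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_members, 0.5)              # probability that each bit is 1
    best_mask, best_score = None, -np.inf
    for _ in range(generations):
        pop = rng.random((pop_size, n_members)) < p
        scores = np.array([fitness(mask) for mask in pop])
        winner = pop[np.argmax(scores)]
        if scores.max() > best_score:
            best_mask, best_score = winner.copy(), scores.max()
        # Shift the probability vector toward this generation's best sample.
        p = (1.0 - lr) * p + lr * winner
    return best_mask, best_score
```

With a toy fitness equal to minus the Hamming distance to a target mask, the search recovers the target mask with score 0.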
Funding: Supported by the National Key Research and Development Program under Grant No. 2019YFE0126700 and the Shandong Provincial Natural Science Foundation under Grant No. ZR2020QD018.
Abstract: Spectral clustering is a well-regarded subspace clustering algorithm that performs outstandingly in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors: the commonly used eigenvectors not only fail to guarantee the inclusion of detailed discriminative information but also have high computational complexity. To address these challenges, we propose an intuitive eigenvector selection method based on the coincidence degree of data distribution (CDES). First, the clustering result of an improved k-means, which reflects the spatial distribution of the various classes well, is used as the reference map. Then, the adjusted Rand index and the adjusted mutual information are calculated to assess the consistency of the data distribution between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees are selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those obtained with the commonly used top, high-entropy, and high-relevance eigenvectors, respectively, and that CDES saves more than 99% of the eigenvector selection time. In particular, owing to the unsupervised nature of k-means, CDES provides a novel solution for autonomous feature selection in hyperspectral images.
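The selection step can be sketched with the adjusted Rand index alone (adjusted mutual information is omitted for brevity). Each eigenvector must first be turned into a labeling; the abstract does not say how, so `quantile_labels` splits it into k equal-frequency bins, a hypothetical choice. Eigenvectors are then ranked by their ARI against the reference map produced by the improved k-means, which here is simply passed in as `reference`.

```python
from math import comb

import numpy as np


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label vectors (Hubert-Arabie form)."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)              # contingency table
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())
    sum_a = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_b = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)


def quantile_labels(v, k):
    """Hypothetical discretization: split an eigenvector into k equal-frequency bins."""
    edges = np.quantile(v, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(v, edges)


def select_eigenvectors(V, reference, n_select, k):
    """Rank eigenvectors (columns of V) by ARI against a reference labeling."""
    scores = np.array([adjusted_rand_index(quantile_labels(V[:, j], k), reference)
                       for j in range(V.shape[1])])
    order = np.argsort(scores)[::-1]
    return order[:n_select], scores
```

An eigenvector whose values separate the reference classes receives ARI 1 and is ranked first, while noise columns score near 0. ARI is undefined when both labelings put every sample in a single cluster, because the denominator vanishes.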
Funding: Supported by the National Basic Research Program of China (973 Program) (Nos. 2005CB422300, 2007CB411804, 2010CB428904), the National Natural Science Foundation of China (Nos. 40976001, 40940025, 41006002), the Tianjin Municipal Science and Technology Commission Project (No. 09JCYBJC07400), the "111 Project" (No. B07036), and the Program for New Century Excellent Talents in University (No. NECT-07-0781)
Abstract: To better understand the water exchange between the Kuroshio and the East China Sea (ECS), we studied the variability of the Kuroshio in the ECS from 1991 to 2008 using a three-dimensional circulation model, and calculated the Kuroshio onshore volume transport in the ECS, which ranged from a minimum of 0.48 Sv (1 Sv = 10⁶ m³/s) in summer to a maximum of 1.69 Sv in winter. Based on WOA05 and NCEP data, the model results indicate that the Kuroshio transport east of Taiwan Island has decreased since 2000. Lateral movements tended to be stronger at the two ends of the Kuroshio in the ECS than in its middle segment. In addition, we applied a spectral mixture model (SMM) to determine the exchange zone between the Kuroshio and the shelf water of the ECS. The results reveal a significant negative correlation (coefficient of -0.78) between the area of the exchange zone and the Kuroshio onshore transport across the 200 m isobath in the ECS. This conclusion offers a new view of the water exchange between the Kuroshio and the ECS. In addition to annual and semi-annual signals, intra-seasonal signals, probably of Pacific origin, may trigger Kuroshio intrusion and exchange events in the ECS.
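If the spectral mixture model is read as linear unmixing with two endmembers (Kuroshio water and shelf water), each observation is modeled as f times the Kuroshio signature plus (1 − f) times the shelf signature, and f has a closed-form least-squares solution. The sketch below makes that assumption; the variables (for example temperature and salinity), the endmember values used in testing, and the function name are hypothetical, and variables with different units should be standardized first.

```python
import numpy as np


def kuroshio_fraction(obs, kuroshio_end, shelf_end):
    """Least-squares Kuroshio fraction in a two-endmember linear mixture.

    obs: (n, d) observations; kuroshio_end, shelf_end: (d,) endmember
    signatures on the same (ideally standardized) variables. Solves
    obs ~ f * kuroshio_end + (1 - f) * shelf_end for each row, i.e.
    f = (obs - shelf) . (kuroshio - shelf) / ||kuroshio - shelf||^2,
    and clips f to [0, 1].
    """
    shelf = np.asarray(shelf_end, dtype=float)
    e = np.asarray(kuroshio_end, dtype=float) - shelf
    f = (np.asarray(obs, dtype=float) - shelf) @ e / (e @ e)
    return np.clip(f, 0.0, 1.0)
```

An exchange zone could then be defined as the area where f is clearly intermediate, for example between 0.2 and 0.8; that threshold is illustrative, not taken from the paper.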
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61100205 and No. 60873001, the Hi-Tech Research and Development Program of China under Grant No. 2011AA010705, and the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0212
Abstract: Webpage classification differs from traditional text classification because of its irregular words and phrases and its massive, unlabeled features, which make effective features harder to obtain. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and on term clustering with multiple strategies, to optimize a Vector Space Model (VSM) and thereby improve webpage classification. The results show that document clustering works better than term clustering in handling document content, and the best overall performance is obtained by spectral clustering with document clustering. Moreover, because images coexist with document content on the same webpage, the proposed method is also applied to extract meaningful terms for images, and experimental results show that this also improves webpage classification.
Funding: This research was partially supported by the National Natural Science Foundation of China (Grant 62076215) and the Talent Introduction Project of Yancheng Institute of Technology under Grant No. XKR2011019.
Abstract: Clustering analysis plays a very important role in data mining, image segmentation, and pattern recognition. Cluster analysis is introduced here to analyze NetEYun music data, and different types of music data are clustered to find what music of the same kind has in common. A clustering analysis method oriented to music data is proposed. First, the audio beat period is calculated by reading the audio file data, and the emotional features of the audio are extracted. Second, the audio beat period is calculated by Fourier transform. Finally, a clustering algorithm is designed to obtain the clustering results of the music data.
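The Fourier step can be sketched as finding the strongest periodicity of an onset or energy envelope within a plausible tempo range. The sketch assumes the envelope has already been extracted from the audio at sampling rate `fs` (that step is not shown) and searches 30 to 240 beats per minute; the range and the function name are assumptions.

```python
import numpy as np


def beat_period(envelope, fs, min_bpm=30.0, max_bpm=240.0):
    """Estimate the beat period in seconds as the strongest periodicity of an
    onset/energy envelope sampled at fs Hz, searched within a tempo range."""
    x = np.asarray(envelope, dtype=float)
    x = x - x.mean()                         # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= min_bpm / 60.0) & (freqs <= max_bpm / 60.0)
    peak = freqs[band][np.argmax(spectrum[band])]
    return 1.0 / peak
```

A cosine envelope repeating twice per second (120 beats per minute) yields a beat period of 0.5 s. Restricting the search band keeps slow drifts and high-frequency texture from being mistaken for the beat.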
Funding: The National Natural Science Foundation of China (Grant No. 61403040)
Abstract: Dynamometer cards are commonly used to analyze the down-hole working conditions of pumping systems in actual oil production. Traditional supervised learning methods rely heavily on the classification accuracy of the training samples. To reduce the errors of manual classification, an automatic clustering algorithm is proposed and applied to diagnose the down-hole conditions of pumping systems. Spectral clustering (SC) is a relatively new clustering algorithm that is suitable for arbitrary data distributions. However, it is sensitive to the initial cluster centers and the scale parameter, and the number of clusters must be predefined. To overcome these shortcomings, we propose an automatic clustering algorithm, fast black hole-spectral clustering (FBH-SC). The FBH algorithm replaces the K-means method in SC, and a CritC index function is used as the objective function to automatically choose the best scale parameter and number of clusters during clustering. Different simulation experiments were designed to determine the relationship among the scale parameter, the number of clusters, the CritC index value, and the clustering accuracy. Finally, an example is given to validate the effectiveness of the proposed algorithm.