10 articles found
A fast algorithm for affinity propagation clustering with a specified number of clusters Cited by: 3
1
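Authors: 王开军, 郑捷. 《计算机系统应用》, 2010, No. 7, pp. 207-209 (3 pages)
Affinity propagation clustering, proposed in the journal Science, is inefficient when it must produce a clustering with a specified number of clusters. To address this, a fast algorithm based on a multi-grid strategy is proposed. The algorithm uses a multi-grid search strategy to reduce the number of calls to the affinity propagation algorithm, and it tightens the upper bound of the preference parameter to narrow the search range. The new method greatly improves the speed of affinity propagation clustering for a specified number of clusters. Experimental results show that the new method is highly effective, reducing running time by 22%-90% compared with existing methods.
Keywords: fast; specified; affinity propagation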
A multi-objective clustering algorithm based on local ensemble and clonal selection Cited by: 1
2
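Authors: 曹萌萌, 郭晓磊, 刘晓斐. 《计算机工程与设计》, PKU Core, 2015, No. 8, pp. 2234-2238 (5 pages)
Multi-objective clustering can produce some clearly unreasonable solutions, which affect both the final partition and the determination of the number of clusters. To address this, a multi-objective clustering algorithm based on local ensemble and clonal selection is proposed. During clustering, the set of clustering solutions is periodically divided into several neighborhoods, and a local ensemble operation is applied to each neighborhood to eliminate unreasonable partitions for each number of clusters. Following the idea of the clonal selection algorithm, three mutation operators are constructed to drive the evolution of the population: one increases the number of clusters of the current solution, one decreases it, and one adjusts how samples are assigned in the current solution. Experiments on 3 synthetic datasets and 3 UCI datasets show that the algorithm obtains better clustering results than the compared algorithms and accurately determines a reasonable number of clusters, improving the accuracy of cluster-number determination by 0%-46.67%.
Keywords: multi-objective; local ensemble; clonal selection; number of clusters; population evolution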
An improved method for determining the optimal number of clusters for the K-means algorithm Cited by: 12
3
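Authors: 边鹏, 赵妍, 苏玉召. 《现代图书情报技术》, CSSCI, PKU Core, 2011, No. 9, pp. 34-40 (7 pages)
This work studies the BWP method. Starting from the text clustering requirements of embedded NSTL personalized recommendation, it analyzes the shortcomings of the BWP method and proposes an improved method for determining the optimal number of clusters for the K-means algorithm. The within-cluster distance calculation for single-sample clusters is optimized, which extends the range of cluster numbers to which the BWP method applies, so that a number of clusters that was previously only locally optimal becomes globally optimal. Experimental results verify that the method performs well.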
Keywords: K-means; text recommender system
Perturbation analysis of spectral clustering Cited by: 33
4
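Authors: 田铮, 李小斌, 句彦伟. 《中国科学(E辑)》, CSCD, PKU Core, 2007, No. 4, pp. 527-543 (17 pages)
Spectral clustering is analyzed using matrix perturbation theory. By introducing the weight matrix of a graph and analyzing its spectrum and eigenvectors, three relationships are obtained: between the spectrum of the weight matrix and the number of clusters, between the magnitudes of the eigenvalues of the weight matrix and the number of points in each cluster, and between the eigenvectors of the weight matrix and the clustering. On this basis, an unsupervised spectral clustering algorithm based on weight matrix (USCAWM) is designed and tested on simulated point sets and real datasets. The experimental results confirm the correctness of the theoretical analysis.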
Keywords: weight matrix; spectrum of the weight matrix; unsupervised spectral clustering algorithm based on weight matrix
Gaussian mixture models for clustering and classifying traffic flow in real-time for traffic operation and management Cited by: 1
5
Authors: 孙璐, 张惠民, 高荣, 顾文钧, 徐冰, 陈鲤梁. 《Journal of Southeast University(English Edition)》, EI, CAS, 2011, No. 2, pp. 174-179 (6 pages)
Based on Gaussian mixture models (GMM), speed, flow and occupancy are used together in the cluster analysis of traffic flow data. Compared with other clustering and sorting techniques, the GMM, as a structural model, is suitable for various kinds of traffic flow parameters. Gap statistics and domain knowledge of traffic flow are used to determine a proper number of clusters. The expectation-maximization (E-M) algorithm is used to estimate the parameters of the GMM. The clustered traffic flow patterns are then analyzed statistically and used to design maximum likelihood classifiers for grouping real-time traffic flow data when new observations become available. Clustering analysis and pattern recognition can also be used to cluster and classify dynamic traffic flow patterns for freeway on-ramp and off-ramp weaving sections, as well as for other facilities involving the concept of level of service, such as airports, parking lots, intersections and interrupted-flow pedestrian facilities.
Keywords: traffic flow patterns; Gaussian mixture model; level of service; data mining; cluster analysis; classifier
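The E-M estimation and maximum likelihood classification described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional mixture over speed only (the paper uses speed, flow and occupancy together), a fixed number of components rather than one chosen by gap statistics, and made-up observations. The names `em_gmm_1d`, `classify` and `speeds` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, init_var=100.0, iterations=50):
    """Fit a 1-D Gaussian mixture with the standard E-M updates."""
    k = len(init_means)
    means = list(init_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate component
            weights[j] = nj / len(data)
    return means, variances, weights


def classify(x, means, variances, weights):
    """Assign a new observation to the component with the highest weighted density."""
    scores = [w * gaussian_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
    return scores.index(max(scores))


# Made-up speeds (km/h): a congested group and a free-flow group.
speeds = [20.0, 22.0, 25.0, 23.0, 21.0, 95.0, 100.0, 98.0, 102.0, 97.0]
means, variances, weights = em_gmm_1d(speeds, init_means=[30.0, 90.0])
congested_label = classify(24.0, means, variances, weights)
free_flow_label = classify(99.0, means, variances, weights)
```

Once the mixture is fitted, `classify` only evaluates the stored component densities, which is why this kind of classifier suits observations arriving in real time.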
Linear manifold clustering for high dimensional data based on line manifold searching and fusing Cited by: 1
6
Authors: 黎刚果, 王正志, 王晓敏, 倪青山, 强波. 《Journal of Central South University》, SCIE, EI, CAS, 2010, No. 5, pp. 1058-1069 (12 pages)
High dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and optimize line manifolds during the line manifold cluster searching procedure. The results obtained from experiments over real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method is able to obtain high clustering accuracy for various data sets with different sizes, manifold dimensions and noise ratios, which confirms its anti-noise capability and high clustering accuracy for high dimensional data.
Keywords: linear manifold; subspace clustering; line manifold; data mining; data fusing; clustering algorithm
Watershed classification by remote sensing indices: A fuzzy c-means clustering approach Cited by: 10
7
Authors: Bahram CHOUBIN, Karim SOLAIMANI, Mahmoud HABIBNEJAD ROSHAN, Arash MALEKIAN. 《Journal of Mountain Science》, SCIE, CSCD, 2017, No. 10, pp. 2053-2063 (11 pages)
Identifying watersheds with relatively similar hydrological properties is crucial for classifying them readily for management practices such as flood and soil erosion control. This study aimed to identify homogeneous hydrological watersheds using remote sensing data in western Iran. To achieve this goal, remote sensing indices including SAVI, LAI, NDMI, NDVI and snow cover were extracted from MODIS data over the period 2000 to 2015. Then, a fuzzy method was used to cluster the watersheds based on the extracted indices. A fuzzy c-means (FCM) algorithm made it possible to classify 38 watersheds into three homogeneous groups. The optimal number of clusters was determined through evaluation of the partition coefficient, the partition entropy function, and trial and error. The results indicated that the three homogeneous regions identified by fuzzy c-means clustering and the remote sensing products are consistent with the variations in topography and climate of the study area. Inherently, the grouped watersheds have similar hydrological properties and are likely to need similar management considerations and measures.
Keywords: Karkheh watershed; Fuzzy c-means clustering; Watershed classification; Homogeneous sub-watersheds
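The partition coefficient and partition entropy mentioned in this abstract are standard validity indices for fuzzy partitions. The sketch below is an illustration under stated assumptions, not the study's code: it runs the textbook fuzzy c-means updates on a single made-up index value per watershed (the study uses several MODIS-derived indices for 38 watersheds) and then computes both indices. All function names and the `indices` data are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update for 1-D data; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: share full membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists])
    return u


def fcm_centers(points, u, m=2.0):
    """Standard FCM center update: mean weighted by membership to the power m."""
    centers = []
    for j in range(len(u[0])):
        wts = [row[j] ** m for row in u]
        centers.append(sum(w * x for w, x in zip(wts, points)) / sum(wts))
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, iterations=100):
    centers = list(init_centers)
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)


def partition_coefficient(u):
    """Mean of squared memberships; ranges from 1/c (fully fuzzy) to 1 (crisp)."""
    return sum(v * v for row in u for v in row) / len(u)


def partition_entropy(u):
    """Mean membership entropy; ranges from 0 (crisp) to ln(c) (fully fuzzy)."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0.0) / len(u)


# Made-up, normalized index values for nine hypothetical watersheds.
indices = [0.10, 0.12, 0.15, 0.48, 0.50, 0.52, 0.85, 0.88, 0.90]
centers, u = fuzzy_c_means(indices, init_centers=[0.0, 0.5, 1.0])
pc = partition_coefficient(u)
pe = partition_entropy(u)
```

In practice the two indices are computed for several candidate values of c, with a higher partition coefficient and a lower partition entropy indicating a crisper partition. Because both indices tend to drift with c, they are usually combined with other evidence, which matches the abstract's mention of trial and error.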
A new clustering algorithm for large datasets Cited by: 1
8
Authors: 李清峰, 彭文峰. 《Journal of Central South University》, SCIE, EI, CAS, 2011, No. 3, pp. 823-829 (7 pages)
The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. This algorithm makes use of the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm was provided for a variation of the correlation clustering problem, and it was shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation was given of the usefulness of the problem and the solutions. The results show that this method achieves more than a 50% reduction in running time without sacrificing the quality of the clustering.
Keywords: data mining; Circle algorithm; clustering categorical data; clustering aggregation
Interactive Protein Data Clustering
9
Authors: Terje Kristensen, Vemund Jakobsen. 《Computer Technology and Application》, 2011, No. 10, pp. 818-827 (10 pages)
In this paper, the authors present three different algorithms for data clustering: the Self-Organizing Map (SOM), Neural Gas (NG) and Fuzzy C-Means (FCM) algorithms. The SOM and NG algorithms are based on competitive learning. An important property of these algorithms is that they preserve the topological structure of the data, meaning that data points that are close in the input distribution are mapped to nearby locations in the network. The FCM algorithm is based on soft clustering, which means that the clusters are not necessarily distinct but may overlap. This clustering method may be very useful in many biological problems, for instance in genetics, where a gene may belong to different clusters. The algorithms are compared in terms of their visualization of the clustering of proteomic data.
Keywords: data mining; self-organizing map; neural gas; fuzzy c-means algorithm; protein clustering
A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
10
Authors: Ohn Mar San, Van-Nam Huynh, Yoshiteru Nakamori. 《Journal of Systems Science & Complexity》, SCIE, EI, CSCD, 2003, No. 4, pp. 562-571 (10 pages)
Most of the earlier work on clustering focused mainly on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve datasets that also consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm which is based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the 'cluster centers' of clustering objects described by mixed numeric and categorical attributes in the clustering process so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated with two well-known data sets, namely the credit approval and abalone databases.
Keywords: cluster analysis; numeric data; categorical data; k-means algorithm
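The abstract does not spell out the similarity measure or the center update, so the following is only a sketch of one common way to combine the two attribute types, in the style of k-prototypes: squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches, with the mean and the mode as center components. This is a swapped-in technique for illustration, not the paper's method. The weight `gamma` and all function names are assumptions.

```python
from collections import Counter


def mixed_dissimilarity(obj, center, numeric_idx, categorical_idx, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the number of mismatching categorical attributes."""
    numeric_part = sum((obj[i] - center[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(1 for i in categorical_idx if obj[i] != center[i])
    return numeric_part + gamma * categorical_part


def update_center(members, numeric_idx, categorical_idx):
    """Center of a cluster: mean of each numeric attribute,
    most frequent value of each categorical attribute."""
    center = list(members[0])
    for i in numeric_idx:
        center[i] = sum(o[i] for o in members) / len(members)
    for i in categorical_idx:
        center[i] = Counter(o[i] for o in members).most_common(1)[0][0]
    return tuple(center)


# Hypothetical objects: two numeric attributes and one categorical attribute.
members = [(1.0, 3.0, "red"), (3.0, 5.0, "red"), (2.0, 4.0, "blue")]
center = update_center(members, numeric_idx=[0, 1], categorical_idx=[2])
d_blue = mixed_dissimilarity(members[2], center, [0, 1], [2], gamma=0.5)
d_red = mixed_dissimilarity(members[0], center, [0, 1], [2], gamma=0.5)
```

With an empty `categorical_idx` the dissimilarity reduces to squared Euclidean distance and the center to the mean, so the sketch shares the property stated in the abstract that the procedure coincides with k-means on purely numeric data.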