11 articles found
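Jump-Tracking SSA Cross-Iteration AP Clustering Algorithm (Cited by: 1)
1
Authors: 黄鹤, 李文龙, 杨澜, 王会峰, 高涛, 陈婷 《电子学报》 EI, CAS, CSCD, PKU Core 2024, No. 3, pp. 977-990 (14 pages)
Traditional affinity propagation (AP) clustering takes the pairwise similarities between data points as its input measure, and because the preference parameter p and the damping factor λ must be preset, its accuracy cannot be precisely controlled. To address this, a cross-iteration affinity propagation clustering method optimized by a jump-tracking sparrow search algorithm (SSA) is proposed. First, to address the insufficient position updates of the producers (discoverers) and scroungers (joiners) in the sparrow search algorithm, a jump-tracking optimization strategy is designed: producers are updated with large steps through a jump strategy that takes a preference damping factor into account, which increases the global exploration ability and optimization speed of SSA, while scroungers update their positions by tracking the leading sparrow with dynamic small steps. In addition, an adaptive population-partition mechanism updates the proportions of producers and scroungers, which increases the algorithm's late-stage local exploitation ability and optimization speed. Second, a Tent map based on a perturbation factor is designed, with three additional parameters that widen the map's distribution range and avoid falling into small-period points and unstable periodic points. Finally, the silhouette coefficient is introduced as the evaluation function: the jump-tracking SSA automatically searches for good values of p and λ in place of manually entered parameters, is combined with the perturbation-factor Tent map to optimize the AP algorithm, and determines the optimal number of clusters by cross iteration. Several algorithms were used to cluster 10 public datasets from the University of California Irvine (UCI) collection. Simulation results show that the proposed algorithm achieves better search efficiency and clustering accuracy than the classical AP algorithm, a differential-based improved AP clustering algorithm, an SSA-optimized AP clustering algorithm, and an evolutionary AP algorithm. A cluster analysis of national information data was also carried out, in which the proposed method proved more accurate, effective, and reasonable, indicating good application value.
Keywords: affinity propagation; improved Tent map; improved sparrow search algorithm; silhouette coefficient; clustering datasets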
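The abstract above uses the silhouette coefficient as the evaluation function when searching for p and λ. As a minimal sketch of that criterion only (the SSA search and the AP step are not shown), the following computes the mean silhouette with plain Euclidean distance. The function name `silhouette_score` and the toy data are assumptions for illustration, not taken from the paper.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance inside own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two well-separated pairs of points.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

A parameter search of the kind described would presumably keep the (p, λ) pair whose clustering yields the highest score; here the labelling that follows the natural grouping scores higher than the one that splits each pair.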
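A Spectral Clustering Algorithm for Big Data Based on Sampling-Improved Weighted Kernel K-means (Cited by: 7)
2
Authors: 金海, 张劲松, 吴睿 《测绘通报》 CSCD, PKU Core 2018, No. 11, pp. 78-82 (5 pages)
Classical spectral clustering converts data clustering into a weighted graph partitioning problem. Based on an analysis of the equivalence between the Normalized Cut objective function and the weighted kernel K-means function, a spectral clustering algorithm for large-scale data is designed, built on a sampling-improved weighted kernel K-means algorithm. The algorithm first performs an initial clustering with Leaders as a preprocessing step, to control the data size of the subsequent random sampling and its coverage of the original data classes. Iterative weighted kernel K-means optimization within the sampled subset then avoids the heavy resource consumption of eigendecomposition of the Laplacian matrix, so that using only part of the kernel matrix avoids the time and space complexity of the full kernel matrix. Experimental results show that the improved algorithm greatly increases clustering efficiency while keeping clustering accuracy close to that of the classical algorithm.
Keywords: large-scale data; weighted kernel K-means algorithm; data sampling; kernel matrix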
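The preprocessing step described above can be illustrated with the classic one-pass Leaders algorithm followed by a per-group random draw. This is a sketch under assumptions: the distance threshold, the `per_group` sample size, and both function names are hypothetical, and the paper's actual sampling scheme is not given in the abstract.

```python
import math
import random


def leaders(points, threshold):
    """One-pass Leaders clustering.

    Each point joins the first leader within `threshold`; otherwise it
    becomes a new leader. Returns (leader_points, assignment).
    """
    leader_points = []
    assignment = []
    for p in points:
        for k, leader in enumerate(leader_points):
            if math.dist(p, leader) <= threshold:
                assignment.append(k)
                break
        else:  # no leader was close enough
            leader_points.append(p)
            assignment.append(len(leader_points) - 1)
    return leader_points, assignment


def sample_per_leader(assignment, per_group, seed=0):
    """Draw up to `per_group` point indices from every leader group."""
    rng = random.Random(seed)
    groups = {}
    for idx, k in enumerate(assignment):
        groups.setdefault(k, []).append(idx)
    sampled = []
    for members in groups.values():
        sampled.extend(rng.sample(members, min(per_group, len(members))))
    return sorted(sampled)


# Hypothetical toy data with two obvious groups.
pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (0.1, 0.2)]
lead, assign = leaders(pts, threshold=1.0)
subset = sample_per_leader(assign, per_group=1)
```

Sampling within each leader group, rather than from the whole dataset, is one simple way to keep every coarse group represented in the subset that the kernel K-means iterations then work on.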
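Semi-supervised Affinity Propagation Clustering (Cited by: 29)
3
Authors: 王开军, 李健, 张军英, 涂重阳 《计算机工程》 CAS, CSCD, PKU Core 2007, No. 23, pp. 197-198, 201 (3 pages)
The affinity propagation clustering algorithm is fast and effective and can handle the clustering of large datasets, but its clustering accuracy is low when the cluster structure of the data is loose. This paper proposes a semi-supervised affinity propagation clustering algorithm that embeds a validity index in the iteration process to supervise and guide the algorithm toward the optimal clustering result. Experimental results show that the method gives fairly accurate clustering results on datasets with both compact and loose cluster structures.
Keywords: affinity propagation; semi-supervised; data algorithm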
Linear manifold clustering for high dimensional data based on line manifold searching and fusing (Cited by: 1)
4
Authors: 黎刚果, 王正志, 王晓敏, 倪青山, 强波 《Journal of Central South University》 SCIE, EI, CAS 2010, No. 5, pp. 1058-1069 (12 pages)
High dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and to optimize line manifolds during the line manifold cluster searching procedure. The results obtained from experiments on real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method obtains high clustering accuracy for various data sets with different sizes, manifold dimensions and noise ratios, which confirms its anti-noise capability and high clustering accuracy for high dimensional data.
Keywords: linear manifold; subspace clustering; line manifold; data mining; data fusing; clustering algorithm
A new clustering algorithm for large datasets (Cited by: 1)
5
Authors: 李清峰, 彭文峰 《Journal of Central South University》 SCIE, EI, CAS 2011, No. 3, pp. 823-829 (7 pages)
The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. This algorithm makes use of the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm was provided for the variation of the correlation clustering problem, and it was shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation was given of the usefulness of the problem and the solutions. The results show that this method achieves a reduction of more than 50% in running time without sacrificing the quality of the clustering.
Keywords: data mining; Circle algorithm; clustering categorical data; clustering aggregation
A FUZZY CLOPE ALGORITHM AND ITS OPTIMAL PARAMETER CHOICE (Cited by: 1)
6
Authors: Li Jie, Gao Xinbo, Jiao Licheng 《Journal of Electronics(China)》 2006, No. 3, pp. 384-388 (5 pages)
Among the available clustering algorithms in data mining, the CLOPE algorithm attracts considerable attention for its high speed and good performance. However, the proper choice of some parameters in the CLOPE algorithm directly affects the validity of the clustering results, and this remains an open issue. To address it, this paper proposes a fuzzy CLOPE algorithm and presents a method for optimal parameter choice by defining a modified partition fuzzy degree as a clustering validity function. Experimental results on a real data set illustrate the effectiveness of the proposed fuzzy CLOPE algorithm and of the optimal parameter choice method based on the modified partition fuzzy degree.
Keywords: Data mining; Cluster analysis; Cluster validity; Categorical attributes; Optimal parameter choice
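For context on the parameter discussed above, the sketch below evaluates the profit criterion of the original (crisp) CLOPE algorithm, in which the repulsion parameter r controls how strongly wide clusters are penalized. It is an illustration under assumptions: the fuzzy variant and the modified partition fuzzy degree from this paper are not specified in the abstract and are not reproduced, and the function name and toy transactions are hypothetical.

```python
def clope_profit(clusters, r=2.0):
    """Crisp CLOPE profit of a clustering of transactions.

    `clusters` is a list of clusters; each cluster is a list of non-empty
    transactions (iterables of items). For each cluster, S is the total
    number of item occurrences, W the number of distinct items, and N the
    number of transactions; the profit is sum(S / W**r * N) / sum(N).
    """
    numerator = 0.0
    n_transactions = 0
    for transactions in clusters:
        if not transactions:
            continue  # ignore empty clusters
        item_sets = [set(t) for t in transactions]
        size = sum(len(s) for s in item_sets)   # S: item occurrences
        width = len(set().union(*item_sets))    # W: distinct items
        numerator += size / width ** r * len(item_sets)
        n_transactions += len(item_sets)
    if n_transactions == 0:
        raise ValueError("no transactions to score")
    return numerator / n_transactions


# Hypothetical toy transactions over the items a..f.
split_a = [["ab", "abc", "acd"], ["de", "def"]]
split_b = [["ab", "abc"], ["acd", "de", "def"]]
profit_a = clope_profit(split_a, r=2.0)
profit_b = clope_profit(split_b, r=2.0)
```

With r = 2, the first split scores higher because its clusters share more items per distinct item. Choosing r is exactly the kind of parameter decision the abstract describes as an open issue.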
CLUSTERING VALIDITY BASED ON THE IMPROVED S_DBW INDEX (Cited by: 1)
7
Authors: Tong Jianhua, Tan Hongzhou 《Journal of Electronics(China)》 2009, No. 2, pp. 258-264 (7 pages)
For many clustering algorithms, it is very important to determine an appropriate number of clusters, which is known as the cluster validity problem. In this paper, a new clustering validity assessment index is proposed. It is based on a novel method that selects the margin point between two clusters more accurately for inter-cluster similarity, and it provides an improved scatter function for intra-cluster similarity. Simulation results show the effectiveness of the proposed index on the data sets under consideration, regardless of the choice of clustering algorithm.
Keywords: Clustering validity; Inter-cluster similarity; Intra-cluster similarity
Out-of-core clustering of volumetric datasets
8
Authors: GRANBERG Carl J. 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 SCIE, EI, CAS, CSCD 2006, No. 7, pp. 1134-1140 (7 pages)
In this paper we present a novel method for dividing and clustering large volumetric scalar out-of-core datasets. This work is based on the Ordered Cluster Binary Tree (OCBT) structure, created using a top-down or divisive clustering method. The OCBT structure allows fast and efficient subvolume queries to be made in combination with level of detail (LOD) queries of the tree. The initial partitioning of the large out-of-core dataset is done using non-axis-aligned planes calculated with Principal Component Analysis (PCA). A hybrid OCBT structure is also proposed, in which an in-core cluster binary tree is combined with a large out-of-core file.
Keywords: Out-of-core clustering; Hybrid rendering; Scientific visualization
Clustering: from Clusters to Knowledge
9
Authors: Peter Grabusts 《Computer Technology and Application》 2013, No. 6, pp. 284-290 (7 pages)
Data analysis and automatic processing is often interpreted as knowledge acquisition. In many cases it is necessary to classify data in some way or to find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented with the help of IF-THEN rules. With the help of these rules the following tasks are solved: prediction, classification, pattern recognition and others. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods) we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Extraction of rules in this context is based on the clustering methods K-means and fuzzy C-means. With the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set. The effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Keywords: Data analysis; clustering algorithms; K-means; fuzzy C-means; rule extraction
A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
10
Authors: Ohn Mar San, Van-Nam Huynh, Yoshiteru Nakamori 《Journal of Systems Science & Complexity》 SCIE, EI, CSCD 2003, No. 4, pp. 562-571 (10 pages)
Most of the earlier work on clustering focused mainly on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve datasets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm that is based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the 'cluster centers' of clustering objects described by mixed numeric and categorical attributes during the clustering process so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the credit approval and abalone databases.
Keywords: cluster analysis; numeric data; categorical data; k-means algorithm
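The abstract above does not give the paper's similarity measure, so the following is only a stand-in: a k-prototypes-style dissimilarity that adds squared Euclidean distance on numeric attributes to a weighted count of categorical mismatches. The function name and the weight `gamma` are assumptions. Like the measure described, it reduces to the k-means distance when there are no categorical attributes.

```python
def mixed_dissimilarity(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Dissimilarity between an object and a cluster center.

    x_num / c_num: numeric attribute values of the object / center.
    x_cat / c_cat: categorical attribute values of the object / center.
    gamma: hypothetical weight balancing the categorical part.
    """
    # Numeric part: squared Euclidean distance, as in k-means.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    # Categorical part: simple matching, one unit per differing attribute.
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches


# Hypothetical object and center with two numeric and two categorical fields.
d = mixed_dissimilarity([1.0, 2.0], ["red", "s"], [1.0, 4.0], ["red", "m"], gamma=0.5)
```

Here the numeric part contributes 4.0 and the single categorical mismatch contributes 0.5.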
Integrating OWA and Data Mining for Analyzing Customers Churn in E-Commerce (Cited by: 1)
11
Authors: CAO Jie, YU Xiaobing, ZHANG Zhifei 《Journal of Systems Science & Complexity》 SCIE, EI, CSCD 2015, No. 2, pp. 381-392 (12 pages)
Customers are of great importance to E-commerce under intense competition. It is known that twenty percent of customers produce eighty percent of profits, so finding these customers is critical. Customer lifetime value (CLV) is presented to evaluate customers in terms of recency, frequency and monetary (RFM) variables. A novel model is proposed to analyze customer purchase data and RFM variables based on ordered weighted averaging (OWA) and the K-means clustering algorithm. OWA is employed to determine the weights of the RFM variables in evaluating customer lifetime value or loyalty. The K-means algorithm is used to cluster customers according to their RFM values. Churn customers can be identified by comparing the RFM values of every cluster with the average RFM. A questionnaire is conducted to investigate the reasons for customer dissatisfaction, and these reasons are ranked to help E-commerce businesses improve their services. The experimental results demonstrate that the model is effective and reasonable.
Keywords: Customer life value; E-commerce; K-means; OWA
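As background for the model above, the sketch below shows the standard ordered weighted averaging operator, where weights attach to rank positions rather than to specific variables. How the paper derives its OWA weights for the R, F and M variables is not stated in the abstract; the weights, the scores, and the function name here are assumptions for illustration.

```python
def owa(values, weights):
    """Ordered weighted average.

    The values are sorted in descending order, then the i-th weight is
    applied to the i-th largest value. The weights must sum to 1.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical normalized R, F, M scores for one customer.
score = owa([0.2, 0.9, 0.5], [0.5, 0.3, 0.2])
```

Because the weights follow rank order, the choice of weight vector moves the operator between the maximum (all weight on the first position) and the minimum (all weight on the last).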