Optimization of density-based K-means algorithm in trajectory data clustering (cited by: 8)
Abstract The traditional K-means algorithm cannot determine the number of clusters in advance, is sensitive to the choice of initial cluster centers, and is easily affected by outliers, which makes its clustering results unstable and inaccurate. To address these problems, an improved density-based K-means algorithm is proposed. First, high-density trajectory data points are selected as the initial cluster centers for K-means clustering, based on the distribution density of the trajectory data and an increased density weight for trajectory key points. Second, the clustering results are evaluated with the Between-Within Proportion (BWP) index, a cluster validity function. Finally, the optimal number of clusters and the optimal partition are determined from this evaluation. Theoretical analysis and experimental results show that the improved algorithm extracts trajectory key points more effectively and preserves key path information. Its clustering accuracy is 28 percentage points higher than that of the traditional K-means algorithm and 17 percentage points higher than that of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The proposed algorithm offers better stability and accuracy in trajectory data clustering.
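The three steps in the abstract (density-based seeding, K-means, BWP-based choice of the cluster count) can be illustrated with a minimal sketch. It is not the authors' implementation: the neighbourhood `radius`, the optional per-point `weights` standing in for the key-point density weight, the rule that seeds must lie more than `radius` apart, and all function names are assumptions made for illustration. The BWP score follows the commonly used definition, (b - w) / (b + w), where w is a point's average distance to the other members of its own cluster and b is its smallest average distance to any other cluster.

```python
import math

def density_init(points, k, radius, weights=None):
    """Pick up to k high-density points as seeds (assumed rule: seeds lie > radius apart)."""
    weights = weights or [1.0] * len(points)
    # Weighted neighbour count inside `radius`; a larger weight boosts key points.
    density = [sum(w for q, w in zip(points, weights) if math.dist(p, q) <= radius)
               for p in points]
    centers = []
    for i in sorted(range(len(points)), key=lambda i: -density[i]):
        if all(math.dist(points[i], c) > radius for c in centers):
            centers.append(points[i])
        if len(centers) == k:
            break
    return centers

def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers, labels = list(centers), None
    for _ in range(iters):
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

def mean_bwp(points, labels):
    """Average Between-Within Proportion over all points (higher is better)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return float("-inf")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        w = sum(math.dist(p, q) for q in own) / (len(own) - 1) if len(own) > 1 else 0.0
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for m, c in clusters.items() if m != l)
        total += (b - w) / (b + w) if b + w > 0 else 0.0
    return total / len(points)

def best_clustering(points, k_max, radius, weights=None):
    """Try k = 2..k_max and keep the partition with the highest mean BWP."""
    best = None
    for k in range(2, k_max + 1):
        seeds = density_init(points, k, radius, weights)
        if len(seeds) < k:
            break  # not enough well-separated dense points for this k
        labels, centers = kmeans(points, seeds)
        score = mean_bwp(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, centers)
    return best

# Toy 2-D data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
score, k, labels, centers = best_clustering(points, k_max=5, radius=2.0)
print(k, labels)
```

Because the seeds are chosen deterministically from the densest points, repeated runs give the same partition, which is the stability property the abstract attributes to density-based initialization. Real trajectory data would need a distance measure and a radius suited to the data, neither of which this page specifies.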
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core (北大核心), 2017, Issue 10, pp. 2946-2951 (6 pages)
Funding National Natural Science Foundation of China (61571318)
Keywords K-means algorithm; density-based; characteristics of vehicle activity; weight of density; initial clustering center; Between-Within Proportion (BWP) index