Abstract
Clustering groups unlabeled data with similar structure into distinct clusters. However, existing clustering algorithms do not give users an intuitive picture of how the data are distributed, especially in high-dimensional space. Dimension reduction can map high-dimensional data into a low-dimensional space that is easier to interpret, but overlapping points in that space degrade the visualization. To address this problem, this paper proposes an interactive clustering visualization method based on local principal directions of the data. Specifically, a dimension reduction method is first applied to give users an initial visualization; local principal directions and the corresponding frequency histograms are then computed and displayed. By inspecting the frequency histogram along a local principal direction, users can understand and exploit the statistical characteristics of the data and interactively shrink or stretch the distances between points to separate projected points that overlap. Experiments on artificial and real-world datasets indicate that the proposed method effectively improves the separability of data points in the low-dimensional subspace and gives users a better visual basis for clustering analysis and further exploration. The method also reduces the number of iterations required by a widely used dimension reduction algorithm while maintaining competitive clustering performance, which improves the efficiency of clustering analysis.
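The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a "local principal direction" can be approximated by the leading PCA axis of a point's k nearest neighbours in the projected space, that the frequency histogram is a plain histogram of the neighbours' coordinates along that axis, and that stretching or shrinking is a rescaling along the same axis. The function names, the parameter `k`, and the scaling `factor` are all hypothetical.

```python
import numpy as np


def local_principal_direction(points, center_idx, k=10):
    """Leading PCA axis of the k nearest neighbours of one point (assumed definition)."""
    dists = np.linalg.norm(points - points[center_idx], axis=1)
    neighbours = points[np.argsort(dists)[:k]]  # includes the point itself
    centred = neighbours - neighbours.mean(axis=0)
    # Rows of vt are the principal axes of the centred neighbourhood.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return neighbours, vt[0]


def histogram_along(neighbours, direction, bins=8):
    """Frequency histogram of the neighbours' coordinates along the direction."""
    proj = (neighbours - neighbours.mean(axis=0)) @ direction
    counts, edges = np.histogram(proj, bins=bins)
    return proj, counts, edges


def rescale_along(neighbours, direction, factor):
    """Stretch (factor > 1) or shrink (factor < 1) the points along the direction."""
    proj = (neighbours - neighbours.mean(axis=0)) @ direction
    # Move each point along the direction only; orthogonal components are unchanged.
    return neighbours + np.outer((factor - 1.0) * proj, direction)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 2-D "projection": a cloud elongated along the x axis.
    pts = rng.normal(0.0, [2.0, 0.1], size=(200, 2))
    nb, d = local_principal_direction(pts, center_idx=0, k=30)
    _, counts, _ = histogram_along(nb, d)
    print("direction:", d)
    print("histogram counts:", counts)
    print("stretched shape:", rescale_along(nb, d, 2.0).shape)
```

In an interactive tool, the histogram would be shown to the user, who would then choose the factor for a selected neighbourhood; here the factor is simply passed in as an argument.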
Authors
卢颖
张志豪
张军平
LU Ying1, ZHANG Zhihao1,2, ZHANG Junping1,2 (1. School of Computer Science, Fudan University, Shanghai 200433, China; 2. Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, China)
Source
《计算机科学与探索》
CSCD
PKU Core Journal (北大核心)
2018, Issue 6, pp. 859-871 (13 pages)
Journal of Frontiers of Computer Science and Technology
Funding
National Natural Science Foundation of China, No. 61673118
Shanghai Pujiang Program, No. 16PJD009
Keywords
local principal direction
visual clustering analysis
visual analysis
interactive method