Abstract
With the rapid development of information technology and the advent of the big data era, data increasingly exhibit complex characteristics such as high dimensionality and nonlinearity. For high-dimensional data, it is often difficult to find feature regions that reflect distribution patterns in the full-dimensional space, and most traditional clustering algorithms scale well only to low-dimensional data. As a result, the clusters that traditional algorithms produce on high-dimensional data may not meet current needs. Subspace clustering algorithms instead search for clusters that exist in subspaces of the high-dimensional data: they divide the original feature space into different feature subsets, reducing the influence of irrelevant features while preserving the main features of the original data. Subspace clustering can reveal information that is hard to see in high-dimensional data, and visualization techniques can then show the intrinsic structure of the data's attributes and dimensions, providing an effective means for visual analysis of high-dimensional data. This paper summarizes recent research progress on visual analysis methods for high-dimensional data based on subspace clustering, describing three kinds of methods: those based on feature selection, on subspace exploration, and on subspace clustering. It then analyzes their interactive analysis methods and applications, and discusses future development trends in visual analysis of high-dimensional data.
Authors
田帅
陈谊
TIAN Shuai; CHEN Yi (Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China)
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2018, No. 13, pp. 19-26 (8 pages)
Computer Engineering and Applications
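Funding
National Key Technology R&D Program of China during the 12th Five-Year Plan (No. 2012BAD29B01-2)
National Science and Technology Basic Work Program of China (No. 2015FY111200)
Beijing Municipal Science and Technology Project (No. Z151100001615041)
Open Fund of the State Key Laboratory of Virtual Reality Technology and Systems (No. BUAA-VR-17KF-07)
2018 Graduate Research Capability Improvement Program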
Keywords
high-dimensional data
visual analysis
subspace exploration
subspace clustering