Abstract
Flow data is an important category of data that contains rich regularities, and mining and analyzing this knowledge through data visualization is of considerable value. This paper proposes a visualization method for flow data based on parallel coordinates. It defines a flow data visualization model that abstracts the visualization process into three main models: the flow data set, the matrix model, and the parallel-coordinate visual structure. The flow data set is the data object to be visualized, the matrix model is the internal representation, and the parallel-coordinate structure supplies the graphical elements; transformation algorithms map the three models onto one another. In addition, to address the performance bottleneck and the polyline overlap that arise when visualizing massive flow data, a parallel processing algorithm is implemented on the Spark framework. The algorithm clusters the flow data following the K-Means idea, which improves the visual effect of the parallel coordinates. Experiments show that the proposed method visualizes flow data faithfully and effectively, and that it is also applicable to massive flow data sets.
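The abstract does not spell out the matrix model, the transformation algorithms, or the Spark implementation, so the following is only a minimal single-machine sketch of the general pipeline it describes: records are turned into a matrix, the matrix is mapped to polyline vertices on parallel axes, and a K-Means step replaces many overlapping lines with a few centroid lines. The record layout (one numeric value per axis) and the function names `to_matrix`, `normalize`, `to_polylines`, and `kmeans` are hypothetical, and plain Python stands in for Spark.

```python
import random


def to_matrix(records, axes):
    """Flow data set -> matrix: one row per record, one column per axis."""
    return [[float(rec[a]) for a in axes] for rec in records]


def normalize(matrix):
    """Scale each column to [0, 1] so all axes share one vertical range."""
    cols = list(zip(*matrix))
    lows = [min(c) for c in cols]
    # A constant column has span 0; fall back to 1.0 to avoid dividing by zero.
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)]
            for row in matrix]


def to_polylines(matrix):
    """Matrix -> visual structure: each row becomes (axis_index, height) vertices."""
    return [list(enumerate(row)) for row in matrix]


def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns k centroid rows (assumed stand-in for the Spark step)."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for row in rows:
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(row)
        # Recompute each centroid as the column-wise mean; keep it if the cluster is empty.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids
```

Under these assumptions, drawing `to_polylines(kmeans(normalize(matrix), k))` instead of one polyline per record is one way to reduce overplotting; the paper's own clustering and rendering details may differ.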
Authors
Zhang Yuanming (张元鸣)
Gao Yalin (高亚琳)
Jiang Jianbo (蒋建波)
Lu Jiawei (陆佳炜)
Xu Jun (徐俊)
Xiao Gang (肖刚)
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, Zhejiang, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2018, No. 4, pp. 55-60, 116 (7 pages in total)
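Funding
Zhejiang Provincial Public Welfare Technology Project (2017C31014)
Zhejiang Provincial Major Science and Technology Special Project (2014C01048)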