
Research on Method of Super-large Scale Data Flow Clustering in Time Series (cited by: 1)
Abstract: This paper studies how to improve the efficiency of clustering super-large-scale data streams. Elements of a time-series data stream can be accessed only in a single linear pass: each element is read exactly once, in the order in which it arrives. Traditional methods merely cluster the elements in their order of arrival along the time direction, which is inefficient. A time-series data stream clustering algorithm based on micro-cluster distance weighting and attribute information contribution degree is proposed. The algorithm first acquires the stream data in real time, using a sliding time window as the unit of time. In the online phase, it generates micro-clusters from the stream in real time and maintains the micro-cluster set by updating and deleting micro-clusters. In the offline phase, it clusters the micro-clusters in real time over the micro-cluster sample set, according to the attribute information contribution degree of the sample data and their distance weighting with respect to the sample classes. Experiments show that the improved algorithm achieves higher execution efficiency and higher throughput, and effectively reduces memory load.
Source: Computer Simulation (《计算机仿真》), CSCD, Peking University Core Journal (北大核心), 2014, No. 4, pp. 273-276 (4 pages)
Keywords: Data flow; Clustering; Micro cluster; Distance weighting; Attribute information contribution degree; Sliding window
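
The two-phase scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the abstract does not define the attribute information contribution degree or the distance-weighting formula, so the sketch substitutes a normalized per-attribute variance across micro-cluster centers as a hypothetical weight and uses a weighted Euclidean distance. All names (`MicroCluster`, `online_update`, `attribute_weights`, `offline_cluster`) and the parameters `radius` and `max_age` are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Compact summary of absorbed points: count, per-attribute sum, last update time."""
    n: int
    linear_sum: list
    last_seen: int

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point, t):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.last_seen = t


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (an assumed form of 'distance weighting')."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def online_update(clusters, window, t, radius, max_age):
    """Online phase: read each point of one time window once, then drop stale micro-clusters."""
    for point in window:
        uniform = [1.0] * len(point)
        nearest = min(
            clusters,
            key=lambda c: weighted_distance(c.center(), point, uniform),
            default=None,
        )
        if nearest is not None and weighted_distance(nearest.center(), point, uniform) <= radius:
            nearest.absorb(point, t)  # update an existing micro-cluster
        else:
            clusters.append(MicroCluster(1, list(point), t))  # create a new one
    # Delete micro-clusters that have not been updated within max_age windows.
    clusters[:] = [c for c in clusters if t - c.last_seen <= max_age]
    return clusters


def attribute_weights(clusters):
    """Hypothetical stand-in for contribution degree: normalized variance per attribute."""
    centers = [c.center() for c in clusters]
    dims = len(centers[0])
    variances = []
    for d in range(dims):
        values = [ctr[d] for ctr in centers]
        mean = sum(values) / len(values)
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    total = sum(variances)
    if total == 0:
        return [1.0 / dims] * dims
    return [v / total for v in variances]


def offline_cluster(clusters, k, iterations=10):
    """Offline phase: k-means-style grouping of micro-clusters with attribute weights."""
    if not clusters:
        return [], []
    weights = attribute_weights(clusters)
    centers = [c.center() for c in clusters]
    # Deterministic seeding: start from the k most populated micro-clusters.
    seeds = [c.center() for c in sorted(clusters, key=lambda c: -c.n)[:k]]
    labels = [0] * len(clusters)
    for _ in range(iterations):
        labels = [
            min(range(len(seeds)), key=lambda j: weighted_distance(ctr, seeds[j], weights))
            for ctr in centers
        ]
        for j in range(len(seeds)):
            members = [c for c, lab in zip(clusters, labels) if lab == j]
            if members:
                total = sum(m.n for m in members)
                seeds[j] = [
                    sum(m.linear_sum[d] for m in members) / total
                    for d in range(len(seeds[j]))
                ]
    return labels, seeds
```

A caller would invoke `online_update` once per sliding window as data arrives and run `offline_cluster` on the surviving micro-clusters whenever a clustering result is needed. Storing only counts and sums per micro-cluster, rather than raw points, is what keeps memory use bounded in a single-pass setting.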