Euler Kernel-based Data Stream Clustering Algorithm
Cited by: 5
Abstract: With the rapid development of cloud computing and the Internet of Things, data collection has become faster and more automated. Many emerging applications, such as real-time surveillance systems, vehicle traffic monitoring systems, electricity consumption recording, and network traffic monitoring, continuously generate large volumes of stream data. This has made data stream mining, whose goal is to extract hidden knowledge and patterns from continuous data streams, a hot research topic. Clustering, one of the most important problems in stream mining, has recently been studied extensively. Unlike traditional clustering, where the dataset is static and can be read and processed repeatedly, data stream clustering must satisfy constraints such as bounded memory, single-pass processing, real-time response, and concept drift. This paper presents a new clustering algorithm for data streams, called EG-Stream, which combines the Euler kernel method with the Growing Neural Gas (GNG) model. The data are first mapped explicitly by the Euler kernel onto a complex feature space of the same dimension, and clustering is then performed in that feature space with the GNG model. The method retains the benefit of nonlinear modeling with a kernel function while substantially reducing the large-scale computational problem of kernel-based clustering. The Euler kernel relies on a nonlinear, robust cosine metric that is less sensitive to outliers. More importantly, it intrinsically induces an explicit map, so the similarity between data points can be measured robustly without increasing the dimensionality of the data; this avoids the problem that general kernel clustering algorithms depend on the kernel trick and therefore cannot handle data streams. Although the method is very simple, merely incorporating the Euler kernel into GNG, experimental results on a variety of UCI datasets indicate that it achieves comparable or even better performance than the G-Stream algorithm and identifies the structural information in stream data.
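The abstract describes the explicit map but does not state its formula. The following is a minimal sketch, assuming the commonly used form of the Euler map, z_j = exp(i·α·π·x_j)/√2, applied to features scaled to [0, 1]. The names `euler_map`, `cosine_dissimilarity`, `nearest_two`, and the value of `ALPHA` are hypothetical and chosen for illustration; this is not the authors' EG-Stream implementation. It shows that the mapped vector has the same length as the input, that squared Euclidean distance in the complex space equals a cosine-based dissimilarity in the input space, and how the nearest and second-nearest prototypes (the search a GNG-style update starts from) could be found on mapped data.

```python
import cmath
import math

ALPHA = 1.9  # assumed hyperparameter; hypothetical value, not taken from the abstract


def euler_map(x, alpha=ALPHA):
    """Map a real vector with entries in [0, 1] to a complex vector of the same length."""
    scale = 1.0 / math.sqrt(2.0)
    return [scale * cmath.exp(1j * alpha * math.pi * v) for v in x]


def squared_distance(z1, z2):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(z1, z2))


def cosine_dissimilarity(x, y, alpha=ALPHA):
    """Sum over dimensions of 1 - cos(alpha * pi * (x_j - y_j))."""
    return sum(1.0 - math.cos(alpha * math.pi * (a - b)) for a, b in zip(x, y))


def nearest_two(z, prototypes):
    """Indices of the two prototypes closest to z in the complex feature space."""
    order = sorted(range(len(prototypes)),
                   key=lambda i: squared_distance(z, prototypes[i]))
    return order[0], order[1]


if __name__ == "__main__":
    x, y = [0.1, 0.5, 0.9], [0.2, 0.4, 0.3]
    zx, zy = euler_map(x), euler_map(y)
    # Same dimensionality before and after the mapping.
    print(len(x), len(zx))
    # The two quantities agree up to floating-point error.
    print(squared_distance(zx, zy), cosine_dissimilarity(x, y))
```

Because each term 1 - cos(·) is bounded by 2, a single extreme coordinate can contribute only a bounded amount to the distance, which is one way to read the abstract's claim of lower sensitivity to outliers. Since the map is applied point by point, each arriving sample can be transformed once and then discarded, with no kernel matrix over past data.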
Authors: ZHU Ying-wen (朱颖雯); YANG Jun (杨君) (School of Computer Science and Engineering, Sanjiang University, Nanjing 210012, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the PKU Core Journals list, 2019, Issue 12, pp. 74-82 (9 pages)
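Funding: Supported by the Sanjiang University Research Project, Jiangsu Province (2018SJKY026); the Natural Science Research Funding Project for Colleges and Universities of Jiangsu Province (17KJD520007); and the General Program of Natural Science Research for Higher Education Institutions of Jiangsu Province (18KJB520042).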
Keywords: GNG; data stream clustering; Euler kernel; kernel method