
A Method for Clustering Uncertain Data Streams Based on GMM (cited by: 6)
Abstract: The widespread use of sensors generates large volumes of uncertain data streams. In clustering applications, when the input data are continuous random variables, existing clustering methods based on discrete random variables cannot meet the efficiency and accuracy requirements of data stream applications. Using a Gaussian mixture model (GMM) as the basic representation of uncertain data requires storing only the descriptive information of each component, which makes better use of storage space while still approximating the true distribution. On this basis, the paper proposes cumicro, a clustering method for uncertain data streams that can discover clusters along the time dimension. The algorithm treats time directly as a data attribute, so the clusters at a given point in time can be queried directly, and it avoids the difficulty that traditional partition-based clustering has in finding non-spherical clusters. Experiments compare cumicro with the classical algorithm umicro to demonstrate the effectiveness of the proposed algorithm, and analyze the clustering results under different values of K and τ. The paper concludes that, when the original data are dense, the algorithm is more accurate than the earlier clustering based on a discrete model.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, Issue S2, pp. 102-109 (8 pages)
Funding: National Key Technology R&D Program of China (2012BAH26B01)
Keywords: Gaussian mixture model; uncertain data streams; clustering; big data; synopsis
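The abstract notes that a Gaussian mixture needs only the description of each component to be stored, and that time is treated as an ordinary data attribute. The sketch below illustrates that idea only; it is not the cumicro algorithm, whose details are not given in this record. The names `Component`, `UncertainReading`, `expected_value`, `total_variance`, and `as_point` are hypothetical, and a one-dimensional reading is assumed for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Component:
    """One Gaussian component; only these three numbers are stored."""
    weight: float    # mixing weight, assumed to sum to 1 across components
    mean: float
    variance: float


@dataclass(frozen=True)
class UncertainReading:
    """Hypothetical record: a timestamp plus a 1-D Gaussian mixture."""
    timestamp: float
    components: Tuple[Component, ...]

    def expected_value(self) -> float:
        # Mixture mean: weighted sum of the component means.
        return sum(c.weight * c.mean for c in self.components)

    def total_variance(self) -> float:
        # Mixture variance: E[X^2] - (E[X])^2, with the second moment
        # of each component equal to variance + mean^2.
        m = self.expected_value()
        second_moment = sum(
            c.weight * (c.variance + c.mean ** 2) for c in self.components
        )
        return second_moment - m ** 2

    def as_point(self) -> Tuple[float, float]:
        # Time is used as just another coordinate of the data point.
        return (self.timestamp, self.expected_value())


reading = UncertainReading(
    timestamp=10.0,
    components=(Component(0.5, 0.0, 1.0), Component(0.5, 4.0, 1.0)),
)
print(reading.as_point(), reading.total_variance())
```

Because each reading is summarized by a handful of numbers per component, its storage cost does not depend on how many raw samples produced it, which is the storage advantage the abstract refers to.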
