Abstract
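The widespread use of sensors produces large volumes of uncertain data streams. In clustering applications where the input data are continuous random variables, existing clustering methods based on discrete random variables cannot meet the efficiency and accuracy requirements of data stream applications. Using the Gaussian mixture model as the basic representation of uncertain data means that only the descriptive information of each component needs to be stored. This makes better use of storage space while still approximating the true distribution. On this basis, we propose cumicro, an uncertain data stream clustering method that can discover clusters along the time dimension. The algorithm treats time directly as a data attribute, so the clusters for a given time dimension can be queried directly. It also avoids the difficulty that traditional partition-based clustering has in discovering non-spherical clusters. Experiments comparing cumicro with the classic algorithm umicro demonstrate the effectiveness of the proposed algorithm, and the clustering results are analyzed under different values of K and τ. We conclude that when the raw data are relatively dense, the algorithm is more accurate than existing clustering based on discrete models.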
With sensors widely deployed, large amounts of uncertain data streams are generated. When the input data are continuous random variables, existing clustering methods based on discrete random variables cannot meet the requirements of efficiency and accuracy. To solve this problem, a new algorithm named cumicro is proposed. First, the Gaussian mixture model is used as the basic representation of uncertain data streams. Second, a clustering method that can find clusters in the time dimension is proposed; it makes up for the inability of traditional partition-based clustering to find non-spherical clusters. Third, the influence of different parameter values is examined experimentally. Finally, the comparison results show that the proposed algorithm improves clustering accuracy, particularly when the raw data are dense.
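The representation idea above can be sketched as follows. An uncertain record is stored as a Gaussian mixture, so only each component's weight, mean, and variance are kept. This is a hypothetical illustration, not the paper's cumicro implementation. The names `Component`, `UncertainPoint`, and `expected_sq_distance` are invented here, and covariances are assumed diagonal. Time is modeled as one more attribute with zero variance, echoing the abstract's idea of treating time as a data attribute. The distance function uses the standard identity E[||X − c||²] = ||E[X] − c||² + Σⱼ Var[Xⱼ], one way an uncertain point could be compared with a cluster centroid.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One Gaussian component; only these numbers are stored per component."""
    weight: float       # mixing weight w_k (the weights of one point sum to 1)
    mean: List[float]   # per-attribute mean mu_k
    var: List[float]    # per-attribute variance sigma_k^2 (diagonal covariance assumed)


@dataclass
class UncertainPoint:
    """Hypothetical: an uncertain record represented as a Gaussian mixture."""
    components: List[Component]

    def expectation(self) -> List[float]:
        # E[X_j] = sum_k w_k * mu_kj
        dims = len(self.components[0].mean)
        return [sum(c.weight * c.mean[j] for c in self.components) for j in range(dims)]

    def variance(self) -> List[float]:
        # Var[X_j] = sum_k w_k * (sigma_kj^2 + mu_kj^2) - E[X_j]^2
        mu = self.expectation()
        return [
            sum(c.weight * (c.var[j] + c.mean[j] ** 2) for c in self.components) - mu[j] ** 2
            for j in range(len(mu))
        ]


def expected_sq_distance(p: UncertainPoint, centroid: List[float]) -> float:
    # E[||X - c||^2] = ||E[X] - c||^2 + sum_j Var[X_j]
    mu = p.expectation()
    return sum((m - c) ** 2 for m, c in zip(mu, centroid)) + sum(p.variance())


if __name__ == "__main__":
    # Attribute 0 is a timestamp (t = 3.0, no uncertainty); attribute 1 is a
    # bimodal sensor reading with modes at 1.0 and 3.0.
    point = UncertainPoint([
        Component(0.5, [3.0, 1.0], [0.0, 0.25]),
        Component(0.5, [3.0, 3.0], [0.0, 0.25]),
    ])
    print(point.expectation())                      # [3.0, 2.0]
    print(point.variance())                         # [0.0, 1.25]
    print(expected_sq_distance(point, [2.0, 2.0]))  # 2.25
```

The variance term does not depend on the centroid. A more spread-out mixture therefore adds a constant penalty to its distance from every cluster.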
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Indexed in: EI, CSCD, Peking University Core Journals
2014, Issue S2, pp. 102-109 (8 pages)
Funding
National Key Technology R&D Program of China (2012BAH26B01)