摘要
为解决倾斜分布的数据流聚类这一难题,提出了时态密度概念,给出其度量,揭示了其包括可增量计算在内的一系列数学性质;设计了时态密度树结构,提高了聚类时的存储和检索效率;设计了能够以实时或异步方式捕捉数据倾斜分布的数据流时态特征的聚类算法TDCA(temporal density based clustering algorithm),其时间复杂度为O(c×m×lgm).实验结果表明,该算法不仅有较强的功能,而且具有较好的规模可伸缩性.
To solve the problem of clustering this paper proposes a concept of temporal density, which reveals a set of mathematical properties, especially the incremental computation. A clustering algorithm named TDCA (temporal density based clustering algorithm) with time complexity of O(c×m×lgm) is created with a tree structure implemented for both storage and retrieve efficiency. TDCA is capable of capturing the temporal features of a data stream with skew data distribution either in real time or on demand. The experimental results show that TDCA is functionable and scalable.
出处
《软件学报》
EI
CSCD
北大核心
2010年第5期1031-1041,共11页
Journal of Software
基金
国家自然科学基金No.600773169
国家"十一五"科技支撑计划No.2006BAI05A01~~
关键词
数据流聚类
时态密度
倾斜分布
data stream clustering
temporal density
skew distribution