Abstract
Based on the idea of partitioning time and space, this paper designs an online algorithm for generating synopsis data structures. The synopsis records the distribution of the stream data at different moments in order to support offline data mining tasks such as classification, clustering, and association rule discovery. Three strategies are studied to control the potentially large memory requirement caused by space partitioning: time granularity, quantization vector adjustment, and subregion indexing. These strategies balance the memory requirement of the synopsis against the number of I/O operations between main memory and external storage.
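To make the idea of a time-and-space-partitioned synopsis concrete, the following is a minimal illustrative sketch and not the paper's algorithm. It assumes a bounded numeric domain split into a uniform grid and a fixed window length; the class name `GridSynopsis` and the parameters `bins` and `granularity` are hypothetical. Storing only non-empty cells is a simple stand-in for the memory control strategies the abstract names, which are not reproduced here.

```python
from collections import defaultdict


class GridSynopsis:
    """Per-time-window cell counts over a uniformly partitioned space."""

    def __init__(self, lows, highs, bins, granularity):
        self.lows = lows                # lower bound of each dimension
        self.highs = highs              # upper bound of each dimension
        self.bins = bins                # cells per dimension (hypothetical knob)
        self.granularity = granularity  # time units per window (hypothetical knob)
        # window index -> {cell tuple -> count}; only non-empty cells are stored
        self.windows = defaultdict(lambda: defaultdict(int))

    def cell_of(self, point):
        """Map a point to the index tuple of the cell that contains it."""
        cell = []
        for x, lo, hi in zip(point, self.lows, self.highs):
            i = int((x - lo) / (hi - lo) * self.bins)
            cell.append(min(max(i, 0), self.bins - 1))  # clamp boundary values
        return tuple(cell)

    def add(self, timestamp, point):
        """Update the synopsis online with one stream element."""
        window = int(timestamp // self.granularity)
        self.windows[window][self.cell_of(point)] += 1

    def snapshot(self, window):
        """Return the cell counts recorded for one time window."""
        return dict(self.windows.get(window, {}))


synopsis = GridSynopsis(lows=[0.0, 0.0], highs=[10.0, 10.0], bins=5, granularity=60)
for t, p in [(5, (1.0, 1.0)), (30, (1.5, 0.5)), (70, (9.9, 9.9)), (75, (10.0, 0.0))]:
    synopsis.add(t, p)
```

In this sketch a larger `granularity` or a smaller `bins` value reduces the number of stored cells at the cost of a coarser picture of the distribution, which is the kind of trade-off the memory control strategies address.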
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal (北大核心)
2010, No. 7, pp. 61-62, 65 (3 pages)
Funding
National "863" Program of China (2007AA12Z226)
Chongqing Natural Science Foundation (CSTC2007BB2446)
Keywords
data stream
time and space partitioning
synopsis data structure
clustering