Abstract
This paper proposes IUMicro, an immune-principle-based algorithm for clustering uncertain data streams. To handle tuple-level uncertainty in such streams, IUMicro introduces an immune model that is updated dynamically to follow changes in the data; the model includes a B-cell feature structure, with an accompanying update strategy, that collects statistical information about the stream online. To take into account both a tuple's existence probability and the distance between tuples, a probabilistic radius of recognition is defined so that each arriving tuple is matched to a reasonable candidate cluster. The offline phase performs unsupervised clustering of arbitrarily shaped clusters based on the spatial relationships among the recognition regions of the immune cells. Experimental results show that IUMicro suppresses noise effectively and achieves good clustering quality at a high processing speed.
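The abstract gives no formulas, so the following Python sketch is only an illustration of the two-phase idea it describes, not the authors' method. Everything named here is an assumption: the `BCell` class (probability-weighted sums as the feature structure), the rule that a tuple with existence probability `prob` is recognised only within distance `base_radius * prob`, and the rule that two B-cells belong to the same cluster when their centers are within `2 * base_radius` of each other.

```python
import math


class BCell:
    """Hypothetical B-cell feature structure: probability-weighted sums."""

    def __init__(self, point, prob):
        self.weight = prob                        # sum of existence probabilities
        self.linear = [prob * v for v in point]   # probability-weighted coordinate sums

    def center(self):
        return [s / self.weight for s in self.linear]

    def absorb(self, point, prob):
        """Update the statistics with one more uncertain tuple."""
        self.weight += prob
        for i, v in enumerate(point):
            self.linear[i] += prob * v


def online_step(cells, point, prob, base_radius):
    """Assign a tuple to the nearest B-cell that recognises it, else create one.

    Assumed rule: the recognition radius shrinks with the existence
    probability, so unlikely tuples must lie very close to a center.
    """
    best, best_dist = None, float("inf")
    for cell in cells:
        d = math.dist(cell.center(), point)
        if d <= base_radius * prob and d < best_dist:
            best, best_dist = cell, d
    if best is None:
        cells.append(BCell(point, prob))
    else:
        best.absorb(point, prob)


def offline_clusters(cells, base_radius):
    """Label B-cells so that cells with overlapping regions share a label.

    Uses union-find over pairs whose centers are within 2 * base_radius
    (an assumed overlap test), which allows chains of cells to form
    clusters of arbitrary shape.
    """
    parent = list(range(len(cells)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if math.dist(cells[i].center(), cells[j].center()) <= 2 * base_radius:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(cells))]
```

In this sketch, a low-probability tuple that falls outside every shrunken recognition region starts its own low-weight B-cell; a real implementation would presumably prune or decay such cells, which is one plausible reading of the noise suppression the abstract reports.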
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
EI
CSCD
PKU Core Journal
2012, No. 5, pp. 826-834 (9 pages)
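Funding
Natural Science Foundation of Fujian Province (No. 2010J01329)
Major Science and Technology Project for Industry-University-Research Cooperation of Fujian Provincial Universities (No. 2010H6012)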
Keywords
Immune Principle
Uncertain Data Stream
Clustering
Probabilistic Radius of Recognition