Abstract
Traditional feature selection algorithms tend to break down in high-dimensional Web spaces because of the inherent sparsity of the data objects. In typical high-dimensional Web data mining applications, different sets of points may cluster better for different subsets of dimensions, and the number of dimensions in each such cluster-specific subspace may also vary. In fact, it may be impossible to find a single small subset of dimensions for all the clusters. This paper therefore uses the concept of the projected cluster to make the relation between a cluster and its dimensions explicit, and recasts clustering in high-dimensional data as a projected cluster problem, which simplifies computation and improves mining efficiency. Finally, a corresponding fast algorithm based on projected clusters is given.
Source
Systems Engineering (《系统工程》)
CSCD
Peking University Core Journal (北大核心)
2004, Issue 7, pp. 80-83 (4 pages)
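Funding
Supported by the National Science Fund for Distinguished Young Scholars of the National Natural Science Foundation of China (70125002)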
Keywords
High-Dimensional Web Data
Web Data Mining
Clustering
Projected Cluster