Abstract: Clustering in high-dimensional space is an important area of data mining. It is the process of discovering groups in a high-dimensional dataset such that the similarity between elements of the same cluster is maximal and the similarity between elements of different clusters is minimal. Many clustering algorithms are not applicable to high-dimensional space because of its sparseness and decline properties. Dimensionality reduction is an effective way to address this problem. The paper proposes a novel clustering algorithm, CFSBC, based on closed frequent itemsets derived from association rule mining, which can obtain the clustering attributes with high efficiency. The algorithm has several advantages. First, it deals effectively with the problem of dimensionality reduction. Second, it is applicable to different kinds of attributes. Third, it is suitable for very large data sets. Experiments show that the proposed algorithm is effective and efficient.
Key words: clustering, closed frequent itemsets, association rule, clustering attributes
CLC number: TP 311
Foundation item: Supported by the National Natural Science Foundation of China (70371015)
Biography: NI Wei-wei (1979-), male, Ph.D. candidate, research direction: data mining and knowledge discovery.
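As background for the closed frequent itemsets mentioned in the abstract, the following is a minimal sketch of the standard definition only, not of CFSBC itself, whose details the abstract does not give. An itemset is frequent if it appears in at least a minimum number of records, and closed if no proper superset has the same support. The function name `closed_frequent_itemsets`, the parameter `min_support`, and the toy records are assumptions made for illustration; the brute-force enumeration is suitable only for very small inputs.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all closed frequent itemsets.

    Hypothetical helper for illustration; brute force over all candidates.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    # Step 1: count support for every candidate itemset and keep the frequent ones.
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent[candidate] = support

    # Step 2: keep an itemset only if no proper superset has the same support.
    # Any such superset would itself be frequent, so checking `frequent` suffices.
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


# Toy records (assumed data): each record is a set of attribute items.
records = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
result = closed_frequent_itemsets(records, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

In this toy data, `{"a"}` and `{"b"}` are frequent but not closed, because `{"a", "b"}` occurs in exactly the same three records. Keeping only closed itemsets therefore gives a more compact set of candidate attribute groups without losing support information.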