Abstract: Clustering in high-dimensional space is an important area of data mining. It is the process of discovering groups in a high-dimensional dataset such that the similarity between elements of the same cluster is maximal and the similarity between elements of different clusters is minimal. Many clustering algorithms are not applicable to high-dimensional space because of its sparseness and decline properties. Dimensionality reduction is an effective way to address this problem. The paper proposes a novel clustering algorithm, CFSBC, based on closed frequent itemsets derived from association rule mining, which obtains the clustering attributes with high efficiency. The algorithm has several advantages. First, it deals effectively with the problem of dimensionality reduction. Second, it is applicable to different kinds of attributes. Third, it is suitable for very large data sets. Experiments show that the proposed algorithm is effective and efficient.
Key words: clustering; closed frequent itemsets; association rule; clustering attributes
CLC number: TP 311
Foundation item: Supported by the National Natural Science Foundation of China (70371015)
Biography: NI Wei-wei (1979-), male, Ph.D. candidate; research direction: data mining and knowledge discovery.
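For readers unfamiliar with the term, a closed frequent itemset is a frequent itemset that has no proper superset with the same support. The brute-force sketch below illustrates only that definition; it is not the CFSBC algorithm, and the function name `closed_frequent_itemsets` and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all closed frequent itemsets.

    Brute-force illustration of the definition only (exponential in the
    number of distinct items); this is NOT the CFSBC algorithm.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    # Step 1: count the support of every candidate itemset, keep frequent ones.
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                support[candidate] = count

    # Step 2: keep itemsets with no proper superset of equal support.
    # Any such superset would itself be frequent, so checking the
    # frequent itemsets is sufficient.
    closed = {}
    for itemset, count in support.items():
        absorbed = any(
            itemset < other and count == other_count
            for other, other_count in support.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed


# Hypothetical toy data: each transaction is a set of attribute labels.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
result = closed_frequent_itemsets(toy, min_support=2)
```

In this toy example, `{"b"}` is frequent but not closed, because `{"a", "b"}` occurs in exactly the same transactions.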
Funding: Supported by the Harbin Technology Bureau Youth Talented Project (2014RFQXJ073) and China Postdoctoral Fund Projects (2014M561330).
Abstract: Cloud computing has developed into an important information technology paradigm that provides on-demand services. Meanwhile, its energy consumption has attracted growing attention from both the academic and industrial communities. In this paper, from the perspective of cloud tasks, the relationship between cloud tasks and cloud platform energy consumption is established and analyzed on the basis of the multidimensional attributes of cloud tasks. Furthermore, a three-way clustering algorithm for cloud tasks is proposed to save energy. In the algorithm, the cloud tasks are first classified into three categories according to the content properties of the cloud tasks and resources, respectively. Next, cloud tasks and cloud resources are clustered according to their computation characteristics (e.g., computation-intensive, data-intensive). Subsequently, greedy scheduling is performed. The simulation results show that the proposed algorithm can significantly reduce the energy cost and improve resource utilization compared with the general greedy scheduling algorithm.
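The final step described above, greedy scheduling over tasks and resources that have already been grouped by computation characteristic, might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's method: the tuple layouts, the `kind` labels, the longest-task-first order, and the earliest-finish-time criterion are all hypothetical, and the sketch does not model energy consumption or the three-way clustering itself.

```python
from collections import defaultdict


def greedy_schedule(tasks, resources):
    """Greedily assign each task to a resource of the same kind.

    tasks:     list of (task_id, kind, length)   -- hypothetical layout
    resources: list of (res_id, kind, speed)     -- hypothetical layout
    Returns (assignment, busy_until), where assignment maps task_id to
    res_id and busy_until maps res_id to its total scheduled time.
    """
    # Group resources by their (assumed, precomputed) cluster label.
    by_kind = defaultdict(list)
    for res_id, kind, speed in resources:
        by_kind[kind].append((res_id, speed))

    busy_until = {res_id: 0.0 for res_id, _, _ in resources}
    assignment = {}

    # Longest tasks first; each goes to the matching resource that
    # would finish it earliest.
    for task_id, kind, length in sorted(tasks, key=lambda t: -t[2]):
        # Fall back to all resources if no resource shares the task's kind.
        candidates = by_kind.get(kind) or [(r, s) for r, _, s in resources]
        res_id, speed = min(
            candidates, key=lambda rs: busy_until[rs[0]] + length / rs[1]
        )
        busy_until[res_id] += length / speed
        assignment[task_id] = res_id
    return assignment, busy_until


# Hypothetical toy workload.
tasks = [("t1", "compute", 8), ("t2", "compute", 4), ("t3", "data", 6)]
resources = [("r1", "compute", 2.0), ("r2", "compute", 1.0), ("r3", "data", 1.0)]
assignment, busy_until = greedy_schedule(tasks, resources)
```

Restricting each task to resources in the same cluster is what distinguishes this sketch from a plain greedy scheduler that considers every resource for every task.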
Funding: This work was supported by the National Basic Research Program of China (No. 2007CB307100) and the National Natural Science Foundation of China (Grant No. 60432010).
Abstract: Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of clusters. Second, the clustering results are often sensitive to input parameters. In this paper, a fast subspace clustering algorithm based on attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing the Gini coefficient. To evaluate the correlation between every pair of non-redundant attributes, the relation matrix of non-redundant attributes is constructed based on the relation function of two-dimensional united Gini coefficients. After an overlapping clustering algorithm is applied to the relation matrix, the candidate set of all interesting subspaces is obtained. Finally, all subspace clusters can be derived by clustering on the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves a significant gain in runtime and quality in finding subspace clusters but is also insensitive to input parameters.
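The attribute-filtering step can be pictured with the sketch below. The abstract does not define its Gini coefficient, so this sketch assumes the Gini impurity of an equal-width histogram, 1 - sum(p_i^2), and assumes that an attribute whose values are spread almost uniformly (impurity near the maximum) is the kind treated as redundant. The function names, the bin count, and the threshold are hypothetical.

```python
def gini_impurity(values, bins=10):
    """Gini impurity 1 - sum(p_i^2) of an equal-width histogram.

    Assumed definition; the abstract does not specify the exact formula.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute falls into a single bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def filter_attributes(columns, bins=10, threshold=0.85):
    """Keep attributes whose histogram impurity is below the threshold.

    columns: {attribute name: list of numeric values} -- hypothetical layout.
    """
    return [
        name
        for name, vals in columns.items()
        if gini_impurity(vals, bins) < threshold
    ]


# Hypothetical toy data: one near-uniform and one concentrated attribute.
columns = {
    "uniform": list(range(100)),
    "clustered": [0] * 50 + [1] * 45 + [100] * 5,
}
kept = filter_attributes(columns)
```

With ten bins the impurity of a perfectly uniform attribute is 0.9, so a threshold slightly below that value drops only attributes that show almost no concentration.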