One of the most important problems in clustering is determining the number of classes. It is not easy to find an appropriate method for measuring whether a cluster configuration is acceptable. In this paper we propose a possible, non-automatic solution that considers different clustering criteria and compares their results. In this way, robust structures in the analyzed dataset can often be captured (or established), and an optimal cluster configuration that presents a meaningful association may be defined. We also focus on the variables used in cluster analysis, since variables that contain little clustering information can produce misleading and non-robust results. Three algorithms are therefore employed in this study: K-means partitioning, Partitioning Around Medoids (PAM), and the Heuristic Identification of Noisy Variables (HINoV). The results are compared with those of robust methods.
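As a rough illustration of what comparing the results of different clustering criteria can look like in practice, the sketch below computes the Rand index between two label assignments of the same objects. The abstract does not say which agreement measure the authors use, so the choice of the Rand index, the function name `rand_index`, and the example labels are all assumptions made for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    Hypothetical helper; not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Made-up labels standing in for a K-means run and a PAM run.
kmeans_labels = [0, 0, 1, 1, 2, 2]
pam_labels = [1, 1, 0, 0, 0, 2]
agreement = rand_index(kmeans_labels, pam_labels)
```

A value close to 1 suggests that the two methods recover a similar structure; a low value is a hint that the configuration, or the variables behind it, may not be robust.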
Dual clustering performs object clustering in both the spatial and non-spatial domains, which traditional clustering methods cannot handle well. However, recent dual clustering research has often omitted spatial outliers, determined the weights of hybrid distance measures subjectively, and produced diverse clustering results. In this study, we first redefined the dual clustering problem and related concepts to highlight the clustering criteria. We then presented a self-organizing dual clustering algorithm (SDC) based on the self-organizing feature map and certain spatial analysis operations, including the Voronoi diagram and polygon aggregation and amalgamation. The algorithm employs a hybrid distance measure that combines geometric distance and non-spatial similarity, while clustering spectrum analysis helps to determine the weight of non-spatial similarity in the measure. A case study was conducted on a spatial database of urban land price samples in Wuhan, China. SDC detected spatial outliers and clustered the points into spatially connected and attributively homogeneous sub-groups. In particular, SDC revealed zonal areas that describe the actual distribution of land prices but were not revealed by other methods. SDC reduced the subjectivity in dual clustering.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 40901188), the Key Laboratory of Geo-informatics of the State Bureau of Surveying and Mapping (Grant No. 200906), and the Fundamental Research Funds for the Central Universities (Grant No. 4082002).
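The hybrid distance measure described in the dual clustering abstract can be pictured with a minimal sketch. The abstract only states that geometric distance and non-spatial similarity are combined under a weight, so the convex combination used here, the function name `hybrid_distance`, the scaling parameters, and the sample records are assumptions rather than the SDC formulation.

```python
import math


def hybrid_distance(p, q, w, spatial_scale=1.0, attr_scale=1.0):
    """Weighted mix of geometric distance and attribute dissimilarity.

    p, q: dicts with an "xy" coordinate tuple and a numeric "value".
    w: weight of the non-spatial term, between 0 and 1.
    The scales bring both terms to comparable units (assumed, not
    taken from the paper).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    d_spatial = math.dist(p["xy"], q["xy"]) / spatial_scale
    d_attr = abs(p["value"] - q["value"]) / attr_scale
    return (1.0 - w) * d_spatial + w * d_attr


# Made-up land price samples: planar coordinates and a price attribute.
a = {"xy": (0.0, 0.0), "value": 10.0}
b = {"xy": (3.0, 4.0), "value": 14.0}
d_mixed = hybrid_distance(a, b, w=0.5)
```

With `w=0` the measure reduces to pure geometric distance and with `w=1` to pure attribute dissimilarity. Choosing the value in between is the subjective step that, according to the abstract, clustering spectrum analysis is meant to support.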