To address two drawbacks of the k-means algorithm, namely that the number of clusters in a data set must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the silhouette coefficient is introduced, and the optimal number of clusters K_opt for a data set with unknown class information is determined by calculating the silhouette coefficients of the objects in the clusters obtained under different values of K. Next, the distribution of the data set is obtained through hierarchical clustering, and the initial cluster centers are determined from it. Finally, the clustering is completed with the traditional k-means algorithm. Theoretical analysis shows that the improved algorithm has reasonable computational complexity. Experimental results on the IRIS test data set show that the algorithm distinguishes different clusters reasonably, recognizes outliers efficiently, and yields lower entropy.
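The three steps above can be sketched in pure Python as follows. This is a minimal illustration, not the authors' implementation: Euclidean distance, average-linkage agglomerative clustering for the hierarchical step, and taking each agglomerative group's centroid as an initial center are all assumptions, since the abstract does not specify them. All function names (`mean_silhouette`, `hierarchical_centers`, `improved_kmeans`, and so on) are hypothetical, and outlier recognition is not covered.

```python
import math


def dist(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all objects.

    a: mean distance from object i to the other members of its own cluster.
    b: smallest mean distance from i to the members of any other cluster.
    Objects in singleton clusters contribute s(i) = 0 (a common convention).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # the silhouette is undefined for a single cluster
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def hierarchical_centers(points, k):
    """Agglomerative clustering down to k groups; return the group centroids.

    Assumption: average linkage. The abstract names neither the linkage rule
    nor exactly how the centers are read off the hierarchy.
    """
    groups = [[i] for i in range(len(points))]
    while len(groups) > k:
        best = None
        for x in range(len(groups)):
            for y in range(x + 1, len(groups)):
                d = sum(dist(points[i], points[j]) for i in groups[x] for j in groups[y])
                d /= len(groups[x]) * len(groups[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        groups[x].extend(groups.pop(y))  # y > x, so index x is unaffected
    return [centroid([points[i] for i in g]) for g in groups]


def kmeans(points, centers, max_iter=100):
    """Traditional (Lloyd) k-means starting from the supplied centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = centroid(members)
    return labels, centers


def improved_kmeans(points, k_max):
    """Try K = 2..k_max and keep the run with the largest mean silhouette."""
    best = None
    for k in range(2, min(k_max, len(points)) + 1):
        labels, centers = kmeans(points, hierarchical_centers(points, k))
        score = mean_silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, centers)
    return best  # (mean silhouette, K_opt, labels, centers)


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D groups.
    data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
            [0.0, 9.0], [0.3, 9.2], [-0.2, 8.8]]
    score, k_opt, labels, _ = improved_kmeans(data, k_max=5)
    print("K_opt =", k_opt, "mean silhouette =", round(score, 3))
```

On the toy data, K = 3 gives the highest mean silhouette, because merging two groups raises a(i) sharply while splitting a tight group lowers b(i). The naive agglomerative step costs roughly O(n³) distance evaluations, so a real implementation would use a proper linkage routine.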
Existing environmental risk zoning methods leave unclear both the homogeneous risk characteristics within a sub-area and the heterogeneity between sub-areas. This study presents a new zoning method that determines and categorizes risk characteristics using k-means clustering, a data mining technique. By analyzing the mechanism by which environmental risk occurs, the study constructs indices and develops index quantification models for environmental risk zoning. Taking Nanjing Chemical Industrial Park as the study area and a 100 m × 100 m grid cell as the basic zoning unit, we calculate the source risk index, the air risk field index, the water risk field index, and the target vulnerability, and then use k-means clustering to analyze the environmental risk in the area. We calculate the average silhouette coefficient of the clustering at different values of k and take the k with the largest average silhouette coefficient as the optimal number of clusters. The clustering result for this optimal number is then used for environmental risk zoning, and the zoning result is mapped with a geographic information system (GIS). The study area is divided into five sub-areas, and the environmental risk characteristics shared within each sub-area, as well as the differences between sub-areas, are presented. The zoning supports risk management and helps decision makers allocate limited resources among sub-areas when designing risk-reduction interventions.
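The selection rule for the optimal number of clusters can be written out with the standard silhouette definition. Representing each grid cell i by a feature vector of its four index values, and the symbols used for them, are assumptions made for illustration; the abstract does not state the feature construction, any normalization, or the distance d used.

```latex
x_i = \left(R_i^{\mathrm{src}},\; R_i^{\mathrm{air}},\; R_i^{\mathrm{water}},\; V_i\right), \qquad i = 1, \dots, n

a(i) = \frac{1}{|C_{c(i)}| - 1} \sum_{j \in C_{c(i)},\, j \ne i} d(x_i, x_j), \qquad
b(i) = \min_{c \ne c(i)} \frac{1}{|C_c|} \sum_{j \in C_c} d(x_i, x_j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \quad (s(i) = 0 \text{ if } |C_{c(i)}| = 1), \qquad
\bar{s}(k) = \frac{1}{n} \sum_{i=1}^{n} s(i), \qquad
k_{\mathrm{opt}} = \arg\max_{k \ge 2} \bar{s}(k)
```

Here s(i) lies in [-1, 1]. A value near 1 means cell i is much closer to the cells of its own sub-area than to those of the nearest other sub-area, which is the within-area homogeneity and between-area heterogeneity the zoning aims for.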
Funding: The National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Youth Scientific Research Foundation of China University of Mining and Technology (No. 2006A047).