Funding: This project is supported by the National Natural Science Foundation of China (No. 50575153).
Abstract: Based on an analysis of the grid-based clustering method CLIQUE (clustering in quest) and the density-based clustering method DBSCAN (density-based spatial clustering of applications with noise), a new clustering algorithm named cooperative clustering based on grid and density (CLGRID) is presented. The new algorithm adopts an equivalent rule of regional inquiry and density unit identification. The central region of a class is computed by the grid-based method and the margin region by the density-based method. By clustering in two phases and using only a small number of seed objects in representative units to expand the cluster, the frequency of region queries is decreased, and consequently the time cost is reduced. The new algorithm retains the positive features of both grid-based and density-based methods and avoids the difficulty of parameter searching. It can discover clusters of arbitrary shape with high efficiency and is not sensitive to noise. Application of CLGRID to test data sets demonstrates its validity and its higher efficiency compared with traditional DBSCAN with an R-tree.
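The grid-based phase mentioned in this abstract can be illustrated with a minimal sketch. It is not the CLGRID algorithm itself: it only bins 2-D points into square cells, marks cells holding at least `min_pts` points as dense, and merges adjacent dense cells into clusters. The function name `grid_cluster` and the parameters `cell_size` and `min_pts` are hypothetical, and the density-based treatment of the margin region is deliberately left out.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_pts):
    """Label 2-D points by connected dense grid cells (illustrative only).

    Points in cells with fewer than min_pts members keep the label -1.
    """
    # Bin every point into the square cell that contains it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # A cell is "dense" when it holds at least min_pts points.
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over the 8-neighbourhood of dense cells.
        queue = deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels
```

In this sketch the sparse cells are simply labelled as noise; a fuller version in the spirit of the abstract would hand points near the border of a dense region to a density-based region query instead.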
Funding: Supported by the Key Program of the National Natural Science Foundation of China (Grant No. 50539010) and the Special Fund for Public Welfare Industry of the Ministry of Water Resources of China (Grant No. 200801019).
Abstract: In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of the indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected: the more significant the selected indices, the more significant the characteristics of the clustering results, and the more credible the analysis or diagnosis. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
Abstract: Classification systems such as Slope Mass Rating (SMR) are currently used to undertake slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulting from experience-based judgments and to account for uncertainties in determining the class boundaries developed by the SMR system, the system's classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. By applying clustering algorithms in the SMR classification system, no advance experience-based judgment was made on the number of extracted classes, and only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
Abstract: Clustering is one of the most widely used data mining techniques for creating homogeneous clusters. K-means is a popular clustering algorithm that, despite its inherent simplicity, also has some major problems. One way to resolve these problems and improve the k-means algorithm is to use evolutionary algorithms in clustering. In this study, the Imperialist Competitive Algorithm (ICA) is developed and then used in the clustering process. Clustering the IRIS, Wine and CMC datasets with the developed ICA, and comparing the results with those of the original ICA, GA and PSO algorithms, demonstrates the improvement of the Imperialist Competitive Algorithm.
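For reference, the baseline that this abstract sets out to improve can be sketched in a few lines. The block below is plain Lloyd-style k-means, not the ICA-based method of the study; the function name `kmeans` and its signature are hypothetical. Initial centroids are passed in explicitly because the outcome depends on them, which is one commonly cited weakness of k-means (the abstract does not say which problems it has in mind).

```python
def kmeans(points, centroids, iterations=100):
    """Plain Lloyd-style k-means on tuples of numbers (illustrative only).

    Returns (centroids, assignment); the result depends on the initial
    centroids supplied by the caller.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda k: sum(
                    (p - q) ** 2 for p, q in zip(point, centroids[k])
                ),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(
                        sum(m[d] for m in members) / len(members)
                        for d in range(len(old))
                    )
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignment
```

An evolutionary approach such as the one the abstract describes would typically search over candidate centroid sets rather than refine a single starting guess as this loop does.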
Funding: Supported by the National Basic Research Program of China (973 Program: 2013CB329004).
Abstract: The recent emergence of diverse services has led to explosive traffic growth in cellular data networks. Understanding the service dynamics in large cellular networks is important for network design, troubleshooting, quality of service (QoE) support, and resource allocation. In this paper, we present our study of the distributions and temporal patterns of different services in a cellular data network from two perspectives, namely service request times and service duration. Our study is based on big traffic data captured over a week-long period from a tier-1 mobile operator's network in China and parsed into readable records by our Hadoop-based packet parsing platform. We propose a Zipf's ranked model to characterize the distributions of traffic volume, packets, request times and duration of cellular services. A two-stage method (Self-Organizing Map combined with k-means) is first used to cluster the service time series into four request patterns and three duration patterns. These seven patterns are then combined to better understand the fine-grained temporal patterns of services in the cellular network. The results of our distribution models and temporal patterns give cellular network operators a better understanding of the request and duration characteristics of services, which is of great importance in network design, service generation and resource allocation.
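A rank-based Zipf characterization of the kind mentioned in this abstract can be illustrated as follows. This is a minimal sketch, assuming the simple form count ≈ C / rank^s fitted by least squares in log-log space; the abstract does not state the fitting procedure the authors used, and the function name `fit_zipf` is hypothetical.

```python
import math


def fit_zipf(counts):
    """Fit count ~ scale / rank**exponent by log-log least squares.

    Returns (exponent, scale). Needs at least two positive counts.
    """
    # Rank the positive counts from largest (rank 1) to smallest.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # In log space: log(count) = log(scale) - exponent * log(rank).
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)
```

Applied to, say, per-service request counts, an exponent near 1 would indicate the classic Zipf shape, where a handful of services account for most of the requests.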
Funding: The S&T project "High resolution microseismic monitoring for early detection and analysis of slope failure in opencast mines", funded by the Ministry of Coal, Government of India; The Singareni Collieries Co Ltd (SCCL), Andhra Pradesh.
Abstract: This paper outlines the results obtained from real-time microseismic monitoring of an opencast coal mine in South India. The objective of the study is to investigate the stress changes within the rock mass along the slope due to underground mine development operations and their impact on the stability of the highwall slope. The installed microseismic systems recorded seismic triggerings down to −2 moment magnitude. In general, most of the events recorded during the monitoring period are weak in seismic energy. The study adopts a simple and more reliable tool to characterize the seismically active zone for assessing the stability of the highwall in real time. The impact of underground working on the slope is studied on the basis of the seismic event impact contours and seismic clusters. During the monitoring period, the overall microseismic activity along the slope due to the mine development operations did not cause any adverse impact on the highwall stability.
Funding: Supported by the National Pre-research Project (513150601).
Abstract: Identifying composite crosscutting concerns (CCs) is a research task and challenge in aspect mining. In this paper, we propose a scatter-based graph clustering approach to identify composite CCs. Inspired by state-of-the-art link analysis techniques, we propose a two-state model to approximate how CCs tangle with core modules. According to this model, we obtain scatter and centralization scores for each program element. In particular, the scatter scores are used to select CC seeds. Furthermore, to identify composite CCs, we adopt a novel similarity measurement and develop an undirected graph clustering to group these seeds. Finally, we compare the approach with previous work and illustrate its effectiveness in identifying composite CCs.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61573299, 61174140, 61472127, and 61272395), the Social Science Foundation of Hunan Province (No. 16ZDA07), the China Postdoctoral Science Foundation (Nos. 2013M540628 and 2014T70767), the Natural Science Foundation of Hunan Province (Nos. 14JJ3107 and 2017JJ5064), and the Excellent Youth Scholars Project of Hunan Province (No. 15B087).
Abstract: The distance dynamics model is an excellent tool for uncovering the community structure of a complex network. However, one issue that must be addressed by this model is its very long computation time in large-scale networks. To identify the community structure of a large-scale network with high speed and high quality, in this paper we propose a fast community detection algorithm, the F-Attractor, which is based on the distance dynamics model. The main contributions of the F-Attractor are as follows. First, we propose the use of two prejudgment rules from two different perspectives: node and edge. Based on these two rules, we develop a strategy of internal edge prejudgment for predicting the internal edges of the network. Internal edge prejudgment can reduce the number of edges and their neighbors that participate in the distance dynamics model. Second, we introduce a triangle distance to further enhance the speed of the interaction process in the distance dynamics model. This triangle distance uses two known distances to measure a third distance without any extra computation. We combine the above techniques to improve the distance dynamics model and then describe the community detection process of the F-Attractor. The results of an extensive series of experiments demonstrate that the F-Attractor offers high-speed community detection and high partition quality.