14 articles found
CLUSTERING VALIDITY BASED ON THE IMPROVED S_DBW INDEX (Cited by: 1)
1
Authors: Tong Jianhua, Tan Hongzhou. Journal of Electronics (China), 2009, Issue 2, pp. 258-264 (7 pages)
For many clustering algorithms, it is very important to determine an appropriate number of clusters, which is called the cluster validity problem. In this paper, a new clustering validity assessment index is proposed. It is based on a novel method that selects the margin point between two clusters more accurately for inter-cluster similarity, and it provides an improved scatter function for intra-cluster similarity. Simulation results show the effectiveness of the proposed index on the data sets under consideration, regardless of the choice of clustering algorithm.
Keywords: clustering validity; inter-cluster similarity; intra-cluster similarity
Internal Validity Index for Fuzzy Clustering Based on Relative Uncertainty
2
Authors: Refik Tanju Sirmen, Burak Berk Üstündag. Computers, Materials & Continua (SCIE, EI), 2022, Issue 8, pp. 2909-2926 (18 pages)
Unsupervised clustering and clustering validity are used as essential instruments of data analytics. Although clustering is carried out under uncertainty, validity indices do not deliver any quantitative evaluation of the uncertainties in the suggested partitionings. Validity measures may also be biased towards the underlying clustering method. Moreover, neglecting a confidence requirement may result in over-partitioning. In the absence of an error estimate or a confidence parameter, probable clustering errors are forwarded to the later stages of the system, whereas an uncertainty margin for the projected labeling can be very useful for many applications such as machine learning. Herein, the validity issue is approached through estimation of the uncertainty, and a novel low-complexity index is proposed for fuzzy clustering. It involves only uni-dimensional membership weights regardless of the data dimension, stipulates no specific distribution, and is independent of the underlying similarity measure. Comprehensive tests and comparisons showed that it can reliably estimate the optimum number of partitions under different data distributions, while being more robust to over-partitioning. In the comparative correlation analysis between true clustering error rates and some known internal validity indices, the suggested index exhibited the strongest correlations. This relationship also proved stable in additional statistical acceptance tests. Thus the provided relative uncertainty measure can also be used as a probable error estimate in clustering. In addition, it is the only known method that can exclusively identify data points in doubt and that is adjustable according to the required confidence level.
Keywords: machine learning; data science; clustering validity; fuzzy clustering; uncertainty; intelligent systems; data analytics
NEW SHADOWED C-MEANS CLUSTERING WITH FEATURE WEIGHTS (Cited by: 2)
3
Authors: 王丽娜, 王建东, 姜坚. Transactions of Nanjing University of Aeronautics and Astronautics (EI), 2012, Issue 3, pp. 273-283 (11 pages)
Partition-based clustering with weighted features is developed in the framework of shadowed sets. The objects in the core and boundary regions, generated by shadowed-sets-based clustering, have different impacts on the prototype of each cluster. By integrating feature weights, a formula for weight calculation is introduced into the clustering algorithm. The selection of the weight exponent is crucial for a good result, and the weights are updated iteratively with each partition of clusters. The convergence of the weighted algorithms is given, and feasible cluster validity indices from data mining applications are utilized. Experimental results on both synthetic and real-life numerical data with different feature weights demonstrate that the weighted algorithm performs better than the unweighted algorithms.
Keywords: fuzzy C-means; shadowed sets; shadowed C-means; feature weights; cluster validity index
The Effective Clustering Partition Algorithm Based on the Genetic Evolution (Cited by: 1)
4
Authors: 廖芹, 李希雯. Journal of Donghua University (English Edition) (EI, CAS), 2006, Issue 6, pp. 43-46 (4 pages)
To address the difficulty of determining the number of clusters and the abnormal points with a clustering validity function, an effective clustering partition model based on the genetic algorithm is built in this paper. The solution to the problem is formed by combining the clustering partition with the encoded samples, and the fitness function is defined by the distances among and within clusters. The number of clusters and the samples in each cluster are determined, and the abnormal points are distinguished, by applying the triple random crossover operator and mutation. Based on known sample data, the results of the novel method and of the clustering validity function are compared. Numerical experiments are given, and the results show that the novel method is more effective.
Keywords: clustering validity; genetic algorithm; clustering number; abnormal point
Analysis of users' electricity consumption behavior based on ensemble clustering (Cited by: 7)
5
Authors: Qi Zhao, Haolin Li, Xinying Wang, Tianjiao Pu, Jiye Wang. Global Energy Interconnection, 2019, Issue 6, pp. 479-489 (11 pages)
Due to the increase in the number of smart meter devices, a power grid generates a large amount of data. Analyzing the data can help in understanding the users' electricity consumption behavior and demands, thus enabling better service to be provided to them. Power load profile clustering is the basis for mining the users' electricity consumption behavior. By examining the complexity, randomness, and uncertainty of the users' electricity consumption behavior, this paper proposes an ensemble clustering method to analyze this behavior. First, principal component analysis (PCA) is used to reduce the dimensions of the data. Subsequently, the single clustering method is used, and the majority is selected for integrated clustering. As a result, the users' electricity consumption behavior is classified into different modes, and their characteristics are analyzed in detail. This paper examines the electricity power data of 19 real users in China for simulation purposes. It provides a thorough analysis, along with suggestions, of the users' weekly electricity consumption behavior. The results verify the effectiveness of the proposed method.
Keywords: users' electricity consumption; ensemble clustering; dimensionality reduction; cluster validity
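The "majority is selected" step in the abstract above can be illustrated with a minimal sketch. It assumes that several base clusterings have already been run and that their cluster labels are aligned (label 0 means the same group in every run); the abstract does not say how the authors handle label alignment, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(labelings):
    """Combine several aligned labelings by per-sample majority vote.

    labelings: list of equal-length label lists, one per base clustering.
    Assumption: labels are already aligned across the base clusterings.
    """
    n_samples = len(labelings[0])
    if any(len(labels) != n_samples for labels in labelings):
        raise ValueError("all labelings must cover the same samples")

    consensus = []
    for votes in zip(*labelings):
        # most_common(1) returns the label with the highest vote count
        label, _count = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus


# Three hypothetical base clusterings of five load profiles
run_a = [0, 0, 1, 1, 2]
run_b = [0, 0, 1, 2, 2]
run_c = [0, 1, 1, 1, 2]
consensus = majority_vote([run_a, run_b, run_c])
print(consensus)
```

In practice the alignment assumption rarely holds for free, so a label-matching step or a co-association matrix would be needed before voting.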
Development of slope mass rating system using K-means and fuzzy c-means clustering algorithms (Cited by: 1)
6
Authors: Jalali Zakaria. International Journal of Mining Science and Technology (SCIE, EI, CSCD), 2016, Issue 6, pp. 959-966 (8 pages)
Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. To eliminate linguistic criteria resulting from experience-based judgments and to account for uncertainties in the class boundaries of the SMR system, the classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. When the clustering algorithms were applied, no advance experience-based judgment was made on the number of classes; only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study show that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
Keywords: SMR based on continuous functions; slope stability analysis; K-means and FCM clustering algorithms; validation of clustering algorithms; Sangan iron ore mines
An Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification
7
Authors: A. Anitha. Circuits and Systems, 2016, Issue 9, pp. 2349-2356 (9 pages)
Web log mining is the analysis of web log files containing web page sequences. Discovering user access patterns from web access logs is necessary for building adaptive web servers, improving e-commerce, carrying out cross-marketing, personalizing websites, predicting web access sequences, and so on. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interests and to determine the motivation for visiting a website. Using this approach, web usage mining is done through several stages, namely data cleaning, preprocessing, pattern discovery, and pattern analysis. Results are given to explain how this approach produces tighter usage clusters than existing web usage mining techniques. Rather than traditional distance-based clustering, a similarity measure is used during the clustering process in order to reduce computational complexity. This paper also deals with the problem of assessing the quality of user session clusters; cluster validity is measured with a statistical test that measures the distances between cluster distributions to infer their dissimilarity and level of distinction. According to these statistical measures, cluster accuracy improves to 0.83, compared with validity measures of 0.26 for existing k-means clustering, 0.56 for FCM (fuzzy c-means) clustering, and 0.54 for rough-set-based clustering. Generating dense clusters is essential for finding the interesting patterns needed for further mining and analysis.
Keywords: agglomerative clustering; similarity measure; cluster validity; clickstream sequence; transaction
Evolutionary Multi-Tasking Optimization for High-Efficiency Time Series Data Clustering
8
Authors: Rui Wang, Wenhua Li, Kaili Shen, Tao Zhang, Xiangke Liao. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2024, Issue 2, pp. 343-355 (13 pages)
Time series clustering is a challenging problem due to the large-volume, high-dimensional, and warping characteristics of time series data. Traditional clustering methods often use a single criterion or distance measure, which may not capture all the features of the data. This paper proposes a novel method for time series clustering based on evolutionary multi-tasking optimization, termed i-MFEA, which uses an improved multifactorial evolutionary algorithm to optimize multiple clustering tasks simultaneously, each with a different validity index or distance measure. Therefore, i-MFEA can produce diverse and robust clustering solutions that satisfy various preferences of decision-makers. Experiments on two artificial datasets show that i-MFEA outperforms single-objective evolutionary algorithms and traditional clustering methods in terms of convergence speed and clustering quality. The paper also discusses how i-MFEA can address two long-standing issues in time series clustering: the choice of an appropriate similarity measure and the number of clusters.
Keywords: time series clustering; evolutionary multi-tasking; multifactorial optimization; clustering validity index; distance measure
Application of Two-Order Difference to Gap Statistic
9
Authors: 岳士弘, 王秀秀, 魏苗苗. Transactions of Tianjin University (EI, CAS), 2008, Issue 3, pp. 217-221 (5 pages)
The Gap statistic is a well-known index of clustering validity, but its implementation is difficult to understand and to determine accurately. A direct method is presented to improve the performance of the Gap statistic: it applies the two-order (second-order) difference of the within-cluster dispersion in place of the constructed null reference distribution used in the Gap statistic. As a result, the Gap statistic is reformulated and becomes easy to implement, and its uncertainty in applications is reduced. The limitation of the Gap statistic is also analyzed with two typical examples, which show that the Gap statistic is difficult to apply to datasets that contain strongly overlapping or uneven-density clusters. Experiments verify the usefulness of the proposed method.
Keywords: clustering validity; Gap statistic; data structure
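The idea of replacing the reference distribution with a second-order difference of the within-cluster dispersion can be sketched as follows. This is only an illustration under stated assumptions: the dispersion values W_k are taken as given for consecutive k, the difference is computed on raw W_k (the abstract does not say whether W_k or log W_k is used), and the selection rule (largest difference) as well as the names `second_order_difference` and `pick_k` are hypothetical rather than the paper's exact formulation.

```python
def second_order_difference(dispersion):
    """Second-order difference W(k-1) - 2*W(k) + W(k+1) for interior k.

    dispersion: dict mapping consecutive cluster counts k to the
    within-cluster dispersion W_k (assumed to be precomputed).
    """
    ks = sorted(dispersion)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("cluster counts must be consecutive integers")
    return {
        k: dispersion[k - 1] - 2 * dispersion[k] + dispersion[k + 1]
        for k in ks[1:-1]
    }


def pick_k(dispersion):
    """Return the k with the largest second-order difference (the 'elbow')."""
    diffs = second_order_difference(dispersion)
    return max(diffs, key=diffs.get)


# Hypothetical dispersion values: a sharp drop up to k = 3, then a plateau
w = {1: 1000, 2: 600, 3: 120, 4: 100, 5: 85, 6: 75}
best_k = pick_k(w)
print(best_k)
```

No reference data sets need to be sampled here, which is what makes this kind of rule cheaper than the original Gap statistic; the trade-off is that the first and last k in the range can never be selected.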
A FUZZY CLOPE ALGORITHM AND ITS OPTIMAL PARAMETER CHOICE (Cited by: 1)
10
Authors: Li Jie, Gao Xinbo, Jiao Licheng. Journal of Electronics (China), 2006, Issue 3, pp. 384-388 (5 pages)
Among the clustering algorithms available in data mining, the CLOPE algorithm attracts considerable attention for its high speed and good performance. However, the proper choice of some parameters in the CLOPE algorithm directly affects the validity of the clustering results, and this remains an open issue. To address it, this paper proposes a fuzzy CLOPE algorithm and presents a method for optimal parameter choice that defines a modified partition fuzzy degree as a clustering validity function. Experimental results on a real data set illustrate the effectiveness of the proposed fuzzy CLOPE algorithm and of the optimal parameter choice method based on the modified partition fuzzy degree.
Keywords: data mining; cluster analysis; cluster validity; categorical attributes; optimal parameter choice
Electric Load Clustering in Smart Grid: Methodologies, Applications, and Future Trends (Cited by: 6)
11
Authors: Caomingzhe Si, Shenglan Xu, Can Wan, Dawei Chen, Wenkang Cui, Junhua Zhao. Journal of Modern Power Systems and Clean Energy (SCIE, EI, CSCD), 2021, Issue 2, pp. 237-252 (16 pages)
With the increasingly widespread deployment of advanced metering infrastructure, electric load clustering is becoming more essential because of its great potential for analyzing consumers' energy consumption patterns and preferences through data mining. Moreover, a variety of electric load clustering techniques have been put into practice to obtain the distribution of load data, observe the characteristics of load clusters, and classify the components of the total load. This can give rise to the development of related techniques and research in the smart grid, such as demand-side response. This paper summarizes the basic concepts and the general process of electric load clustering. Several similarity measurements and five major categories of electric load clustering are then comprehensively summarized along with their advantages and disadvantages. Afterwards, eight indices widely used to evaluate the validity of electric load clustering are described. Finally, vital applications and future trends are discussed thoroughly, including tariff design, anomaly detection, load forecasting, data security, and big data.
Keywords: electric load clustering; similarity measurement; clustering technique; cluster validity indicator; smart grid
Novel Cluster Validity Index for FCM Algorithm (Cited by: 6)
12
Authors: 于剑, 李翠霞. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2006, Issue 1, pp. 137-140 (4 pages)
Determining an appropriate number of clusters is very important when implementing a specific clustering algorithm such as c-means or fuzzy c-means (FCM). In the literature, most cluster validity indices originate from the partition or from geometrical properties of the data set. In this paper, the authors develop a novel cluster validity index for FCM, based on the optimality test of FCM. Unlike previous cluster validity indices, this novel index is inherent in FCM itself. Comparison experiments show that the stability index can be used as a cluster validity index for fuzzy c-means.
Keywords: cluster validity; optimality test; FCM
The upper bound of the optimal number of clusters in fuzzy clustering (Cited by: 6)
13
Authors: 于剑, 程乾生. Science in China (Series F), 2001, Issue 2, pp. 119-125 (7 pages)
The upper bound of the optimal number of clusters in clustering algorithms is studied in this paper, and a new method is proposed to solve this issue. The method shows that the rule c_max ≤ √N, which is popular in current papers, is reasonable in some sense. This conclusion is tested and analyzed on some typical examples from the literature, which demonstrates the validity of the new method.
Keywords: clustering algorithm; cluster validity; the optimal number of clusters; uncertainty; fuzzy clustering
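As a small illustration of how the upper-bound rule in this abstract is typically used, the sketch below lists the candidate cluster counts to search for a data set of N samples. It reads the rule as c_max ≤ √N (an assumption: the exponent is garbled in the listing as extracted), starts the search at 2 clusters, and uses the hypothetical function name `candidate_cluster_counts`.

```python
import math


def candidate_cluster_counts(n_samples):
    """Cluster counts c with 2 <= c <= floor(sqrt(n_samples)).

    Assumption: the upper bound is read as c_max <= sqrt(N).
    Returns an empty list when n_samples < 4.
    """
    c_max = math.isqrt(n_samples)  # integer floor of the square root
    return list(range(2, c_max + 1))


# A data set with 150 samples: search c = 2, ..., 12
counts = candidate_cluster_counts(150)
print(counts)
```

A validity index would then be evaluated once per candidate count, and the count with the best index value kept.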
A new cluster validity index using maximum cluster spread based compactness measure
14
Authors: M. Arif Wani, Romana Riyaz. International Journal of Intelligent Computing and Cybernetics (EI), 2016, Issue 2, pp. 179-204 (26 pages)
Purpose: The most commonly used approaches for cluster validation are based on indices, but the majority of the existing cluster validity indices do not work well on data sets of different complexities. The purpose of this paper is to propose a new cluster validity index (ARSD index) that works well on all types of data sets. Design/methodology/approach: The authors introduce a new compactness measure that depicts the typical behaviour of a cluster, where more points are located around the centre and fewer points towards the outer edge of the cluster. A novel penalty function is proposed for determining the distinctness measure of clusters. A random linear search algorithm is employed to evaluate and compare the performance of five commonly known validity indices and the proposed validity index. The values of the six indices are computed for all nc ranging from nc_min to nc_max to obtain the optimal number of clusters present in a data set. The data sets used in the experiments include shaped, Gaussian-like, and real data sets. Findings: An extensive experimental study shows that the proposed validity index is more consistent and reliable in indicating the correct number of clusters than the other validity indices. This is experimentally demonstrated on 11 data sets, where the proposed index achieved better results. Originality/value: The originality of the paper lies in proposing a novel cluster validity index that is used to determine the optimal number of clusters present in data sets of different complexities.
Keywords: clustering; cluster analysis; cluster validity; compactness measure; optimal number; distinctness measure