8,388 articles found
Sparse Reconstructive Evidential Clustering for Multi-View Data
1
Authors: Chaoyu Gong, Yang You. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 2, pp. 459-473 (15 pages)
Although many multi-view clustering (MVC) algorithms with acceptable performance have been presented, to the best of our knowledge, nearly all of them need to be fed the correct number of clusters. In addition, these existing algorithms create only hard and fuzzy partitions for multi-view objects, which are often located in highly overlapping areas of the multi-view feature space. The adoption of hard and fuzzy partitions ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views and maps multi-view objects to a two-dimensional human-readable chart by calculating two newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. In addition, SRMVEC proves effective on benchmark datasets, outperforming some state-of-the-art methods.
Keywords: Evidence theory multi-view clustering (MVC) optimization sparse reconstruction
A novel method for clustering cellular data to improve classification
2
Authors: Diek W. Wheeler, Giorgio A. Ascoli. Neural Regeneration Research (SCIE, CAS), 2025, No. 9, pp. 2697-2705 (9 pages)
Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Keywords: cellular data clustering dendrogram data classification Levene's one-tailed statistical test unsupervised hierarchical clustering
Contrastive Consistency and Attentive Complementarity for Deep Multi-View Subspace Clustering
3
Authors: Jiao Wang, Bin Wu, Hongying Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 4, pp. 143-160 (18 pages)
Deep multi-view subspace clustering (DMVSC) based on self-expression has attracted increasing attention due to its outstanding performance and nonlinear application. However, most existing methods neglect that view-private meaningless information or noise may interfere with the learning of self-expression, which may degrade clustering performance. In this paper, we propose a novel framework of Contrastive Consistency and Attentive Complementarity (CCAC) for DMVSC. CCAC aligns all the self-expressions of multiple views and fuses them based on their discrimination, so that it can effectively explore consistent and complementary information to achieve precise clustering. Specifically, the view-specific self-expression is learned by a self-expression layer embedded into the auto-encoder network for each view. To guarantee consistency across views and reduce the effect of view-private information or noise, we align all the view-specific self-expressions by contrastive learning. The aligned self-expressions are assigned adaptive weights by a channel attention mechanism according to their discrimination. They are then fused by a convolution kernel to obtain a consensus self-expression with maximum complementarity of multiple views. Extensive experiments on four benchmark datasets and one large-scale dataset show that the CCAC method outperforms other state-of-the-art methods, demonstrating its clustering effectiveness.
Keywords: Deep multi-view subspace clustering contrastive learning adaptive fusion self-expression learning
Improved Data Stream Clustering Method: Incorporating KD-Tree for Typicality and Eccentricity-Based Approach
4
Authors: Dayu Xu, Jiaming Lu, Xuyao Zhang, Hongtao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 2557-2573 (17 pages)
Data stream clustering is integral to contemporary big data applications. However, handling the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Using the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both the efficiency and the accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", is an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) algorithm integrated with the Scapegoat Tree. This ensures that, upon arrival, new data points interact solely with clusters in very close proximity. It significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and, to some extent, mitigating the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy and that its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
Keywords: data stream clustering TEDA KD-TREE scapegoat tree
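The neighbor-search idea in the abstract above (a new point consults only nearby clusters instead of traversing all of them) can be illustrated with a plain KD-tree lookup. This is a minimal sketch under stated assumptions, not the NS-TEDA implementation: the names `build`, `nearest`, and `centers` are hypothetical, cluster centers are treated as static 2-D points, and the Scapegoat Tree rebalancing and the TEDA eccentricity test are omitted.

```python
import math


def build(points, depth=0):
    """Build a KD-tree (nested dicts) by splitting on the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build(pts[:mid], depth + 1),
        "right": build(pts[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return the stored point closest to target, pruning subtrees that cannot win."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], target) < math.dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, best)
    return best


# Hypothetical cluster centers; a real stream clusterer would update these over time.
centers = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(centers)
closest = nearest(tree, (9, 2))
```

A static tree like this degrades as points are inserted, which is presumably why the abstract pairs the KD-Tree with a self-rebalancing Scapegoat Tree; that part is not shown here.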
Low-Rank Multi-View Subspace Clustering Based on Sparse Regularization
5
Authors: Yan Sun, Fanlong Zhang. Journal of Computer and Communications, 2024, No. 4, pp. 14-30 (17 pages)
Multi-view Subspace Clustering (MVSC) emerges as an advanced clustering method, designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The significance of the low-rank prior in MVSC is emphasized, highlighting its role in capturing the global data structure across views for improved performance. However, it faces challenges with outlier sensitivity due to its reliance on the Frobenius norm for error measurement. To address this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods relying solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
Keywords: clustering multi-view Subspace clustering Low-Rank Prior Sparse Regularization
An air combat maneuver pattern extraction based on time series segmentation and clustering analysis
6
Authors: Zhifei Xi, Yingxin Kou, Zhanwu Li, Yue Lv, You Li. Defence Technology (防务技术) (SCIE, EI, CAS, CSCD), 2024, No. 6, pp. 149-162 (14 pages)
Target maneuver recognition is a prerequisite for air combat situation awareness, trajectory prediction, threat assessment and maneuver decision. To remove the dependence of current target maneuver recognition methods on empirical criteria and sample data, and to complete the task of extracting the target maneuver pattern automatically and adaptively, this paper proposes an air combat maneuver pattern extraction method based on time series segmentation and clustering analysis, combining an autoencoder, the G-G clustering algorithm and a selective ensemble clustering analysis algorithm. Firstly, the autoencoder is used to extract key features of the maneuvering trajectory to remove the impact of redundant variables and reduce the data dimension. Then, taking the time information into account, the maneuver characteristic time series is segmented with the improved FSTS-AEGG algorithm, and a large number of maneuver primitives are extracted. Finally, the maneuver primitives are grouped into categories by using the selective ensemble multiple time series clustering algorithm, which can prove that each class represents a maneuver action. The maneuver pattern extraction method is applied to small-scale air combat trajectories and can recognize and correctly partition at least 71.3% of maneuver actions, indicating that the method is effective and satisfies the requirements for engineering accuracy. In addition, this method can provide data support for various target maneuvering recognition methods proposed in the literature, greatly reduce the workload and improve the recognition accuracy.
Keywords: Maneuver pattern extraction data mining Fuzzy segmentation Selective ensemble clustering
Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review
7
Authors: Nur Laila Ab Ghani, Izzatdin Abdul Aziz, Said Jadid AbdulKadir. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 4649-4668 (20 pages)
Clustering high-dimensional data is challenging because data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Keywords: clustering subspace clustering projected clustering data stream stream clustering high dimensionality evolving data stream concept drift
Metaheuristic Based Clustering with Deep Learning Model for Big Data Classification
8
Authors: R. Krishnaswamy, Kamalraj Subramaniam, V. Nandini, K. Vijayalakshmi, Seifedine Kadry, Yunyoung Nam. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 1, pp. 391-406 (16 pages)
Recently, a massive quantity of data is being produced from a distinct number of sources, and the size of the data created daily on the Internet has crossed two exabytes. At the same time, clustering is one of the efficient techniques for mining big data to extract the useful and hidden patterns that exist in it. Density-based clustering techniques have gained significant attention because they help to effectively recognize complex patterns in spatial datasets. Big data clustering is a non-trivial process owing to the increasing quantity of data, which can be addressed by the use of the MapReduce tool. With this motivation, this paper presents an efficient MapReduce-based hybrid density-based clustering and classification algorithm for big data analytics (MR-HDBCC). The proposed MR-HDBCC technique is executed on the MapReduce tool for handling big data. In addition, the MR-HDBCC technique involves three distinct processes, namely pre-processing, clustering, and classification. The proposed model utilizes the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique, which is capable of detecting random shapes and diverse clusters with noisy data. To improve the performance of the DBSCAN technique, a hybrid model using the cockroach swarm optimization (CSO) algorithm is developed to explore the search space and determine the optimal parameters for density-based clustering. Finally, a bidirectional gated recurrent neural network (BGRNN) is employed for the classification of big data. The experimental validation of the proposed MR-HDBCC technique takes place using a benchmark dataset, and the simulation outcomes demonstrate the promising performance of the proposed model in terms of different measures.
Keywords: Big data data classification clustering MAPREDUCE dbscan algorithm
Combination of density-clustering and supervised classification for event identification in single-molecule force spectroscopy data
9
Authors: 袁泳怡, 梁嘉伦, 谭创, 杨雪滢, 杨东尼, 马杰. Chinese Physics B (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 749-755 (7 pages)
Single-molecule force spectroscopy (SMFS) measurements of the dynamics of biomolecules typically require identifying massive events and states from large data sets, such as extracting rupture forces from force-extension curves (FECs) in pulling experiments and identifying states from extension-time trajectories (ETTs) in force-clamp experiments. The former is often accomplished manually and hence is time-consuming and laborious, while the latter is always impeded by the presence of baseline drift. In this study, we attempt to accurately and automatically identify the events and states from SMFS experiments with a machine learning approach, which combines clustering and classification for event identification of SMFS (ACCESS). As demonstrated by analysis of a series of data sets, ACCESS can extract the rupture forces from FECs containing multiple unfolding steps and classify the rupture forces into the corresponding conformational transitions. Moreover, ACCESS successfully identifies the unfolded and folded states even though the ETTs display severe nonmonotonic baseline drift. In addition, ACCESS is straightforward to use, as it requires only three easy-to-interpret parameters. As such, we anticipate that ACCESS will be a useful, easy-to-implement and high-performance tool for event and state identification across a range of single-molecule experiments.
Keywords: single-molecule force spectroscopy data analysis density-based clustering supervised classification
Power Incomplete Data Clustering Based on Fuzzy Fusion Algorithm
10
Authors: Yutian Hong, Yuping Yan. Energy Engineering (EI), 2023, No. 1, pp. 245-261 (17 pages)
With the rapid development of the economy, the scale of the power grid is expanding. The number of power equipment items that constitute the power grid has become very large, which makes the state data of power equipment grow explosively. These multi-source heterogeneous data differ from one another, which leads to data variation during transmission and storage and thus produces incomplete data. Therefore, research on data integrity has become an urgent task. This paper is based on the characteristics of random chance and the spatio-temporal difference of the system. According to the characteristics and data sources of the massive data generated by power equipment, a fuzzy mining model of power equipment data is established, and the data is divided into numerical and non-numerical data based on numerical data. The text data of power equipment defects is taken as the mining material. Then, an array-based Apriori algorithm is used for deep mining, and the strong association rules in incomplete power equipment data are obtained and analyzed. Judging from the trend of the NRMSE metrics and classification accuracy, most of the filling methods combined with the two frameworks in this method usually show a relatively stable filling trend and will not fluctuate greatly with the growth of the missing rate. The experimental results show that the proposed algorithm model can effectively improve the filling effect of the existing filling methods on most data sets, and the filling effect fluctuates greatly with the increase of the missing rate; that is, with the increase of the missing rate, the improvement effect of the model for the existing filling methods is higher than 4.3%. Through the incomplete data clustering technology studied in this paper, a more innovative state assessment of smart grid reliability operation is carried out, which has good research value and reference significance.
Keywords: Power system equipment parameter incomplete data fuzzy analysis data clustering
Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering for Noisy Data
11
Authors: Pham Huy Thong, Florentin Smarandache, Phung The Huan, Tran Manh Tuan, Tran Thi Ngan, Vu Duc Thai, Nguyen Long Giang, Le Hoang Son. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 8, pp. 1981-1997 (17 pages)
Clustering is a crucial method for deciphering data structure and producing new information. Due to its significance in revealing fundamental connections between the human brain and events, it is essential to utilize clustering for cognitive research. Dealing with noisy data caused by inaccurate synthesis from several sources or misleading data production processes is one of the most intriguing clustering difficulties. Noisy data can lead to incorrect object recognition and inference. This research proposes a novel clustering approach, named Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM), to solve the clustering problem with noisy data using neutral and refusal degrees in the definition of Picture Fuzzy Set (PFS) and Neutrosophic Set (NS). Our contribution is a new optimization model with four essential components: clustering, outlier removal, safe semi-supervised fuzzy clustering, and partitioning with labeled and unlabeled data. The effectiveness and flexibility of the proposed technique are estimated and compared with the state-of-the-art methods, standard Picture fuzzy clustering (FC-PFS) and Confidence-weighted safe semi-supervised clustering (CS3FCM), on benchmark UCI datasets. The experimental results show that our method outperforms the compared methods on at least 10 of 15 datasets in terms of clustering quality and computational time.
Keywords: Safe semi-supervised fuzzy clustering picture fuzzy set neutrosophic set data partition with noises fuzzy clustering
Energy Aware Clustering with Medical Data Classification Model in IoT Environment
12
Authors: R. Bharathi, T. Abirami. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 1, pp. 797-811 (15 pages)
With the exponential development of wireless networking and the inexpensive Internet of Things (IoT), a wide range of applications has been designed to attain enhanced services. Due to the limited energy capacity of IoT devices, energy-aware clustering techniques can be highly preferable. At the same time, artificial intelligence (AI) techniques can be applied to perform appropriate disease diagnostic processes. With this motivation, this study designs a novel squirrel search algorithm-based energy-aware clustering with a medical data classification (SSAC-MDC) model in an IoT environment. The goal of the SSAC-MDC technique is to attain maximum energy efficiency and disease diagnosis in the IoT environment. The proposed SSAC-MDC technique involves the design of the squirrel search algorithm-based clustering (SSAC) technique to choose the proper set of cluster heads (CHs) and construct clusters. In addition, the medical data classification process involves three different subprocesses, namely pre-processing, autoencoder (AE) based classification, and improved beetle antenna search (IBAS) based parameter tuning. The design of the SSAC technique and the IBAS-based parameter optimization process constitute the novelty of the work. To showcase the improved performance of the SSAC-MDC technique, a series of experiments was performed, and the comparative results highlighted the superiority of the SSAC-MDC technique over recent methods.
Keywords: Internet of things healthcare medical data classification energy efficiency clustering autoencoder
Unsupervised Functional Data Clustering Based on Adaptive Weights
13
Authors: Yutong Gao, Shuang Chen. Open Journal of Statistics, 2023, No. 2, pp. 212-221 (10 pages)
In recent years, functional data has been widely used in finance, medicine, biology and other fields. Current clustering analysis can solve problems in finite-dimensional space, but it is difficult to apply directly to the clustering of functional data. In this paper, we propose a new unsupervised clustering algorithm based on adaptive weights. In the absence of initialization parameters, we use entropy-type penalty terms and a fuzzy partition matrix to find the optimal number of clusters. At the same time, we introduce a measure based on adaptive weights to reflect the difference in information content between different clustering metrics. Simulation experiments show that the proposed algorithm has higher purity than some algorithms.
Keywords: Functional data Unsupervised Learning clustering Functional Principal Component Analysis Adaptive Weight
Time series clustering of COVID-19 pandemic-related data
14
Authors: Zhixue Luo, Lin Zhang, Na Liu, Ye Wu. Data Science and Management, 2023, No. 2, pp. 79-87 (9 pages)
The COVID-19 pandemic continues to impact daily life worldwide. It would be helpful and valuable if we could obtain valid information from the COVID-19 pandemic sequential data itself for characterizing the pandemic. Here, we aim to demonstrate that it is feasible to analyze the patterns of the pandemic using a time-series clustering approach. In this work, we use dynamic time warping distance and hierarchical clustering to cluster time series of daily new cases and deaths from different countries into four patterns. It is found that geographic factors have a large but not decisive influence on the pattern of pandemic development. Moreover, the age structure of the population may also influence the formation of cluster patterns. Our proven valid method may provide a different but very useful perspective for other scholars and researchers.
Keywords: Pandemic time series SARS-CoV-2 COVID-19 Time-series clustering Sequence data
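The combination named in the abstract above, dynamic time warping (DTW) distance followed by hierarchical clustering, can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline: the linkage rule (average linkage), the absolute-difference cost, the function names, and the toy `curves` are all assumptions, and no real case or death counts are used.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic DTW distance between two sequences, using |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def agglomerative(series, n_clusters):
    """Average-linkage agglomerative clustering on pairwise DTW distances."""
    dist = {frozenset((i, j)): dtw_distance(series[i], series[j])
            for i, j in combinations(range(len(series)), 2)}
    clusters = [[i] for i in range(len(series))]

    def linkage(c1, c2):
        return sum(dist[frozenset((i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average DTW distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy curves: two peaked series and two steadily rising series.
curves = [
    [0, 1, 5, 9, 5, 1, 0, 0],
    [0, 0, 1, 5, 9, 5, 1, 0],  # same peak as the first curve, delayed by one step
    [0, 1, 2, 3, 4, 5, 6, 7],
    [0, 0, 1, 2, 3, 4, 5, 6],
]
groups = agglomerative(curves, n_clusters=2)
```

DTW gives the two peaked curves a distance of zero despite the one-step delay, which is the property that makes it attractive for epidemic curves that start at different times in different countries.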
Clustering algorithm for multiple data streams based on spectral component similarity (cited by: 1)
15
Authors: 邹凌君, 陈崚, 屠莉. Journal of Southeast University (English Edition) (EI, CAS), 2008, No. 3, pp. 264-266 (3 pages)
A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams which show similar behavior with some unknown time delays. The algorithm uses the autoregressive (AR) modeling technique to measure correlations between data streams. It exploits estimated frequency spectra to extract the essential features of streams. Each stream is represented as the sum of spectral components, and the correlation is measured component-wise. Each spectral component is described by four parameters, namely amplitude, phase, damping rate and frequency. The ε-lag-correlation between two spectral components is calculated. The algorithm uses such information as similarity measures in clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method has a higher speed and clustering quality than other similar methods.
Keywords: data streams clustering AR model spectral component
Adaptive Density-Based Spatial Clustering of Applications with Noise (ADBSCAN) for Clusters of Different Densities (cited by: 3)
16
Authors: Ahmed Fahim. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 3695-3712 (18 pages)
Finding clusters based on density represents a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping the densely connected objects into one group and discarding the noise objects. It requires two input parameters: epsilon (fixed neighborhood radius) and MinPts (the lowest number of objects in epsilon). However, it cannot handle clusters of various densities since it uses a global value for epsilon. This article proposes an adaptation of the DBSCAN method so that it can discover clusters of varied densities while reducing the required number of input parameters to only one. The only user input in the proposed method is MinPts. Epsilon, on the other hand, is computed automatically based on statistical information of the dataset. The proposed method finds the core distance for each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters satisfying this density level. The remaining unclustered objects are clustered using a new value of epsilon that equals the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects are less than 0.006 of the dataset's size. The proposed method requires only MinPts as an input parameter because epsilon is computed from the data. Benchmark datasets were used to evaluate the effectiveness of the proposed method, which produced promising results. Practical experiments demonstrate the outstanding ability of the proposed method to detect clusters of different densities even if there is no separation between them. The accuracy of the method ranges from 92% to 100% for the experimented datasets.
Keywords: Adaptive DBSCAN (ADBSCAN) Density-based clustering data clustering Varied density clusters
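The iterative procedure described in the abstract above (average core distance as epsilon, cluster, then repeat on what is left) can be sketched as follows. This is a rough reading of the abstract, not the published ADBSCAN code. Several details are assumptions: the core distance is taken as the distance to the MinPts-th nearest neighbour, it is recomputed among the still-unclustered points in each round, a point counts itself toward MinPts inside DBSCAN, and all function names and the toy `pts` are hypothetical.

```python
import math


def core_distances(points, idx, min_pts):
    """Distance from each point in idx to its min_pts-th nearest neighbour within idx."""
    out = {}
    for i in idx:
        d = sorted(math.dist(points[i], points[j]) for j in idx if j != i)
        out[i] = d[min_pts - 1]
    return out


def dbscan(points, idx, eps, min_pts):
    """Plain DBSCAN restricted to the indices in idx; returns a list of clusters."""
    neigh = {i: [j for j in idx if math.dist(points[i], points[j]) <= eps] for i in idx}
    seen, clusters = set(), []
    for i in idx:
        if i in seen or len(neigh[i]) < min_pts:
            continue  # already assigned, or not a core point
        cluster, frontier = [], [i]
        seen.add(i)
        while frontier:
            p = frontier.pop()
            cluster.append(p)
            if len(neigh[p]) >= min_pts:  # only core points expand the cluster
                for q in neigh[p]:
                    if q not in seen:
                        seen.add(q)
                        frontier.append(q)
        clusters.append(sorted(cluster))
    return clusters


def adaptive_dbscan(points, min_pts, stop_fraction=0.006):
    """Repeat DBSCAN with eps set to the mean core distance of the unclustered points."""
    remaining = list(range(len(points)))
    clusters = []
    while len(remaining) > min_pts and len(remaining) >= stop_fraction * len(points):
        cd = core_distances(points, remaining, min_pts)
        eps = sum(cd.values()) / len(cd)
        found = dbscan(points, remaining, eps, min_pts)
        if not found:
            break  # safety guard against a round that makes no progress
        clusters.extend(found)
        done = {i for c in found for i in c}
        remaining = [i for i in remaining if i not in done]
    return clusters, remaining


# Hypothetical data: a dense group (spacing about 1) and a sparse group (spacing about 10).
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
       (100, 100), (110, 100), (100, 110), (110, 110), (105, 105)]
found_clusters, leftover = adaptive_dbscan(pts, min_pts=2)
```

On this toy data the first round picks an epsilon that captures only the dense group; the second round recomputes epsilon from the sparse points alone and recovers the second group, which is the behaviour a single global epsilon cannot provide.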
Fully Automated Density-Based Clustering Method 被引量:1
17
作者 Bilal Bataineh Ahmad A.Alzahrani 《Computers, Materials & Continua》 SCIE EI 2023年第8期1833-1851,共19页
Cluster analysis is a crucial technique in unsupervised machine learning,pattern recognition,and data analysis.However,current clustering algorithms suffer from the need for manual determination of parameter values,lo... Cluster analysis is a crucial technique in unsupervised machine learning,pattern recognition,and data analysis.However,current clustering algorithms suffer from the need for manual determination of parameter values,low accuracy,and inconsistent performance concerning data size and structure.To address these challenges,a novel clustering algorithm called the fully automated density-based clustering method(FADBC)is proposed.The FADBC method consists of two stages:parameter selection and cluster extraction.In the first stage,a proposed method extracts optimal parameters for the dataset,including the epsilon size and a minimum number of points thresholds.These parameters are then used in a density-based technique to scan each point in the dataset and evaluate neighborhood densities to find clusters.The proposed method was evaluated on different benchmark datasets andmetrics,and the experimental results demonstrate its competitive performance without requiring manual inputs.The results show that the FADBC method outperforms well-known clustering methods such as the agglomerative hierarchical method,k-means,spectral clustering,DBSCAN,FCDCSD,Gaussian mixtures,and density-based spatial clustering methods.It can handle any kind of data set well and perform excellently. 展开更多
Keywords: automated clustering; data mining; density-based clustering; unsupervised machine learning
Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique (Cited by: 9)
18
Authors: Guan Ji hong (1), Zhou Shui geng (2), Bian Fu ling (3), He Yan xiang (1). Affiliations: 1. School of Computer, Wuhan University, Wuhan 430072, China; 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China; 3. College of Remote Sensin. Wuhan University Journal of Natural Sciences (CAS), 2001, No. Z1, pp. 467-473, 7 pages
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces the sampling technique inside DBSCAN, and the other applies the sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
Keywords: spatial databases; data mining; clustering; sampling; DBSCAN algorithm
Clustering Structure Analysis in Time-Series Data With Density-Based Clusterability Measure (Cited by: 6)
19
Authors: Juho Jokinen, Tomi Raty, Timo Lintonen. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2019, No. 6, pp. 1332-1343, 12 pages
Clustering is used to gain an intuition of the structures in the data. Most current clustering algorithms produce a clustering structure even on data that do not possess such structure. In these cases, the algorithms force a structure on the data instead of discovering one. To avoid false structures in the relations of data, a novel clusterability assessment method called the density-based clusterability measure is proposed in this paper. It measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce meaningful insight into the relationships in the data. This is especially useful for time-series data, since visualizing the structure in time-series data is hard. The performance of the clusterability measure is evaluated against several synthetic data sets and time-series data sets, which illustrate that the density-based clusterability measure can successfully indicate the clustering structure of time-series data.
Keywords: clustering; exploratory data analysis; time-series; unsupervised learning
Local and global approaches of affinity propagation clustering for large scale data (Cited by: 15)
20
Authors: Ding-yin XIA, Fei WU, Xu-qing ZHAN, Yue-ting ZHUANG. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2008, No. 10, pp. 1373-1381, 9 pages
Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clusters sparsely related data by passing messages between data points. However, we want to cluster large-scale data where the similarities are not sparse in many cases. This paper presents two variants of AP for grouping large-scale data with a dense similarity matrix. The local approach is partition affinity propagation (PAP) and the global method is landmark affinity propagation (LAP). PAP passes messages within subsets of the data first and then merges them as the initial step of the iterations; it can effectively reduce the number of clustering iterations. LAP passes messages between the landmark data points first and then clusters the non-landmark data points; it is a large-scale global approximation method that speeds up clustering. Experiments are conducted on many datasets, such as random data points, manifold subspaces, images of faces, and Chinese calligraphy, and the results demonstrate that the two approaches are feasible and practicable.
Keywords: clustering; affinity propagation; large scale data; partition affinity propagation; landmark affinity propagation
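The landmark idea in this abstract (message passing among landmarks only, then clustering the remaining points) might look roughly like the sketch below. Several details are assumptions, because the abstract does not specify them: landmarks are taken as every `step`-th point, similarity is the negative squared Euclidean distance, the preference is the median off-diagonal similarity, and non-landmark points are attached to their nearest exemplar. The names `affinity_propagation` and `landmark_ap` are hypothetical; the message updates follow the standard AP responsibility and availability rules with damping.

```python
import statistics

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def affinity_propagation(S, damping=0.5, iters=200):
    """Standard AP message passing on a dense similarity matrix S
    (list of lists, at least 2x2). Returns the exemplar indices."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else vals[first])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive responsibilities from others
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    # Fallback if nothing qualifies: take the single best-scoring candidate.
    return exemplars or [max(range(n), key=score.__getitem__)]

def landmark_ap(points, step=2):
    """Run AP on landmarks only, then attach every point to its nearest exemplar."""
    landmarks = list(range(0, len(points), step))  # assumed landmark choice
    m = len(landmarks)
    S = [[-sq_dist(points[a], points[b]) for b in landmarks] for a in landmarks]
    pref = statistics.median(S[i][j] for i in range(m) for j in range(m) if i != j)
    for i in range(m):
        S[i][i] = pref                             # shared preference
    exemplars = [landmarks[k] for k in affinity_propagation(S)]
    return [min(exemplars, key=lambda e: sq_dist(p, points[e])) for p in points]

points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.5, 0.2),
          (5.0, 5.1), (5.3, 4.9), (4.8, 5.4), (5.2, 5.5)]
labels = landmark_ap(points)  # each label is the index of an exemplar landmark
```

The cost saving comes from running the quadratic message passing on the landmark subset only; the final assignment is a single linear pass over all points.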