In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in the field of medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering performance. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
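To make the auto-tuning step concrete, here is a small, hedged sketch of Bayesian hyperparameter optimization for a clustering model. It is not the MAS-DSC implementation: it tunes an off-the-shelf spectral clustering model (a stand-in) with scikit-optimize's `gp_minimize`, using the silhouette score as a label-free objective, purely to illustrate the optimization loop the abstract refers to.

```python
# Hedged sketch: Bayesian optimization of clustering hyperparameters.
# This is NOT the MAS-DSC model; it only illustrates the auto-tuning loop,
# using spectral clustering on synthetic data as a stand-in.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score
from skopt import gp_minimize            # assumes scikit-optimize is installed
from skopt.space import Real, Integer

X, _ = make_blobs(n_samples=300, centers=5, n_features=20, random_state=0)

def objective(params):
    gamma, n_clusters = params
    labels = SpectralClustering(n_clusters=int(n_clusters), gamma=gamma,
                                affinity="rbf", random_state=0).fit_predict(X)
    # gp_minimize minimizes, so return the negative silhouette score
    return -silhouette_score(X, labels)

result = gp_minimize(objective,
                     [Real(1e-3, 1e1, prior="log-uniform"),  # RBF kernel width
                      Integer(2, 10)],                        # number of clusters
                     n_calls=25, random_state=0)
print("best (gamma, k):", result.x, "best silhouette:", -result.fun)
```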
Deep multi-view subspace clustering (DMVSC) based on self-expression has attracted increasing attention due to its outstanding performance and nonlinear applicability. However, most existing methods neglect that view-private meaningless information or noise may interfere with the learning of self-expression, which may lead to the degeneration of clustering performance. In this paper, we propose a novel framework of Contrastive Consistency and Attentive Complementarity (CCAC) for DMVSC. CCAC aligns all the self-expressions of multiple views and fuses them based on their discrimination, so that it can effectively explore consistent and complementary information for achieving precise clustering. Specifically, the view-specific self-expression is learned by a self-expression layer embedded into the auto-encoder network for each view. To guarantee consistency across views and reduce the effect of view-private information or noise, we align all the view-specific self-expressions by contrastive learning. The aligned self-expressions are assigned adaptive weights by a channel attention mechanism according to their discrimination. They are then fused by a convolution kernel to obtain a consensus self-expression with maximum complementarity across the multiple views. Extensive experimental results on four benchmark datasets and one large-scale dataset show that CCAC outperforms other state-of-the-art methods, demonstrating its clustering effectiveness.
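A rough sketch of the contrastive alignment idea follows. It treats row i of each view's self-expression matrix as the embedding of sample i and applies a generic InfoNCE-style loss in NumPy; the shapes, the temperature, and the use of rows as embeddings are illustrative assumptions rather than the CCAC objective.

```python
# Hedged sketch: InfoNCE-style alignment of two views' self-expression rows.
# Shapes and the use of rows as embeddings are illustrative assumptions,
# not the actual CCAC training objective.
import numpy as np

def info_nce(C1, C2, tau=0.1):
    """C1, C2: (n, n) self-expression matrices of two views."""
    # L2-normalize rows so dot products are cosine similarities
    Z1 = C1 / (np.linalg.norm(C1, axis=1, keepdims=True) + 1e-12)
    Z2 = C2 / (np.linalg.norm(C2, axis=1, keepdims=True) + 1e-12)
    sim = Z1 @ Z2.T / tau                      # (n, n) cross-view similarities
    # The positive pair for sample i is (row i of view 1, row i of view 2)
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
C_view1 = rng.random((50, 50))
C_view2 = C_view1 + 0.05 * rng.standard_normal((50, 50))  # a roughly aligned second view
print("contrastive loss:", info_nce(C_view1, C_view2))
```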
Multi-view Subspace Clustering (MVSC) emerges as an advanced clustering method, designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The significance of the low-rank prior in MVSC is emphasized, highlighting its role in capturing the global data structure across views for improved performance. However, it faces challenges with outlier sensitivity due to its reliance on the Frobenius norm for error measurement. Addressing this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods relying solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
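An ADMM solver for a low-rank-plus-sparse objective is typically assembled from two proximal operators: element-wise soft-thresholding for the ℓ1 term and singular value thresholding for the nuclear-norm term. The sketch below shows these standard building blocks only; it is not the LMVSC-Sparse update scheme.

```python
# Hedged sketch: the standard proximal operators an ADMM solver for a
# low-rank + sparse objective is usually built from. The actual
# LMVSC-Sparse updates are not reproduced here.
import numpy as np

def soft_threshold(M, tau):
    """Prox of tau*||.||_1: element-wise shrinkage."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Prox of tau*||.||_* (nuclear norm): shrink the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
Z = rng.standard_normal((30, 30))
print("nonzeros after l1 prox :", np.count_nonzero(soft_threshold(Z, 1.0)))
print("rank after nuclear prox:", np.linalg.matrix_rank(svd_threshold(Z, 3.0)))
```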
Clustering high-dimensional data is challenging, as increasing dimensionality enlarges the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review of subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and to dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
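The time fading function mentioned among the findings is commonly an exponential decay of a point's weight with its age, f(Δt) = 2^(−λΔt), as used in density-based stream clustering; the minimal sketch below illustrates this weighting under an assumed decay rate and is not tied to any specific surveyed algorithm.

```python
# Hedged sketch: exponential time fading, as commonly used in
# density-based stream clustering (not tied to a specific surveyed algorithm).
import numpy as np

def fading_weight(age, decay=0.01):
    """Weight of a data point that arrived `age` time units ago."""
    return 2.0 ** (-decay * age)

ages = np.array([0, 10, 100, 1000])
print(dict(zip(ages.tolist(), fading_weight(ages).round(4).tolist())))
```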
In recent years, the soft subspace clustering algorithm has shown good results for high-dimensional data; it can assign different weights to each cluster and use the weights to measure the contribution of each dimension to the various features. The enhanced soft subspace clustering algorithm combines inter-class separation and intra-class tightness information and performs strongly for image segmentation. However, the clustering algorithm is susceptible to noisy data, relies on the initialized clustering centers, and can fall into local optima; its clustering effect is poor for brain MR images with unclear boundaries and noise. To address these problems, a soft subspace clustering algorithm for brain MR images based on genetic algorithm optimization is proposed, which incorporates the generalized noise technique, relaxes the equality weight constraint in the objective function into a boundary constraint, and uses a genetic algorithm to optimize the initialized clustering centers. The genetic algorithm finds the best clustering centers and reduces the algorithm's dependence on the initial clustering centers. Experiments verify the robustness and noise immunity of the algorithm in various ways, and it shows good results on common datasets and on the brain MR images provided by the Changshu First People's Hospital, with high accuracy suitable for clinical medicine.
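As a generic illustration of the genetic-algorithm initialization step, the sketch below evolves candidate sets of cluster centers with selection, one-point crossover, and Gaussian mutation, scoring them by within-cluster squared error. The fitness function and operators are simplifying assumptions; the paper's soft subspace objective and generalized noise handling are not reproduced.

```python
# Hedged sketch: a tiny genetic algorithm that searches for good initial
# cluster centers (fitness = negative within-cluster squared error).
# This is a generic GA initializer, not the paper's clustering objective.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in ((0, 0), (3, 3), (0, 3))])
k, pop_size, gens = 3, 30, 60

def fitness(centers):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, k) distances
    return -(d.min(axis=1) ** 2).sum()          # higher is better

# Each individual is a (k, 2) set of centers sampled from the data
population = [X[rng.choice(len(X), k, replace=False)] for _ in range(pop_size)]
for _ in range(gens):
    scores = np.array([fitness(ind) for ind in population])
    order = np.argsort(scores)[::-1]
    parents = [population[i] for i in order[: pop_size // 2]]       # selection
    children = []
    while len(children) < pop_size - len(parents):
        a, b = rng.choice(len(parents), 2, replace=False)
        cut = rng.integers(1, k)
        child = np.vstack([parents[a][:cut], parents[b][cut:]])     # one-point crossover
        child = child + rng.normal(0, 0.05, child.shape)            # Gaussian mutation
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print("best initial centers:\n", best.round(2))
```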
Subspace clustering methods that embrace a self-expressive model, which represents each data point as a linear combination of other data points in the dataset, provide powerful unsupervised learning techniques. However, when dealing with large datasets, representing each data point by referring to all data points via a dictionary suffers from high computational complexity. To alleviate this issue, we introduce a parallelizable multi-subset based self-expressive model (PMS), which represents each data point by combining multiple subsets, each consisting of only a small proportion of the samples. The adoption of PMS in subspace clustering (PMSSC) leads to computational advantages because the optimization problems decomposed over each subset are small and can be solved efficiently in parallel. Furthermore, PMSSC is able to combine multiple self-expressive coefficient vectors obtained from the subsets, which contributes to an improvement in self-expressiveness. Extensive experiments on synthetic and real-world datasets show the efficiency and effectiveness of our approach in comparison to other methods.
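The computational idea of PMS, solving many small self-expressive problems over subsets instead of one large problem over the full dictionary, can be sketched with a ridge-regularized least-squares solve per subset run in a process pool. The ℓ2 penalty, random subsets, and averaging step below are assumed simplifications, not the PMSSC optimizer.

```python
# Hedged sketch: per-subset self-expressive coding solved independently
# (an l2-regularized stand-in for PMS, not the PMSSC algorithm itself).
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def code_against_subset(args):
    X, subset_idx, lam = args
    D = X[:, subset_idx]                              # small dictionary (d, m)
    # Closed-form ridge solution for all points at once: C = (D^T D + lam I)^-1 D^T X
    C = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ X)
    return subset_idx, C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 2000))               # columns are data points
    subsets = [rng.choice(2000, size=100, replace=False) for _ in range(8)]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(code_against_subset, [(X, s, 0.1) for s in subsets]))
    # Combine (average) the per-subset codes into one (n, n) affinity-like matrix
    C_full = np.zeros((2000, 2000))
    for idx, C in results:
        C_full[idx, :] += C / len(subsets)
    print("combined coefficient matrix:", C_full.shape)
```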
Sparse subspace clustering (SSC) is a spectral clustering methodology. Since high-dimensional data are often dispersed over the union of many low-dimensional subspaces, their representation in a suitable dictionary is sparse. SSC is therefore an effective technology for diagnosing mechanical system faults. Its main purpose is to create a representation model that can reveal the real subspace structure of high-dimensional data, construct a similarity matrix from the sparse representation coefficients of the high-dimensional data, and then cluster the obtained representation coefficients and similarity matrix in the subspace. However, the SSC algorithm is designed around a global expression in which each data point is represented by all possible cluster data points. This leads to nonzero terms in the off-diagonal blocks of the similarity matrix, which reduces its recognition performance. To improve the clustering ability of SSC for rolling bearings and the robustness of the algorithm in the presence of strong background noise, a simultaneous dimensionality-reduction subspace clustering technique is proposed in this work. Features are extracted from the envelope signal, the dimension of the feature matrix is reduced by singular value decomposition, and the Euclidean distance between samples is replaced by the correlation distance, yielding a dimension-reduction graph-based SSC technique. Simulation and bearing data from Western Reserve University show that the proposed algorithm can improve the accuracy and compactness of clustering.
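A bare-bones version of the described pipeline, reducing a feature matrix by SVD, building an affinity from correlation rather than Euclidean distance, and clustering spectrally, is sketched below; the envelope feature extraction and the paper's exact graph construction are omitted, and the random feature matrix is a placeholder.

```python
# Hedged sketch: SVD reduction + correlation-based affinity + spectral clustering.
# Envelope feature extraction and the paper's exact graph construction are omitted.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
F = rng.standard_normal((120, 400))          # rows = samples (placeholder for envelope features)

# Reduce dimensionality with a truncated SVD of the centered feature matrix
U, s, Vt = np.linalg.svd(F - F.mean(axis=0), full_matrices=False)
F_red = U[:, :10] * s[:10]                   # keep the 10 leading components

# Correlation distance between samples, converted to a non-negative affinity
corr = np.corrcoef(F_red)                    # (n, n) sample correlations
affinity = (1.0 + corr) / 2.0                # map [-1, 1] -> [0, 1]

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print("cluster sizes:", np.bincount(labels))
```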
Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of clusters. Second, the clustering results are often sensitive to input parameters. In this paper, a fast subspace clustering algorithm using attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing the Gini coefficient. To evaluate the correlation of every pair of non-redundant attributes, the relation matrix of non-redundant attributes is constructed based on the relation function of two-dimensional united Gini coefficients. After applying an overlapping clustering algorithm to the relation matrix, the candidate set of all interesting subspaces is obtained. Finally, all subspace clusters can be derived by clustering on the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves a significant gain in runtime and quality when finding subspace clusters, but is also insensitive to input parameters.
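The first step, ranking attributes by a Gini coefficient and discarding the least informative ones, can be illustrated with the standard Gini computation below; the paper's two-dimensional united Gini coefficient and its thresholds are not reproduced, so the quantile cut-off is an illustrative assumption.

```python
# Hedged sketch: filtering attributes by a standard Gini coefficient.
# The paper's "two-dimensional united Gini coefficient" and its exact
# threshold are not reproduced; the 25% cut-off below is an assumption.
import numpy as np

def gini(values):
    """Classic Gini coefficient of a non-negative 1-D array (0 = perfectly uniform)."""
    v = np.sort(np.abs(values).astype(float))
    n = v.size
    cum = np.cumsum(v)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

rng = np.random.default_rng(0)
X = rng.random((500, 8))
X[:, 3] = 0.5 + 1e-6 * rng.random(500)         # a near-constant (uninformative) attribute

scores = np.array([gini(X[:, j]) for j in range(X.shape[1])])
keep = scores > np.quantile(scores, 0.25)      # drop the flattest quarter of attributes
print("gini per attribute:", scores.round(3))
print("kept attributes   :", keep)
```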
As a statistical method, the Hidden Markov Model (HMM) technique is widely used for speech recognition. In order to train HMMs effectively with much less data, the Subspace Distribution Clustering Hidden Markov Model (SDCHMM), derived from the Continuous Density Hidden Markov Model (CDHMM), is introduced. Using parameter tying, a new method to train SDCHMMs is described. Compared with the conventional training method, an SDCHMM recognizer trained by the new method achieves higher accuracy and speed. Experimental results show that the SDCHMM recognizer outperforms the CDHMM recognizer on speech recognition of Chinese digits.
The problem of pattern-based subspace clustering, a special type of subspace clustering that uses pattern similarity as the measure of similarity, is studied. Unlike most traditional clustering algorithms that group objects with close values in all dimensions or in a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. A novel approach named EMaPle is designed to mine maximal pattern-based subspace clusters. EMaPle searches for clusters only in the attribute enumeration spaces, which are relatively few compared to the large number of row combinations in typical datasets, and it exploits novel pruning techniques. EMaPle can find clusters satisfying coherence constraints, size constraints and sign constraints that are neglected in MaPle. Both synthetic and real data sets are used to evaluate EMaPle and demonstrate that it is more effective and scalable than MaPle.
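The notion of a coherent rise-and-fall pattern can be made concrete with the pScore test used by pCluster-style methods: a set of objects forms a pattern-based cluster over a set of attributes if every pair of objects disagrees by at most δ on every pair of attribute differences. The sketch below checks this condition; it shows the cluster definition, not EMaPle's enumeration and pruning.

```python
# Hedged sketch: checking pattern coherence (the pScore used by pCluster-style
# methods) for a set of objects over a subset of attributes. This is the
# common definition, not EMaPle's search procedure.
import numpy as np

def is_pattern_cluster(M, delta=1.0):
    """M: (objects, attributes) submatrix; True if every pair of objects differs
    by at most delta on every pair of attribute differences."""
    diffs = M[:, :, None] - M[:, None, :]          # (obj, attr, attr) column differences
    spread = diffs.max(axis=0) - diffs.min(axis=0) # worst disagreement between any two objects
    return bool(spread.max() <= delta)

base = np.array([1.0, 4.0, 2.0, 7.0])              # a shared rise-and-fall pattern
cluster = np.vstack([base + shift for shift in (0.0, 3.0, -1.5)])
mixed = np.vstack([cluster, np.array([5.0, 0.0, 6.0, 1.0])])
print(is_pattern_cluster(cluster), is_pattern_cluster(mixed))   # -> True False
```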
High-dimensional data clustering, with the inherent sparsity of the data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher-dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and to optimize line manifolds during the line manifold cluster searching procedure. The results obtained from experiments over real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method is able to obtain high clustering accuracy for various data sets with different sizes, manifold dimensions and noise ratios, which confirms the anti-noise capability and high clustering accuracy of the proposed method for high-dimensional data.
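For a one-dimensional (line) manifold the two metrics named above can be written down directly: the orthogonal distance is the norm of the residual after projecting a point onto the line, and the tangent distance can be taken as the displacement along the line's direction. The sketch below uses these assumed definitions, which may differ in detail from the paper's.

```python
# Hedged sketch: orthogonal and along-line ("tangent") distances from a point
# to a line manifold {p + t*d}. The exact metric definitions in the paper
# may differ; these are the natural geometric choices.
import numpy as np

def line_distances(x, p, d):
    d = d / np.linalg.norm(d)                 # unit direction of the line
    t = np.dot(x - p, d)                      # signed coordinate along the line
    foot = p + t * d                          # orthogonal projection of x onto the line
    orthogonal = np.linalg.norm(x - foot)     # distance perpendicular to the line
    tangent = abs(t)                          # distance along the line from p
    return orthogonal, tangent

x = np.array([3.0, 4.0, 1.0])
p = np.array([0.0, 0.0, 0.0])                 # a point on the line
d = np.array([1.0, 0.0, 0.0])                 # line direction
print(line_distances(x, p, d))                # -> (sqrt(17), 3.0)
```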
Many 3D shape descriptors for 3D shape retrieval have been presented so far. This paper proposes a new mechanism that employs several existing global and local 3D shape descriptors as input. Using sparse theory, the descriptors that play the most important role in measuring the similarity between the query model and the models in the dataset are selected automatically, and an affinity matrix is constructed. A spectral clustering method can be applied to this affinity matrix, and the spectral embedding of the affinity matrix can be used for retrieval, integrating almost all the advantages of the selected descriptors. To verify the performance of our approach, we perform experimental comparisons on the Princeton Shape Benchmark database. Test results show that our method is a pose-oblivious, efficient and robust method for either complete or incomplete models.
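The retrieval step rests on the spectral embedding of the affinity matrix; a minimal version of that embedding (symmetric normalized Laplacian, leading non-trivial eigenvectors) is sketched below on a randomly generated affinity, leaving the sparse descriptor-selection stage aside.

```python
# Hedged sketch: spectral embedding of an affinity matrix via the
# symmetric normalized Laplacian. Descriptor selection with sparse coding
# (the paper's first stage) is not shown; the affinity here is random.
import numpy as np

def spectral_embedding(W, dim=3):
    """W: symmetric non-negative affinity matrix; returns an (n, dim) embedding."""
    deg = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg + 1e-12))
    L_sym = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)          # ascending eigenvalues
    return eigvecs[:, 1:dim + 1]                      # skip the trivial eigenvector

rng = np.random.default_rng(0)
A = rng.random((40, 40))
W = (A + A.T) / 2.0                                   # symmetrize to get a valid affinity
emb = spectral_embedding(W, dim=3)
print("embedding shape:", emb.shape)
```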
Subspace clustering addresses an important problem in clustering multi-dimensional data. In sparse multi-dimensional data, many dimensions are irrelevant and obscure the cluster boundaries. Subspace clustering helps by mining the clusters present in only locally relevant subsets of dimensions. However, understanding the result of subspace clustering is not trivial for analysts. In addition to the grouping information, relevant sets of dimensions and overlaps between groups, both in terms of dimensions and records, need to be analyzed. We introduce a visual subspace cluster analysis system called ClustNails. It integrates several novel visualization techniques with various user interaction facilities to support navigating and interpreting the results of subspace clustering. We demonstrate the effectiveness of the proposed system by applying it to the analysis of real-world data and comparing it with existing visual subspace cluster analysis systems.
In this paper, we study a band constrained nonnegative matrix factorization (band NMF) problem: for a given nonnegative matrix Y, decompose it as Y ≈ AX with A a nonnegative matrix and X a nonnegative block band matrix. This factorization model extends a single low-rank subspace model to a mixture of several overlapping low-rank subspaces, which not only provides sparse representation, but also captures significant grouping structure from a dataset. Based on overlapping subspace clustering and the capture of the level of overlap between neighbouring subspaces, two simple and practical algorithms are presented to solve the band NMF problem. Numerical experiments on both synthetic data and real image data show that band NMF enhances the performance of NMF in data representation and processing.
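To make the factorization Y ≈ AX concrete, the sketch below runs standard multiplicative NMF updates and, after each step, zeroes the entries of X outside an assumed block-band support; the band shape and masking strategy are illustrative assumptions, and the paper's two dedicated algorithms are not reproduced.

```python
# Hedged sketch: NMF with a band-structured support enforced on X by masking.
# These are the standard multiplicative updates, not the paper's two algorithms.
import numpy as np

rng = np.random.default_rng(0)
m, n, r, eps = 60, 80, 10, 1e-9
Y = rng.random((m, n))

# Assumed band support: X[i, j] may be nonzero only if |i*(n/r) - j| <= bandwidth
bandwidth = 12
mask = np.fromfunction(lambda i, j: np.abs(i * (n / r) - j) <= bandwidth, (r, n))

A = rng.random((m, r))
X = rng.random((r, n)) * mask
for _ in range(200):
    X *= (A.T @ Y) / (A.T @ A @ X + eps)      # multiplicative update for X
    X *= mask                                 # re-impose the band support on X
    A *= (Y @ X.T) / (A @ X @ X.T + eps)      # multiplicative update for A

print("relative error:", np.linalg.norm(Y - A @ X) / np.linalg.norm(Y))
```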