Finding a suitable space is one of the most critical problems for dimensionality reduction. Each space corresponds to a distance metric defined on the sample attributes, so finding a suitable space can be converted into developing an effective distance metric. Most existing dimensionality reduction methods use a fixed, pre-specified distance metric. However, this simple treatment has limitations in practice, because a pre-specified metric does not guarantee that the closest samples are the truly similar ones. In this paper, we present an adaptive metric learning method for dimensionality reduction, called AML. The adaptive metric learning model is developed by maximizing the difference between the distances of the data pairs in cannot-links and those in must-links. Unlike many existing works that use the traditional Euclidean distance, we use the more general l<sub>2,p</sub>-norm distance to reduce sensitivity to noise and outliers; it also adds flexibility and adaptability through the selection of an appropriate p-value for each data set. Moreover, traditional metric learning methods usually project samples into a linear subspace, which is overly restrictive. We extend the basic linear method to a more powerful nonlinear kernel case, so as to capture complex nonlinear relationships among the data. To solve our objective, we derive an efficient iterative algorithm. Extensive dimensionality reduction experiments demonstrate the superiority of our method over state-of-the-art approaches.
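The criterion described above can be sketched in a few lines: a minimal numpy illustration (not the authors' implementation) of evaluating an l<sub>2,p</sub>-norm pairwise-distance objective for a candidate projection W, where the toy data, the pair sets, and the choice of p are all hypothetical placeholders.

```python
import numpy as np

def l2p_pair_distance(W, xi, xj, p):
    """||W^T (xi - xj)||_2^p: l2,p-style distance of a projected pair."""
    return np.linalg.norm(W.T @ (xi - xj)) ** p

def aml_objective(W, X, must_links, cannot_links, p):
    """Difference of summed projected distances: cannot-link pairs (to be
    pushed apart) minus must-link pairs (to be pulled together).
    An adaptive-metric model of this kind maximizes this quantity over W."""
    push = sum(l2p_pair_distance(W, X[i], X[j], p) for i, j in cannot_links)
    pull = sum(l2p_pair_distance(W, X[i], X[j], p) for i, j in must_links)
    return push - pull

# Toy example with an identity projection (hypothetical data).
X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.1]])
W = np.eye(2)
obj = aml_objective(W, X, must_links=[(0, 2)], cannot_links=[(0, 1)], p=1.0)
# → 5.0 - 0.1 = 4.9: the cannot-link pair is far, the must-link pair is close.
```

With p < 2, large residuals are down-weighted relative to the squared Euclidean case, which is the source of the robustness to outliers mentioned above.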
As a kind of weak supervisory information, pairwise constraints can be exploited to guide the data analysis process, such as data clustering. This paper formulates pairwise constraint propagation, which aims to predict the large quantity of unknown constraints from scarce known constraints, as a low-rank matrix recovery (LMR) problem. Although recent advances in transductive learning based on matrix completion can be directly adopted to solve this problem, our work develops a more general low-rank matrix recovery solution for pairwise constraint propagation, which not only completes the unknown entries in the constraint matrix but also removes the noise from the data matrix. The problem can be effectively solved using an augmented Lagrange multiplier method. Experimental results on constrained clustering tasks based on the propagated pairwise constraints show that our method obtains more stable results than state-of-the-art algorithms, and outperforms them.
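Augmented Lagrange multiplier (ALM) schemes for low-rank recovery typically alternate closed-form updates, and the low-rank step is singular value thresholding. A minimal sketch of that step (the function name and the toy matrix are illustrative, not from the paper):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of the nuclear
    norm, the core low-rank update inside ALM-style solvers."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Shrinking a rank-1 matrix: its single singular value 1.0 becomes 0.6.
M = np.outer([1.0, 0.0], [1.0, 0.0])
M_shrunk = svt(M, 0.4)
# → 0.6 * M
```

In a full ALM solver, this update alternates with a closed-form update of the Lagrange multipliers and of the noise term until the constraint residual is small.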
Clustering is widely exploited in data mining. It has been shown that embedding weak label priors into clustering is effective in promoting its performance. Previous research mainly focuses on only one type of prior. However, in many real scenarios, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate them to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), which is an integrated framework. Within the unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding, and the label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace the cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. In addition to the theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP on several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of the proposed approach.
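The constraint-regularized spectral embedding idea can be sketched generically as follows. This is not CWLP itself: in the sketch, pairwise constraints enter as a Laplacian-style regularizer added to the graph Laplacian before eigen-decomposition, and the similarity matrix, the weight λ, and the ±1 constraint encoding are all assumptions made for illustration.

```python
import numpy as np

def constrained_embedding(S, must, cannot, k, lam=1.0):
    """Spectral embedding regularized by pairwise constraints.

    S: n x n similarity matrix; must/cannot: lists of index pairs;
    k: embedding dimension. Must-links add edges (pulling points
    together), cannot-links subtract them (pushing points apart) in a
    constraint Laplacian added to the graph Laplacian."""
    n = S.shape[0]
    C = np.zeros((n, n))
    for i, j in must:
        C[i, j] = C[j, i] = 1.0
    for i, j in cannot:
        C[i, j] = C[j, i] = -1.0
    L = np.diag(S.sum(axis=1)) - S       # graph Laplacian
    Lc = np.diag(C.sum(axis=1)) - C      # constraint Laplacian
    # Bottom-k eigenvectors of the regularized Laplacian.
    vals, vecs = np.linalg.eigh(L + lam * Lc)
    return vecs[:, :k]

# Two weakly connected blocks, one must-link, one cannot-link (toy data).
S = np.array([[0.0, 1.0, 1.0, 0.1, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]])
F = constrained_embedding(S, must=[(0, 1)], cannot=[(2, 3)], k=2)
```

CWLP additionally constrains the subsequent spectral rotation with the known label proportion, which this generic sketch omits.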
Over the past few decades, latent variable model (LVM)-based algorithms have attracted considerable attention for data dimensionality reduction, which plays an important role in machine learning, pattern recognition, and computer vision. An LVM is an effective tool for modeling the density of the observed data, and it has been used in dimensionality reduction to deal with sparse observed samples. In this paper, two LVM-based dimensionality reduction algorithms are presented first: the supervised Gaussian process latent variable model and the semi-supervised Gaussian process latent variable model. Then, we propose an LVM-based transfer learning model to cope with the case in which samples are not independent and identically distributed. At the end of each part, experimental results are given to demonstrate the validity of the proposed dimensionality reduction algorithms.
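As a rough illustration of the Gaussian process latent variable model underlying these algorithms, the sketch below evaluates the GP marginal negative log-likelihood of data Y given latent positions X under an RBF kernel. The kernel choice, hyperparameters, and toy data are assumptions; an actual GPLVM fit would optimize X (and the hyperparameters) against this objective by gradient descent.

```python
import numpy as np

def gplvm_nll(X, Y, lengthscale=1.0, noise=1e-2):
    """Negative log marginal likelihood -log p(Y | X) of a GPLVM: each of
    the D output dimensions of Y is an independent GP over the latent
    inputs X, with an RBF kernel plus observation noise."""
    n, d_out = Y.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    K = np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, Y)                        # K^{-1} Y
    return 0.5 * (d_out * logdet + np.sum(Y * alpha)
                  + d_out * n * np.log(2.0 * np.pi))

# Evaluate the objective at random latent positions (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
Y = rng.normal(size=(5, 3))
nll = gplvm_nll(X, Y)
```

The supervised and semi-supervised variants described above modify this objective with label information, but the marginal-likelihood core stays the same.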
Funding: supported by the National Natural Science Foundation of China (No. 61300164).
Funding: supported by the National Key R&D Program (No. 2022ZD0114803) and the National Natural Science Foundation of China (Grant Nos. 62136005, 61922087).