Funding: supported by the National Key R&D Program (No. 2022ZD0114803) and the National Natural Science Foundation of China (Grant Nos. 62136005, 61922087).
Abstract: Clustering is widely used in data mining. Embedding weak label priors into clustering has been shown to improve its performance, but previous research has mainly focused on a single type of prior. In many real scenarios, however, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate both to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), an integrated framework. Within a unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding, and the label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace the cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is better suited to deriving clustering results. Besides theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP on several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of the proposed approach.
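To make the "pairwise constraints as a regularizer in spectral embedding" idea concrete, here is a minimal sketch of constrained spectral clustering. It is not the CWLP optimization from the paper: it simply adjusts the similarity matrix with must-link/cannot-link pairs before computing the spectral embedding, and it uses plain k-means in place of the paper's spectral rotation with a cluster-ratio constraint. The function name `constrained_spectral_embedding`, the weight `beta`, and the `must_link`/`cannot_link` inputs are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def constrained_spectral_embedding(X, n_clusters, must_link, cannot_link, beta=1.0):
    """Spectral clustering with a simple pairwise-constraint regularizer.

    must_link / cannot_link: lists of (i, j) index pairs (hypothetical inputs).
    Illustrative sketch only, not the CWLP model.
    """
    W = rbf_kernel(X)                          # base similarity matrix
    Q = np.zeros_like(W)                       # constraint matrix: +1 must-link, -1 cannot-link
    for i, j in must_link:
        Q[i, j] = Q[j, i] = 1.0
    for i, j in cannot_link:
        Q[i, j] = Q[j, i] = -1.0
    W_reg = np.clip(W + beta * Q, 0.0, None)   # fold the weak prior into the affinities
    d = W_reg.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W_reg @ D_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L_sym)
    F = vecs[:, :n_clusters]                   # eigenvectors of the k smallest eigenvalues
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(F)
```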
Funding: partially supported by the National Natural Science Foundation of China (Grant Nos. 61922087, 61906201 and 62006238) and the Science and Technology Innovation Program of Hunan Province (2021RC3070).
Abstract: Label distribution learning (LDL) is a new learning paradigm for dealing with label ambiguity, and many studies have achieved prominent performance. Compared with traditional supervised learning scenarios, annotation with label distributions is more expensive. Directly applying existing active learning (AL) approaches, which aim to reduce the annotation cost in traditional learning, may degrade their performance. To address the high annotation cost in LDL, we propose the active label distribution learning via kernel maximum mean discrepancy (ALDL-kMMD) method to tackle this crucial but rarely studied problem. ALDL-kMMD captures the structural information of both data and labels and extracts the most representative instances from the unlabeled ones by incorporating a nonlinear model and marginal probability distribution matching. It also markedly decreases the number of queried unlabeled instances. Meanwhile, an effective solution to the original optimization problem of ALDL-kMMD is derived by constructing auxiliary variables. The effectiveness of our method is validated by experiments on real-world datasets.
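As a rough illustration of how kernel MMD can drive the selection of representative instances, the sketch below greedily picks points whose empirical distribution best matches the unlabeled pool under the squared MMD in an RBF kernel space. It is a simplification: the paper's ALDL-kMMD couples distribution matching with a nonlinear prediction model and solves the joint problem via auxiliary variables, none of which appears here. The function name `greedy_mmd_selection` and its parameters are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def greedy_mmd_selection(X_unlabeled, n_query, gamma=None):
    """Greedily select instances minimizing the squared kernel MMD
    between the selected subset and the whole unlabeled pool."""
    K = rbf_kernel(X_unlabeled, gamma=gamma)   # kernel matrix over the pool
    n = K.shape[0]
    pool_mean = K.mean(axis=1)                 # similarity of each point to the pool's mean embedding
    pool_term = K.mean()                       # constant E[k(u, u')] term of MMD^2
    selected = []
    for _ in range(n_query):
        best_i, best_mmd = None, np.inf
        for i in range(n):
            if i in selected:
                continue
            cand = selected + [i]
            K_ss = K[np.ix_(cand, cand)]
            # MMD^2(S, U) = E_SS[k] - 2 E_SU[k] + E_UU[k]
            mmd2 = K_ss.mean() - 2.0 * pool_mean[cand].mean() + pool_term
            if mmd2 < best_mmd:
                best_i, best_mmd = i, mmd2
        selected.append(best_i)
    return selected
```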
Funding: supported by the National Natural Science Foundation of China (Nos. 61922087, 61906201, 62006238, and 62136005) and the Natural Science Fund for Distinguished Young Scholars of Hunan Province (No. 2019JJ20020).
Abstract: Image classification is vital and basic in many data analysis domains. Since real-world images generally contain multiple diverse semantic labels, it amounts to a typical multi-label classification problem. Traditional multi-label image classification relies on a large amount of training data with plenty of labels, which incurs high human and financial costs. By contrast, a correlation matrix of the concerned categories in the current scene can easily be obtained from historical image data in other application scenarios. How to perform image classification with only label correlation priors, without specific and costly annotated labels, is an important but rarely studied problem. In this paper, we propose a model to classify images with this kind of weak correlation prior. We use label correlation to recapitulate the sample similarity, employ the prior information to decompose the projection matrix when regressing the label indicator matrix, and introduce the L_(2,1) norm to select features for each image. Finally, experimental results on several image datasets demonstrate that the proposed model has distinct advantages over current state-of-the-art multi-label classification methods.
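For readers unfamiliar with the L_(2,1)-norm feature-selection component, the sketch below solves the standard problem min_W ||XW - Y||_F^2 + lam * ||W||_(2,1) by iterative reweighting. It only illustrates this regression-with-row-sparsity piece; the paper's decomposition of the projection matrix with the label-correlation prior and its similarity construction are omitted. The function name `l21_regression` and its default parameters are assumptions, not part of the paper.

```python
import numpy as np

def l21_regression(X, Y, lam=0.1, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for L_(2,1)-regularized regression.

    X: (n, d) feature matrix; Y: (n, c) label indicator matrix.
    Returns W of shape (d, c); rows with large norm mark selected features.
    """
    n, d = X.shape
    D = np.eye(d)                                    # start with an identity reweighting matrix
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        # closed-form update for the current reweighting: W = (X^T X + lam D)^{-1} X^T Y
        W = np.linalg.solve(XtX + lam * D, XtY)
        # refresh D from the row norms of W: D_ii = 1 / (2 ||w_i||_2)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))
    return W
```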