Funding: Supported by the State Grid Technology Item (52460D230002).
Abstract: Exploiting label correlations is an essential data mining technique that addresses the possible correlations between different labels in multi-label classification. Although this technique is widely used in multi-label classification problems, most existing approaches rely on batch learning, which consumes a lot of time and space resources. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale datasets. However, existing online learning research has done little to consider correlations between labels. Building on existing research, this paper proposes a multi-label online learning algorithm based on label correlations that maximizes the margin between related labels and unrelated labels in multi-label samples. We evaluate the performance of the proposed algorithm on several public datasets. Experiments show the effectiveness of our algorithm.
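The idea of widening the gap between the scores of related and unrelated labels, one example at a time, can be illustrated with a generic ranking-perceptron step. This is a minimal sketch under stated assumptions, not the algorithm proposed in the paper: the linear scoring model, the function name `online_margin_update`, and the parameters `lr` and `margin` are all hypothetical, and label correlations are not modeled here.

```python
def online_margin_update(weights, x, relevant, lr=0.1, margin=1.0):
    """One online step on a single example (hypothetical sketch).

    weights  : list of per-label weight vectors (one list of floats per label)
    x        : feature vector of the current example
    relevant : set of label indices attached to the example
    """
    # Linear score of every label for this example.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    irrelevant = [k for k in range(len(weights)) if k not in relevant]
    if not relevant or not irrelevant:
        return weights  # nothing to separate

    # Hardest pair: lowest-scoring relevant vs. highest-scoring irrelevant label.
    worst_rel = min(relevant, key=lambda k: scores[k])
    best_irr = max(irrelevant, key=lambda k: scores[k])

    # Update only when the gap between the two scores is below the margin.
    if scores[worst_rel] - scores[best_irr] < margin:
        weights[worst_rel] = [w + lr * xi for w, xi in zip(weights[worst_rel], x)]
        weights[best_irr] = [w - lr * xi for w, xi in zip(weights[best_irr], x)]
    return weights


# Usage: three labels, two features, one streamed example with label 0.
W = [[0.0, 0.0] for _ in range(3)]
W = online_margin_update(W, [1.0, 0.0], {0})
```

Because each step touches only the current example, memory use does not grow with the number of examples seen, which is the contrast with batch learning drawn in the abstract.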
Funding: The Natural Science Foundation of China (Grant Numbers 72074014 and 72004012).
Abstract: Purpose: Many science, technology and innovation (STI) resources are tagged with several different labels. To automatically assign the resulting labels to an instance of interest, many approaches with good performance on benchmark datasets have been proposed for the multi-label classification task in the literature. Furthermore, several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones. Therefore, the main purpose of this paper is to comprehensively evaluate seven multi-label classification methods on real-world datasets. Research limitations: The three real-world datasets differ in the following aspects: statement, data quality, and purposes. Additionally, open-source tools designed for multi-label classification also have intrinsic differences in their approaches to data processing and feature selection, which in turn affect the performance of a multi-label classification approach. In the near future, we will enhance experimental precision and reinforce the validity of the conclusions by controlling variables more rigorously through expanded parameter settings. Practical implications: The observed Macro F1 and Micro F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing enhancements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly, reaching a level of practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
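Since the comparison above is reported in Macro F1 and Micro F1, a small worked example may help show how the two averages react differently to an unbalanced label distribution. The sketch below uses the standard definitions; the function names and the toy data are hypothetical and not taken from the paper.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred, labels):
    """Return (macro_f1, micro_f1) for lists of true and predicted label sets."""
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    return macro, f1(tp, fp, fn)


# Toy data: label "a" is frequent and always right, label "b" is rare and missed.
y_true = [{"a"}, {"a"}, {"a", "b"}]
y_pred = [{"a"}, {"a"}, {"a"}]
macro, micro = macro_micro_f1(y_true, y_pred, ["a", "b"])
```

In this toy case the missed rare label halves Macro F1 (0.5) while Micro F1 stays at 6/7, which is why the two scores are usually reported together on unbalanced datasets.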
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62376089, 62302153, 62302154, 62202147) and the Key Research and Development Program of Hubei Province, China (Grant No. 2023BEB024).
Abstract: The world produces vast quantities of high-dimensional multi-semantic data. However, extracting valuable information from such a large amount of high-dimensional, multi-label data is arduous and challenging. Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features. The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection because of its simplicity, efficiency, and similarity to reinforcement learning. Nevertheless, existing methods do not consider crucial correlation information, such as dynamic redundancy and label correlation. To address these concerns, the paper proposes a multi-label feature selection technique based on the ant colony optimization algorithm (MFACO), focusing on dynamic redundancy and label correlation. First, the dynamic redundancy between the selected feature subset and the candidate features is assessed. Meanwhile, the ant colony optimization algorithm extracts label correlation from the label set, which is then incorporated into the heuristic factor as label weights. Experimental results demonstrate that the proposed strategies effectively enhance the optimal search ability of the ant colony, outperforming the other algorithms compared in the paper.
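To make the role of the heuristic factor concrete, the sketch below shows the standard ant colony optimization rule for choosing the next candidate: each option is picked with probability proportional to its pheromone level raised to `alpha` times its heuristic value raised to `beta`. How MFACO actually builds the heuristic from dynamic redundancy and label weights is not given in the abstract, so the heuristic values here are hypothetical inputs, as is the function name.

```python
def transition_probabilities(pheromone, heuristic, alpha=1.0, beta=1.0):
    """Standard ACO selection rule over candidate features.

    pheromone : pheromone level of each candidate feature
    heuristic : heuristic desirability of each candidate (assumed to already
                combine relevance, redundancy and label weights)
    """
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    if total == 0:
        # No information at all: fall back to a uniform choice.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Two candidate features with equal pheromone; the second looks three times
# more useful according to the (hypothetical) heuristic.
probs = transition_probabilities([1.0, 1.0], [1.0, 3.0])
```

Raising `beta` makes the ants follow the heuristic more greedily, while raising `alpha` makes them rely more on the pheromone left by earlier solutions.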
Abstract: In recent years, multi-label learning has received a lot of attention. However, most existing methods consider only global label correlation or only local label correlation. In fact, on the one hand, both global and local label correlations can appear in real-world situations at the same time. On the other hand, we should not be limited to pairwise labels while ignoring high-order label correlation. In this paper, we propose a novel and effective method called GLLCBN for multi-label learning. First, we obtain the global label correlation by exploiting label semantic similarity. Then, we analyze the pairwise labels in the label space of the dataset to acquire the local correlation. Next, we build the initial version of the label dependency model from the global and local label correlations. After that, we use graph theory, probability theory and Bayesian networks to eliminate redundant dependency structure in the initial model, so as to obtain the optimal label dependency model. Finally, we obtain the feature extraction model by adapting the Inception V3 convolutional neural network and combine it with the GLLCBN model to achieve multi-label learning. The experimental results show that the proposed model outperforms other multi-label learning methods.
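One common way to analyze pairwise labels in a dataset's label space is to estimate, for every ordered pair, how often the second label appears when the first one does. The sketch below does exactly that; it is an assumption about what a pairwise analysis could look like, not the GLLCBN procedure, and the function name and toy labels are hypothetical.

```python
def conditional_cooccurrence(label_sets, labels):
    """Estimate P(b | a) for every ordered pair of distinct labels.

    label_sets : list of sets, the labels attached to each example
    labels     : all labels in the label space
    """
    count = {a: 0 for a in labels}
    pair = {(a, b): 0 for a in labels for b in labels if a != b}
    for labelset in label_sets:
        for a in labelset:
            count[a] += 1
            for b in labelset:
                if a != b:
                    pair[(a, b)] += 1
    # Divide each co-occurrence count by the frequency of the conditioning label.
    return {
        (a, b): (n / count[a] if count[a] else 0.0)
        for (a, b), n in pair.items()
    }


# Toy label space: "cloud" never appears without "sky", but not vice versa.
data = [{"sky", "cloud"}, {"sky"}, {"sky", "cloud"}, {"sky"}]
corr = conditional_cooccurrence(data, ["sky", "cloud"])
```

Here `corr[("sky", "cloud")]` is 0.5 while `corr[("cloud", "sky")]` is 1.0, so the estimate is directional, which is the kind of information a dependency graph over labels can encode as directed edges.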
Funding: Supported by the National Natural Science Foundation of China (61472305), the Science Research Program, Xi’an, China (2017073CG/RC036CXDKD003), and the Aeronautical Science Foundation of China (20151981009).
Abstract: Multi-label classification problems arise frequently in text categorization and many other related applications. Like conventional categorization problems, multi-label categorization tasks suffer from the curse of high dimensionality. Existing multi-label dimensionality reduction methods mainly suffer from two limitations. First, latent nonlinear structures are not utilized in the input space. Second, the label information is not fully exploited. This paper proposes a new method, multi-label local discriminative embedding (MLDE), which exploits latent structures to minimize intraclass distances and maximize interclass distances on the basis of label correlations. The latent structures are extracted by constructing two sets of adjacency graphs to make use of nonlinear information. Non-symmetric label correlations, which are the case in real applications, are adopted. The problem is formulated as a global objective function, and a linear mapping is obtained to solve out-of-sample problems. Empirical studies across 11 Yahoo sub-tasks, Enron and Bibtex are conducted to validate the superiority of MLDE over state-of-the-art multi-label dimensionality reduction methods.
Funding: Supported by the National Natural Science Foundation of China (62176197, 61806155) and the National Natural Science Foundation of Shaanxi Province (2020GY-062).
Abstract: Partial label learning aims to learn a multi-class classifier where each training example corresponds to a set of candidate labels, among which only one is correct. Most studies of the label space have focused only on the difference between candidate labels and non-candidate labels. So far, however, there has been little discussion of label correlation in partial label learning. This paper begins with a study of label correlation, followed by the establishment of a unified framework that integrates the label correlation, the adaptive graph, and the semantic difference maximization criterion. This work generates fresh insight into the acquisition of learning information from the label space. Specifically, the label correlation is calculated from the candidate label set and is used to obtain the similarity of each pair of instances in the label space. After that, the labeling confidence for each instance is updated under the smoothness assumption that two instances should have similar outputs in the label space if they are close in the feature space. Finally, an effective optimization procedure is used to solve the unified framework. Extensive experiments on artificial and real-world datasets indicate the superiority of the proposed method over state-of-the-art partial label learning methods.
Funding: Supported in part by the National Key R&D Program of China (2023YFA1011601); the Major Key Project of PCL, China (PCL2023AS7-1); in part by the National Natural Science Foundation of China (U21A20478, 62106224, 92267203); in part by the Science and Technology Major Project of Guangzhou (202007030006); in part by the Major Key Project of PCL (PCL2021A09); and in part by the Guangzhou Science and Technology Plan Project (2024A04J3749).
Abstract: Multi-label classification is a challenging problem that has attracted significant attention from researchers, particularly in the domain of image and text attribute annotation. However, multi-label datasets are prone to serious intra-class and inter-class imbalance problems, which can significantly degrade classification performance. To address these issues, we propose the multi-label weighted broad learning system (MLW-BLS) from the perspective of label imbalance weighting and label correlation mining. Further, we propose the multi-label adaptive weighted broad learning system (MLAW-BLS) to adaptively adjust the specific weights and values of labels of MLW-BLS and construct an efficient imbalanced classifier set. Extensive experiments are conducted on various datasets to evaluate the effectiveness of the proposed model, and the results demonstrate its superiority over other advanced approaches.
Funding and acknowledgements: The authors would like to thank the associate editor and anonymous reviewers for their helpful comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61573104, 61622203), the Natural Science Foundation of Jiangsu Province (BK20141340), the Fundamental Research Funds for the Central Universities (2242017K40140), and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Abstract: Multi-label learning deals with problems where each example is represented by a single instance while being associated with multiple class labels simultaneously. Binary relevance is arguably the most intuitive solution for learning from multi-label examples. It works by decomposing the multi-label learning task into a number of independent binary learning tasks (one per class label). In view of its potential weakness in ignoring correlations between labels, many correlation-enabling extensions to binary relevance have been proposed in the past decade. In this paper, we aim to review the state of the art of binary relevance from three perspectives. First, basic settings for multi-label learning and binary relevance solutions are briefly summarized. Second, representative strategies to provide binary relevance with label correlation exploitation abilities are discussed. Third, some of our recent studies on binary relevance aimed at issues other than label correlation exploitation are introduced. In conclusion, we provide suggestions on future research directions.
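The decomposition that binary relevance performs can be written down in a few lines. The following is a minimal sketch of the data transformation only; the function name and the toy data are hypothetical, and any binary classifier could then be trained on each resulting task.

```python
def binary_relevance_split(X, Y, labels):
    """Turn one multi-label dataset into one binary dataset per label.

    X      : list of feature vectors
    Y      : list of label sets, aligned with X
    labels : all class labels
    Returns {label: (X, binary_targets)} where the target is 1 if the
    example carries that label and 0 otherwise.
    """
    return {
        label: (X, [1 if label in y else 0 for y in Y])
        for label in labels
    }


# Three examples, two labels.
X = [[1, 0], [0, 1], [1, 1]]
Y = [{"a"}, {"b"}, {"a", "b"}]
tasks = binary_relevance_split(X, Y, ["a", "b"])
```

Each task sees only its own column of targets, which is why plain binary relevance cannot exploit correlations between labels without one of the extensions the review discusses.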
Funding: Supported by the National Natural Science Foundation of China (Nos. 61922087, 61906201, 62006238, and 62136005) and the Natural Science Fund for Distinguished Young Scholars of Hunan Province (No. 2019JJ20020).
Abstract: Image classification is vital and basic in many data analysis domains. Since real-world images generally contain multiple diverse semantic labels, it amounts to a typical multi-label classification problem. Traditional multi-label image classification relies on a large amount of training data with plenty of labels, which incurs substantial human and financial costs. By contrast, one can easily obtain a correlation matrix of the categories of interest in the current scene based on historical image data from other application scenarios. How to perform image classification with only label correlation priors, without specific and costly annotated labels, is an important but rarely studied problem. In this paper, we propose a model to classify images with this kind of weak correlation prior. We use label correlation to recapitulate the sample similarity, employ the prior information to decompose the projection matrix when regressing the label indication matrix, and introduce the L_{2,1} norm to select features for each image. Finally, experimental results on several image datasets demonstrate that the proposed model has distinct advantages over current state-of-the-art multi-label classification methods.
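For readers unfamiliar with the norm mentioned above, the L2,1 norm of a matrix is the sum of the Euclidean norms of its rows. The sketch below computes it for a small projection matrix; the convention that rows correspond to features is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def l21_norm(W):
    """Sum of the Euclidean (L2) norms of the rows of W."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)


# Rows are assumed to be features, columns labels. The all-zero second row
# contributes nothing, i.e. that feature is effectively discarded.
W = [
    [3.0, 4.0],   # row norm 5
    [0.0, 0.0],   # row norm 0
    [5.0, 12.0],  # row norm 13
]
value = l21_norm(W)  # 5 + 0 + 13 = 18
```

Penalizing this quantity pushes entire rows toward zero rather than individual entries, which is why it is commonly used as a feature selection regularizer.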
Funding: Supported by the National Natural Science Foundation of China [grant number 41801313].
Abstract: Estimating the proportion of land-use types in different regions is essential to promote the organization of a compact city and reduce energy consumption. However, existing research in this area has a few limitations: (1) lack of consideration of land-use distribution-related factors other than POIs; (2) inability to extract complex relations from heterogeneous information; and (3) overlooking the correlation between land-use types. To overcome these limitations, we propose a knowledge-based approach for estimating land-use distributions. We designed a knowledge graph to represent POIs and other related heterogeneous data and then used a knowledge embedding model to directly obtain the region embedding vectors by learning the complex and implicit relations present in the knowledge graph. Region embedding vectors were mapped to land-use distributions using a label distribution learning method that integrates the correlation between land-use types. To demonstrate the reliability and validity of our approach, we conducted a case study in Jinhua, China. The results indicated that the proposed model outperformed other algorithms on all evaluation indices, illustrating the potential of this method to achieve more accurate land-use distribution estimates.