β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been studied for predicting β-turns and β-turn types. However, due to the imbalanced dataset...β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been studied for predicting β-turns and β-turn types. However, due to the imbalanced dataset, the performance is still inadequate. In this study, we proposed a novel over-sampling technique FOST to deal with the class-imbalance problem. Experimental results on three standard benchmark datasets showed that our method is comparable with state-of-the-art methods. In addition, we applied our algorithm to five benchmark datasets from UCI Machine Learning Repository and achieved significant improvement in G-mean and Sensitivity. It means that our method is also effective for various imbalanced data other than β-turns and β-turn types.展开更多
Multi-label learning deals with problems where each example is represented by a single instance while being associated with multiple class labels simultaneously. Binary relevance is arguably the most intuitive solutio...Multi-label learning deals with problems where each example is represented by a single instance while being associated with multiple class labels simultaneously. Binary relevance is arguably the most intuitive solution for learning from multi-label examples. It works by decomposing the multi-label learning task into a number of independent binary learning tasks (one per class label). In view of its potential weakness in ignoring correlations between labels, many correlation-enabling extensions to binary relevance have been proposed in the past decade. In this paper, we aim to review the state of the art of binary relevance from three perspectives. First, basic settings for multi-label learning and binary relevance solutions are briefly summarized. Second, representative strategies to provide binary relevance with label correlation exploitation abilities are discussed. Third, some of our recent studies on binary relevance aimed at issues other than label correlation exploitation are introduced. As a conclusion, we provide suggestions on future research directions.展开更多
Software defect detection aims to automatically identify defective software modules for efficient software test in order to improve the quality of a software system. Although many machine learning methods have been su...Software defect detection aims to automatically identify defective software modules for efficient software test in order to improve the quality of a software system. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, in a software system there are usually much fewer defective modules than defect-free modules, so learning would have to be conducted over an imbalanced data set. In this paper~ we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve the detection accuracy, as well as employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results of real-world software defect detection tasks show that Rocus is effective for software defect detection. Its performance is better than a semi-supervised learning method that ignores the class-imbalance nature of the task and a class-imbalance learning method that does not make effective use of unlabeled data.展开更多
文摘β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been studied for predicting β-turns and β-turn types. However, due to the imbalanced dataset, the performance is still inadequate. In this study, we proposed a novel over-sampling technique FOST to deal with the class-imbalance problem. Experimental results on three standard benchmark datasets showed that our method is comparable with state-of-the-art methods. In addition, we applied our algorithm to five benchmark datasets from UCI Machine Learning Repository and achieved significant improvement in G-mean and Sensitivity. It means that our method is also effective for various imbalanced data other than β-turns and β-turn types.
基金Acknowledgements The authors would like to thank the associate editor and anonymous reviewers for their helpful comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61573104, 61622203), the Natural Science Foundation of Jiangsu Province (BK20141340), the Fundamental Research Funds for the Central Universities (2242017K40140), and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
文摘Multi-label learning deals with problems where each example is represented by a single instance while being associated with multiple class labels simultaneously. Binary relevance is arguably the most intuitive solution for learning from multi-label examples. It works by decomposing the multi-label learning task into a number of independent binary learning tasks (one per class label). In view of its potential weakness in ignoring correlations between labels, many correlation-enabling extensions to binary relevance have been proposed in the past decade. In this paper, we aim to review the state of the art of binary relevance from three perspectives. First, basic settings for multi-label learning and binary relevance solutions are briefly summarized. Second, representative strategies to provide binary relevance with label correlation exploitation abilities are discussed. Third, some of our recent studies on binary relevance aimed at issues other than label correlation exploitation are introduced. As a conclusion, we provide suggestions on future research directions.
基金supported by the National Natural Science Foundation of China under Grant Nos. 60975043,60903103,and 60721002
文摘Software defect detection aims to automatically identify defective software modules for efficient software test in order to improve the quality of a software system. Although many machine learning methods have been successfully applied to the task, most of them fail to consider two practical yet important issues in software defect detection. First, it is rather difficult to collect a large amount of labeled training data for learning a well-performing model; second, in a software system there are usually much fewer defective modules than defect-free modules, so learning would have to be conducted over an imbalanced data set. In this paper~ we address these two practical issues simultaneously by proposing a novel semi-supervised learning approach named Rocus. This method exploits the abundant unlabeled examples to improve the detection accuracy, as well as employs under-sampling to tackle the class-imbalance problem in the learning process. Experimental results of real-world software defect detection tasks show that Rocus is effective for software defect detection. Its performance is better than a semi-supervised learning method that ignores the class-imbalance nature of the task and a class-imbalance learning method that does not make effective use of unlabeled data.