Sentiment classification is a useful tool to classify reviews about sentiments and attitudes towards a product or service.Existing studies heavily rely on sentiment classification methods that require fully annotated ...Sentiment classification is a useful tool to classify reviews about sentiments and attitudes towards a product or service.Existing studies heavily rely on sentiment classification methods that require fully annotated inputs.However,there is limited labelled text available,making the acquirement process of the fully annotated input costly and labour-intensive.Lately,semi-supervised methods emerge as they require only partially labelled input but perform comparably to supervised methods.Nevertheless,some works reported that the performance of the semi-supervised model degraded after adding unlabelled instances into training.Literature also shows that not all unlabelled instances are equally useful;thus identifying the informative unlabelled instances is beneficial in training a semi-supervised model.To achieve this,an informative score is proposed and incorporated into semisupervised sentiment classification.The evaluation is performed on a semisupervised method without an informative score and with an informative score.By using the informative score in the instance selection strategy to identify informative unlabelled instances,semi-supervised models perform better compared to models that do not incorporate informative scores into their training.Although the performance of semi-supervised models incorporated with an informative score is not able to surpass the supervised models,the results are still found promising as the differences in performance are subtle with a small difference of 2%to 5%,but the number of labelled instances used is greatly reduced from100%to 40%.The best finding of the proposed instance selection strategy is achieved when incorporating an informative score with a baseline confidence score at a 0.5:0.5 ratio using only 40%labelled data.展开更多
Recommender system is a tool to suggest items to the users from the extensive history of the user’s feedback.Though,it is an emerging research area concerning academics and industries,where it suffers from sparsity,s...Recommender system is a tool to suggest items to the users from the extensive history of the user’s feedback.Though,it is an emerging research area concerning academics and industries,where it suffers from sparsity,scalability,and cold start problems.This paper addresses sparsity,and scalability problems of model-based collaborative recommender system based on ensemble learning approach and enhanced clustering algorithm for movie recommendations.In this paper,an effective movie recommendation system is proposed by Classification and Regression Tree(CART)algorithm,enhanced Balanced Iterative Reducing and Clustering using Hierarchies(BIRCH)algorithm and truncation method.In this research paper,a new hyper parameters tuning is added in BIRCH algorithm to enhance the cluster formation process,where the proposed algorithm is named as enhanced BIRCH.The proposed model yields quality movie recommendation to the new user using Gradient boost classification with broad coverage.In this paper,the proposed model is tested on Movielens dataset,and the performance is evaluated by means of Mean Absolute Error(MAE),precision,recall and f-measure.The experimental results showed the superiority of proposed model in movie recommendation compared to the existing models.The proposed model obtained 0.52 and 0.57 MAE value on Movielens 100k and 1M datasets.Further,the proposed model obtained 0.83 of precision,0.86 of recall and 0.86 of f-measure on Movielens 100k dataset,which are effective compared to the existing models in movie recommendation.展开更多
This paper presents a novel ontology mapping approach based on rough set theory and instance selection.In this appoach the construction approach of a rough set-based inference instance base in which the instance selec...This paper presents a novel ontology mapping approach based on rough set theory and instance selection.In this appoach the construction approach of a rough set-based inference instance base in which the instance selection(involving similarity distance,clustering set and redundancy degree)and discernibility matrix-based feature reduction are introduced respectively;and an ontology mapping approach based on multi-dimensional attribute value joint distribution is proposed.The core of this mapping aI overlapping of the inference instance space.Only valuable instances and important attributes can be selected into the ontology mapping based on the multi-dimensional attribute value joint distribution,so the sequently mapping efficiency is improved.The time complexity of the discernibility matrix-based method and the accuracy of the mapping approach are evaluated by an application example and a series of analyses and comparisons.展开更多
Graph-based semi-supervised learning is an important semi-supervised learning paradigm. Although graphbased semi-supervised learning methods have been shown to be helpful in various situations, they may adversely affe...Graph-based semi-supervised learning is an important semi-supervised learning paradigm. Although graphbased semi-supervised learning methods have been shown to be helpful in various situations, they may adversely affect performance when using unlabeled data. In this paper, we propose a new graph-based semi-supervised learning method based on instance selection in order to reduce the chances of performance degeneration. Our basic idea is that given a set of unlabeled instances, it is not the best approach to exploit all the unlabeled instances; instead, we should exploit the unlabeled instances that are highly likely to help improve the performance, while not taking into account the ones with high risk. We develop both transductive and inductive variants of our method. Experiments on a broad range of data sets show that the chances of performance degeneration of our proposed method are much smaller than those of many state-of-the-art graph-based semi-supervised learning methods.展开更多
Multiple-Instance Learning (MIL) is used to predict the unlabeled bags' label by learning the labeled positive training bags and negative training bags.Each bag is made up of several unlabeled instances.A bag is la...Multiple-Instance Learning (MIL) is used to predict the unlabeled bags' label by learning the labeled positive training bags and negative training bags.Each bag is made up of several unlabeled instances.A bag is labeled positive if at least one of its instances is positive,otherwise negative.Existing multiple-instance learning methods with instance selection ignore the representative degree of the selected instances.For example,if an instance has many similar instances with the same label around it,the instance should be more representative than others.Based on this idea,in this paper,a multiple-instance learning with instance selection via constructive covering algorithm (MilCa) is proposed.In MilCa,we firstly use maximal Hausdorff to select some initial positive instances from positive bags,then use a Constructive Covering Algorithm (CCA) to restructure the structure of the original instances of negative bags.Then an inverse testing process is employed to exclude the false positive instances from positive bags and to select the high representative degree instances ordered by the number of covered instances from training bags.Finally,a similarity measure function is used to convert the training bag into a single sample and CCA is again used to classification for the converted samples.Experimental results on synthetic data and standard benchmark datasets demonstrate that MilCa can decrease the number of the selected instances and it is competitive with the state-of-the-art MIL algorithms.展开更多
Active learning has been widely utilized to reduce the labeling cost of supervised learning.By selecting specific instances to train the model,the performance of the model was improved within limited steps.However,rar...Active learning has been widely utilized to reduce the labeling cost of supervised learning.By selecting specific instances to train the model,the performance of the model was improved within limited steps.However,rare work paid attention to the effectiveness of active learning on it.In this paper,we proposed a deep active learning model with bidirectional encoder representations from transformers(BERT)for text classification.BERT takes advantage of the self-attention mechanism to integrate contextual information,which is beneficial to accelerate the convergence of training.As for the process of active learning,we design an instance selection strategy based on posterior probabilities Margin,Intra-correlation and Inter-correlation(MII).Selected instances are characterized by small margin,low intra-cohesion and high inter-cohesion.We conduct extensive experiments and analytics with our methods.The effect of learner is compared while the effect of sampling strategy and text classification is assessed from three real datasets.The results show that our method outperforms the baselines in terms of accuracy.展开更多
For a semi-supervised classification system, with the increase of the training samples number, the system needs to be continually updated. As the size of samples set is increasing, many unreliable samples will also be...For a semi-supervised classification system, with the increase of the training samples number, the system needs to be continually updated. As the size of samples set is increasing, many unreliable samples will also be increased. In this paper, we use fuzzy c-means (FCM) clustering to take out some samples that are useless, and extract the intersection between the original training set and the cluster after using FCM clustering. The intersection between every class and cluster is reliable samples which we are looking for. The experiment result demonstrates that the superiority of the proposed algorithm is remarkable.展开更多
基金This research is supported by Fundamental Research Grant Scheme(FRGS),Ministry of Education Malaysia(MOE)under the project code,FRGS/1/2018/ICT02/USM/02/9 titled,Automated Big Data Annotation for Training Semi-Supervised Deep Learning Model in Sentiment Classification.
文摘Sentiment classification is a useful tool to classify reviews about sentiments and attitudes towards a product or service.Existing studies heavily rely on sentiment classification methods that require fully annotated inputs.However,there is limited labelled text available,making the acquirement process of the fully annotated input costly and labour-intensive.Lately,semi-supervised methods emerge as they require only partially labelled input but perform comparably to supervised methods.Nevertheless,some works reported that the performance of the semi-supervised model degraded after adding unlabelled instances into training.Literature also shows that not all unlabelled instances are equally useful;thus identifying the informative unlabelled instances is beneficial in training a semi-supervised model.To achieve this,an informative score is proposed and incorporated into semisupervised sentiment classification.The evaluation is performed on a semisupervised method without an informative score and with an informative score.By using the informative score in the instance selection strategy to identify informative unlabelled instances,semi-supervised models perform better compared to models that do not incorporate informative scores into their training.Although the performance of semi-supervised models incorporated with an informative score is not able to surpass the supervised models,the results are still found promising as the differences in performance are subtle with a small difference of 2%to 5%,but the number of labelled instances used is greatly reduced from100%to 40%.The best finding of the proposed instance selection strategy is achieved when incorporating an informative score with a baseline confidence score at a 0.5:0.5 ratio using only 40%labelled data.
文摘Recommender system is a tool to suggest items to the users from the extensive history of the user’s feedback.Though,it is an emerging research area concerning academics and industries,where it suffers from sparsity,scalability,and cold start problems.This paper addresses sparsity,and scalability problems of model-based collaborative recommender system based on ensemble learning approach and enhanced clustering algorithm for movie recommendations.In this paper,an effective movie recommendation system is proposed by Classification and Regression Tree(CART)algorithm,enhanced Balanced Iterative Reducing and Clustering using Hierarchies(BIRCH)algorithm and truncation method.In this research paper,a new hyper parameters tuning is added in BIRCH algorithm to enhance the cluster formation process,where the proposed algorithm is named as enhanced BIRCH.The proposed model yields quality movie recommendation to the new user using Gradient boost classification with broad coverage.In this paper,the proposed model is tested on Movielens dataset,and the performance is evaluated by means of Mean Absolute Error(MAE),precision,recall and f-measure.The experimental results showed the superiority of proposed model in movie recommendation compared to the existing models.The proposed model obtained 0.52 and 0.57 MAE value on Movielens 100k and 1M datasets.Further,the proposed model obtained 0.83 of precision,0.86 of recall and 0.86 of f-measure on Movielens 100k dataset,which are effective compared to the existing models in movie recommendation.
基金the National High Technology Research and Development Program of China(No.2002AA411420)the National Key Basic Research and Development Program of China(No.2003CB316905)the National Natural Science Foundation of China(No.60374071)
文摘This paper presents a novel ontology mapping approach based on rough set theory and instance selection.In this appoach the construction approach of a rough set-based inference instance base in which the instance selection(involving similarity distance,clustering set and redundancy degree)and discernibility matrix-based feature reduction are introduced respectively;and an ontology mapping approach based on multi-dimensional attribute value joint distribution is proposed.The core of this mapping aI overlapping of the inference instance space.Only valuable instances and important attributes can be selected into the ontology mapping based on the multi-dimensional attribute value joint distribution,so the sequently mapping efficiency is improved.The time complexity of the discernibility matrix-based method and the accuracy of the mapping approach are evaluated by an application example and a series of analyses and comparisons.
文摘Graph-based semi-supervised learning is an important semi-supervised learning paradigm. Although graphbased semi-supervised learning methods have been shown to be helpful in various situations, they may adversely affect performance when using unlabeled data. In this paper, we propose a new graph-based semi-supervised learning method based on instance selection in order to reduce the chances of performance degeneration. Our basic idea is that given a set of unlabeled instances, it is not the best approach to exploit all the unlabeled instances; instead, we should exploit the unlabeled instances that are highly likely to help improve the performance, while not taking into account the ones with high risk. We develop both transductive and inductive variants of our method. Experiments on a broad range of data sets show that the chances of performance degeneration of our proposed method are much smaller than those of many state-of-the-art graph-based semi-supervised learning methods.
基金supported by the National Natural Science Foundation of China (No. 61175046)the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016)+1 种基金the Outstanding Young Talents in Higher Education Institutions of Anhui Province (No. 2011SQRL146)the Recruitment Project of Anhui University for Academic and Technology Leader
文摘Multiple-Instance Learning (MIL) is used to predict the unlabeled bags' label by learning the labeled positive training bags and negative training bags.Each bag is made up of several unlabeled instances.A bag is labeled positive if at least one of its instances is positive,otherwise negative.Existing multiple-instance learning methods with instance selection ignore the representative degree of the selected instances.For example,if an instance has many similar instances with the same label around it,the instance should be more representative than others.Based on this idea,in this paper,a multiple-instance learning with instance selection via constructive covering algorithm (MilCa) is proposed.In MilCa,we firstly use maximal Hausdorff to select some initial positive instances from positive bags,then use a Constructive Covering Algorithm (CCA) to restructure the structure of the original instances of negative bags.Then an inverse testing process is employed to exclude the false positive instances from positive bags and to select the high representative degree instances ordered by the number of covered instances from training bags.Finally,a similarity measure function is used to convert the training bag into a single sample and CCA is again used to classification for the converted samples.Experimental results on synthetic data and standard benchmark datasets demonstrate that MilCa can decrease the number of the selected instances and it is competitive with the state-of-the-art MIL algorithms.
基金This work is supported by National Natural Science Foundation of China(61402225,61728204)Innovation Funding(NJ20160028,NT2018028,NS2018057)+1 种基金Aeronautical Science Foundation of China(2016551500)State Key Laboratory for smart grid protection and operation control Foundation,and the Science and Technology Funds from National State Grid Ltd.,China degree and Graduate Education Fund.
文摘Active learning has been widely utilized to reduce the labeling cost of supervised learning.By selecting specific instances to train the model,the performance of the model was improved within limited steps.However,rare work paid attention to the effectiveness of active learning on it.In this paper,we proposed a deep active learning model with bidirectional encoder representations from transformers(BERT)for text classification.BERT takes advantage of the self-attention mechanism to integrate contextual information,which is beneficial to accelerate the convergence of training.As for the process of active learning,we design an instance selection strategy based on posterior probabilities Margin,Intra-correlation and Inter-correlation(MII).Selected instances are characterized by small margin,low intra-cohesion and high inter-cohesion.We conduct extensive experiments and analytics with our methods.The effect of learner is compared while the effect of sampling strategy and text classification is assessed from three real datasets.The results show that our method outperforms the baselines in terms of accuracy.
基金supported by the National Natural Science Foundation under Grant No.61175055 and No.61105059support of research funds of Sichuan Key Laboratory of Intelligent Network Information Processing under Grant No.SGXZD1002-10Si chuan Key Technology Research and Development Program under Grant No.2012GZ0019 and No.2011FZ0051
文摘For a semi-supervised classification system, with the increase of the training samples number, the system needs to be continually updated. As the size of samples set is increasing, many unreliable samples will also be increased. In this paper, we use fuzzy c-means (FCM) clustering to take out some samples that are useless, and extract the intersection between the original training set and the cluster after using FCM clustering. The intersection between every class and cluster is reliable samples which we are looking for. The experiment result demonstrates that the superiority of the proposed algorithm is remarkable.