Sentiment analysis is the computational study of how opinions, attitudes, emotions, and perspectives are expressed in language, and it is an important task in natural language processing. It is highly valuable for both research and practical applications. This work focuses on the difficulty of constructing sentiment classifiers, which normally require large amounts of labeled in-domain training data, and proposes a novel unsupervised framework that uses Chinese idiom resources to build a general sentiment classifier. Furthermore, the domain adaptation of the general classifier is improved by using it as the base of a self-training procedure, which yields a domain-specific self-training sentiment classifier. To validate the framework, several experiments were carried out on a publicly available dataset of Chinese online reviews. The experiments show that the proposed framework is effective and achieves encouraging results. Specifically, the general classifier outperforms two baselines (a naïve 50% baseline and a cross-domain classifier), and the bootstrapping self-training classifier approximates the upper-bound domain-specific classifier, with a lowest accuracy of 81.5%; its performance is also more stable, and the framework needs no labeled training data.
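The self-training idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the seed lexicon `SEED` is a hypothetical stand-in for the Chinese idiom resources, the domain model is a small Naive Bayes classifier chosen only for brevity, and the names `general_score`, `train_nb`, `nb_margin`, `self_train`, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter
import math

# Hypothetical seed lexicon standing in for the idiom resource (assumption).
SEED = {"excellent": 1, "wonderful": 1, "terrible": -1, "awful": -1}


def general_score(tokens):
    """General classifier: > 0 positive, < 0 negative, 0 means no evidence."""
    return sum(SEED.get(t, 0) for t in tokens)


def train_nb(labeled):
    """Collect per-class word counts from (tokens, label) pairs."""
    counts = {1: Counter(), -1: Counter()}
    docs = Counter()
    for tokens, label in labeled:
        counts[label].update(tokens)
        docs[label] += 1
    vocab = set(counts[1]) | set(counts[-1])
    return counts, docs, vocab


def nb_margin(model, tokens):
    """Log-odds of positive vs. negative, with add-one smoothing."""
    counts, docs, vocab = model
    total_pos = sum(counts[1].values()) + len(vocab)
    total_neg = sum(counts[-1].values()) + len(vocab)
    margin = math.log((docs[1] + 1) / (docs[-1] + 1))
    for t in tokens:
        if t not in vocab:
            continue  # unseen words carry no evidence
        p_pos = (counts[1][t] + 1) / total_pos
        p_neg = (counts[-1][t] + 1) / total_neg
        margin += math.log(p_pos / p_neg)
    return margin


def self_train(unlabeled, rounds=3, threshold=0.5):
    """Bootstrap a domain classifier from unlabeled reviews only."""
    # Round 0: the general classifier labels every review it has evidence for.
    labeled = [(d, 1 if general_score(d) > 0 else -1)
               for d in unlabeled if general_score(d) != 0]
    pool = [d for d in unlabeled if general_score(d) == 0]
    model = train_nb(labeled)
    for _ in range(rounds):
        scored = [(d, nb_margin(model, d)) for d in pool]
        confident = [(d, 1 if m > 0 else -1)
                     for d, m in scored if abs(m) >= threshold]
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [d for d, m in scored if abs(m) < threshold]
        model = train_nb(labeled)  # retrain on the enlarged labeled set
    return model
```

In this sketch, domain words that never appear in the seed lexicon (for example "quiet" or "noisy" in hotel reviews) acquire a polarity only because they co-occur with words the current model already trusts, which is the adaptation effect the abstract attributes to self-training.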
On the semantic web, data interoperability and ontology heterogeneity are becoming ever more important issues. To resolve these problems, multiple classification methods can be used to learn the matching between ontologies. The paper uses a general statistical classification method to discover category features in data instances and uses the first-order learning algorithm FOIL to exploit the semantic relations among data instances. When a multistrategy learning approach is used, a central problem is the evaluation of the multistrategy classifiers. The goal and the conditions of using multistrategy classifiers in ontology matching differ from those in general text classification. This paper describes a combination rule for multiple classifiers, called the Best Outstanding Champion, which is suitable for heterogeneous ontology mapping. Applied to the prediction results of the individual methods, the rule accumulates the correct matches found by each individual classifier. The experiments show that the approach achieves high accuracy on a real-world domain.
Although researchers in the ATC field have done a wide range of work based on SVM, almost all existing approaches rely on empirical model selection. Attempts at automatic model selection in practical, large-scale text classification systems have been limited. In this paper, we propose a new model selection algorithm that utilizes the DDAG learning architecture. This architecture yields a new large-scale text classifier with very good performance. Experimental results show that the proposed algorithm has good efficiency and the necessary generalization capability when handling large-scale multi-class text classification tasks.
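For orientation, the following is a minimal sketch of how a DDAG combines pairwise binary classifiers into a multi-class decision, assuming DDAG here refers to the decision directed acyclic graph scheme for multi-class SVMs. It shows only the prediction step, not the model selection algorithm proposed in the paper. The pairwise deciders are hypothetical nearest-centroid rules standing in for trained pairwise SVMs, and the names `make_pairwise` and `ddag_predict` are assumptions made for illustration.

```python
from itertools import combinations


def make_pairwise(centroids):
    """Build one binary decider per class pair (stand-in for pairwise SVMs)."""
    def sqdist(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    deciders = {}
    for a, b in combinations(sorted(centroids), 2):
        # Default arguments bind a and b at definition time.
        deciders[(a, b)] = lambda x, a=a, b=b: (
            a if sqdist(x, centroids[a]) <= sqdist(x, centroids[b]) else b
        )
    return deciders


def ddag_predict(x, classes, deciders):
    """Walk the DDAG: each node tests first vs. last and drops the loser."""
    candidates = sorted(classes)
    evaluations = 0
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = deciders[(first, last)](x)
        evaluations += 1
        if winner == first:
            candidates.pop()     # eliminate the last class
        else:
            candidates.pop(0)    # eliminate the first class
    return candidates[0], evaluations
```

With K classes, a prediction evaluates only K - 1 of the K(K - 1)/2 pairwise classifiers, which is the property that makes this architecture attractive for large-scale multi-class tasks.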
Funding: Projects (61170156, 60933005) supported by the National Natural Science Foundation of China