With the rising and spreading of micro-blog, the sentiment classification of short texts has become a research hotspot. Some methods have been developed in the past decade. However, since the Chinese and English are d...With the rising and spreading of micro-blog, the sentiment classification of short texts has become a research hotspot. Some methods have been developed in the past decade. However, since the Chinese and English are different in language syntax, semantics and pragmatics, sentiment classification methods that are effective for English twitter may fail on Chinese micro-blog. In addition, the colloquialism and conciseness of short Chinese texts introduces additional challenges to sentiment classification. In this work, a novel hybrid learning model was proposed for sentiment classification of Chinese micro-blogs, which included two stages. In the first stage, emotional scores were calculated over the whole dataset by utilizing an improved Chinese-oriented sentiment dictionary classification method. Data with extremely high or low scores were directly labeled. In the second stage, the remaining data were labeled by using an integrated classification method based on sentiment dictionary, support vector machine(SVM) and k-nearest neighbor(KNN). An improved feature selection method was adopted to enhance the discriminative power of the selected features. The two-stage hybrid framework made the proposed method effective for sentiment classification of Chinese micro-blogs. Experiments on the COAE2014(Chinese Opinion Analysis Evaluation 2014) dataset show that the proposed method outperforms other schemes.展开更多
It is easy for teenagers to view pornographic pictures on social networks. Many researchers have studied the detection of real pornographic pictures, but there are few studies on those that are artificial. In this wor...It is easy for teenagers to view pornographic pictures on social networks. Many researchers have studied the detection of real pornographic pictures, but there are few studies on those that are artificial. In this work, we studied how to detect artificial pornographic pictures, especially when they are on social networks. The whole detection process can be divided into two stages: feature selection and picture detection. In the feature selection stage, seven types of features that favour picture detection were selected. In the picture detection stage, three steps were included. 1) In order to alleviate the imbalance in the number of artificial pornographic pictures and normal ones, the training dataset of artificial pornographic pictures was expanded. Therefore, the features which were extracted from the training dataset can also be expanded too. 2) In order to reduce the time of feature extraction, a fast method which extracted features based on the proportionally scaled picture rather than the original one was proposed. 3) Three tree models were compared and a gradient boost decision tree (GBDT) was selected for the final picture detection. Three sets of experimental results show that the proposed method can achieve better recognition precision and drastically reduce the time cost of the method.展开更多
基金Projects(61573380,61303185)supported by the National Natural Science Foundation of ChinaProject(13BTQ052)supported by the National Social Science Foundation of China+1 种基金Project(2016M592450)supported by the China Postdoctoral Science FoundationProject(2016JJ4119)supported by the Hunan Provincial Natural Science Foundation of China
文摘With the rising and spreading of micro-blog, the sentiment classification of short texts has become a research hotspot. Some methods have been developed in the past decade. However, since the Chinese and English are different in language syntax, semantics and pragmatics, sentiment classification methods that are effective for English twitter may fail on Chinese micro-blog. In addition, the colloquialism and conciseness of short Chinese texts introduces additional challenges to sentiment classification. In this work, a novel hybrid learning model was proposed for sentiment classification of Chinese micro-blogs, which included two stages. In the first stage, emotional scores were calculated over the whole dataset by utilizing an improved Chinese-oriented sentiment dictionary classification method. Data with extremely high or low scores were directly labeled. In the second stage, the remaining data were labeled by using an integrated classification method based on sentiment dictionary, support vector machine(SVM) and k-nearest neighbor(KNN). An improved feature selection method was adopted to enhance the discriminative power of the selected features. The two-stage hybrid framework made the proposed method effective for sentiment classification of Chinese micro-blogs. Experiments on the COAE2014(Chinese Opinion Analysis Evaluation 2014) dataset show that the proposed method outperforms other schemes.
基金Projects(61573380,61303185) supported by the National Natural Science Foundation of ChinaProjects(2016M592450,2017M612585) supported by the China Postdoctoral Science FoundationProjects(2016JJ4119,2017JJ3416) supported by the Hunan Provincial Natural Science Foundation of China
文摘It is easy for teenagers to view pornographic pictures on social networks. Many researchers have studied the detection of real pornographic pictures, but there are few studies on those that are artificial. In this work, we studied how to detect artificial pornographic pictures, especially when they are on social networks. The whole detection process can be divided into two stages: feature selection and picture detection. In the feature selection stage, seven types of features that favour picture detection were selected. In the picture detection stage, three steps were included. 1) In order to alleviate the imbalance in the number of artificial pornographic pictures and normal ones, the training dataset of artificial pornographic pictures was expanded. Therefore, the features which were extracted from the training dataset can also be expanded too. 2) In order to reduce the time of feature extraction, a fast method which extracted features based on the proportionally scaled picture rather than the original one was proposed. 3) Three tree models were compared and a gradient boost decision tree (GBDT) was selected for the final picture detection. Three sets of experimental results show that the proposed method can achieve better recognition precision and drastically reduce the time cost of the method.