Funding: This study was supported by the Natural Science Foundation of Heilongjiang Province under Grant No. F2016036, the National Natural Science Foundation of China under Grant No. 61170148, and the Returned Scholar Foundation of Heilongjiang Province.
Abstract: Although much progress has been made in sentiment classification, the lack of annotated corpora remains a problem. In this paper, we propose to expand corpora for Chinese polarity classification via opinion paraphrase generation. To this end, we first exploit three strategies for opinion paraphrase generation, namely sentence re-ordering, opinion element substitution, and explicit attribution implying. To improve the quality of the generated opinion paraphrases, we define four criteria for opinion paraphrase evaluation and present a filtering algorithm that discards improper opinion paraphrase candidates. To assess the proposed method, we further apply the expanded corpus to an SVM classifier for polarity classification. The experimental results show that the generated opinion paraphrases are beneficial to polarity classification.
Abstract: Recent work on opinion mining typically focuses on subtasks such as aspect mining or polarity classification, ignoring the detailed explanatory evidence that accounts for a given user opinion. In this paper, we study the extraction of explanatory expressions by modeling the problem with a conditional random field (CRF). We compare the effectiveness of discrete and neural features, and further integrate them. We evaluate the models on two datasets from two different domains, both annotated with ground-truth explanatory expressions. Results show that the neural CRF model performs better than the discrete CRF. After combining the discrete and neural features, our final CRF model achieves the best results.
Funding: This study was supported by the National Natural Science Foundation of China under Grant Nos. 61170148 and 60973081, the Returned Scholar Foundation of Heilongjiang Province, the Harbin Innovative Foundation for Returnees under Grant No. 2009RFLXG007, and the Graduate Innovative Research Projects of Heilongjiang University under Grant No. YJSCX2014-017HLJU.
Abstract: How to mine the underlying reasons for opinions is a key issue in opinion mining. In this paper, we propose a CRF-based labeling approach to explanatory segment recognition in Chinese product reviews. To this end, we first reformulate explanatory segment recognition as a labeling task on a sequence of words, and then explore various features from three linguistic levels, namely the character, word, and semantic levels, under the framework of conditional random fields. Experimental results on product reviews from the mobile phone and car domains show that the proposed approach significantly outperforms existing state-of-the-art methods for explanatory segment extraction.
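As a rough illustration of what "a labeling task on a sequence of words" can look like, the sketch below converts annotated explanatory segments into per-word tags that a CRF could be trained on. The BIO scheme, the `EXP` label name, the function name `spans_to_bio`, and the example tokens are all assumptions made for illustration; the abstract does not specify the tag set.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Turn (start, end) word spans (end exclusive) into BIO tags.

    Words outside every span get "O"; the first word of a span gets
    "B-EXP" and the remaining words of that span get "I-EXP".
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-EXP"
        for i in range(start + 1, end):
            tags[i] = "I-EXP"
    return tags


# Hypothetical review sentence (English gloss) with one explanatory segment.
tokens = ["screen", "is", "great", ",", "resolution", "is", "high"]
tags = spans_to_bio(tokens, [(4, 7)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Once every review is encoded this way, character-, word-, and semantic-level features can be attached to each position and the tag sequence learned by a standard CRF toolkit.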
Funding: This study was supported by the National Natural Science Foundation of China under Grant Nos. 61170148 and 60973081, the Returned Scholar Foundation of Heilongjiang Province, and the Harbin Innovative Foundation for Returnees under Grant No. 2009RFLXG007.
Abstract: Homophonic words are very popular in Chinese microblogs, posing a new challenge for Chinese microblog text analysis. However, to date, very little research has been conducted on Chinese homophonic word normalization. In this paper, we treat Chinese homophonic word normalization as a process of language decoding and propose an n-gram based approach. To this end, we first employ homophonic-original word or character mapping tables to generate normalization candidates for a given sentence containing homophonic words, and then exploit n-gram language models to decode the best normalization from the candidate set. Our experimental results show that using the homophonic-original character mapping table and n-grams trained on the microblog corpus helps improve performance in homophonic word recognition and restoration.
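The two-step idea described above (generate candidates from a mapping table, then pick the candidate a language model scores highest) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' system: the mapping table, the four-sentence corpus, the choice of a character bigram model, and the add-one smoothing are all hypothetical stand-ins.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical homophonic-to-original character mapping table.
MAPPING = {"神": ["什"], "马": ["么"]}

# Hypothetical tiny corpus standing in for microblog training text.
CORPUS = ["什么", "什么意思", "这是什么", "马上"]


def train_bigrams(corpus):
    """Count character unigrams (as contexts) and bigrams."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence) + ["</s>"]
        unigrams.update(chars[:-1])
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a sentence."""
    chars = ["<s>"] + list(sentence) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )


def candidates(sentence, mapping):
    """Each character may stay as-is or be replaced by a mapped original."""
    options = [[ch] + [o for o in mapping.get(ch, []) if o != ch] for ch in sentence]
    return ["".join(choice) for choice in product(*options)]


def normalize(sentence, mapping, unigrams, bigrams):
    """Decode the highest-scoring candidate under the bigram model."""
    vocab_size = len(unigrams) + 1
    return max(
        candidates(sentence, mapping),
        key=lambda cand: log_prob(cand, unigrams, bigrams, vocab_size),
    )


unigrams, bigrams = train_bigrams(CORPUS)
print(normalize("神马", MAPPING, unigrams, bigrams))
```

Enumerating every candidate is exponential in the number of mapped characters, so this only suits short inputs; a longer sentence would call for Viterbi-style decoding over the same scores.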