Funding: This work was supported by the National Basic Research 973 Program of China under Grant Nos. 2013CB329605 and 2013CB329303, and the National Natural Science Foundation of China under Grant No. 61201351.
Abstract: Opinion target extraction from Chinese microblogs plays an important role in opinion mining. Significant progress has been made in this area recently, especially with methods based on conditional random fields (CRF). However, these methods consider only lexicon-related features and do not exploit the implied syntactic and semantic knowledge. We propose a novel approach that combines a domain lexicon with groups of syntactic and semantic features. The domain lexicon is acquired in a novel way that explores syntactic and semantic information through part-of-speech, dependency structure, phrase structure, semantic role, and semantic similarity based on word embeddings. We then combine the domain lexicon with the opinion targets extracted by a CRF that uses these feature groups. Experimental results on the COAE2014 dataset show that the approach outperforms other well-known methods on the opinion target extraction task.
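As a rough illustration of how a domain lexicon can sit alongside syntactic features in a CRF-style tagger, the sketch below builds one feature dictionary per token. It is not the authors' feature set: the function names, the feature keys, the POS tags, and the toy sentence are all assumptions made for the example, and only lexicon membership and part-of-speech context are shown (dependency, phrase-structure, semantic-role, and embedding features are omitted).

```python
def token_features(tokens, pos_tags, lexicon, i):
    """Build a feature dict for token i (hypothetical feature names)."""
    word = tokens[i]
    return {
        "word": word,
        "pos": pos_tags[i],
        # Lexicon membership: is this token a known domain term?
        "in_lexicon": word in lexicon,
        # Neighbouring POS tags, with sentence-boundary markers.
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }


def sentence_features(tokens, pos_tags, lexicon):
    """Return one feature dict per token, in sentence order."""
    return [token_features(tokens, pos_tags, lexicon, i) for i in range(len(tokens))]


# Toy, hand-tagged example (assumed data, not taken from COAE2014).
tokens = ["手机", "屏幕", "很", "清晰"]
pos_tags = ["NN", "NN", "AD", "VA"]
lexicon = {"屏幕"}
features = sentence_features(tokens, pos_tags, lexicon)
```

A list of per-token dictionaries like this is a common input shape for sequence-labeling toolkits; the richer feature groups named in the abstract would be added as further keys.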
Abstract: Opinion target extraction is one of the core tasks in sentiment analysis on text data. In recent years, dependency-parser-based approaches have been commonly studied for opinion target extraction. However, dependency parsers are limited by language and grammatical constraints. Therefore, in this work, a sequential pattern-based rule mining model, which does not have such constraints, is proposed for cross-domain opinion target extraction from product reviews in unknown domains. Thus, knowing the domain of the reviews is no longer required when extracting opinion targets. The proposed model also reveals the difference between the concepts of opinion target and aspect, which are commonly confused in the literature. The model consists of two stages. In the first stage, the aspects of reviews are extracted from the target domain using rules automatically generated from source domains. The aspects are also transferred from the source domains to the target domain. Moreover, aspect pruning is applied to further improve the performance of aspect extraction. In the second stage, the opinion target is selected from among the aspects extracted in the first stage, using rules automatically generated for opinion target extraction. The proposed model was evaluated on several benchmark datasets in different domains and compared against the literature. The experimental results show that the opinion targets of reviews in unknown domains can be extracted with higher accuracy than in previous work.
Funding: This work was supported by the Commonweal Technical Project of Zhejiang Province of China under Grant No. 2013C33063, the National Natural Science Foundation of China under Grant Nos. 61100183 and 61402417, the Natural Science Foundation of Zhejiang Province of China under Grant No. LQ13F020014, and the 521 Talents Project of Zhejiang Sci-Tech University.
Abstract: In opinion mining of product reviews, an important task is to summarize customers' opinions by opinion target. Because of differing knowledge backgrounds and linguistic habits, customers use a variety of terms to describe the same opinion target. These terms are called context-dependent synonyms. To provide a comprehensive summary, the first step is to group these opinion target words. In this article, we focus on clustering context-dependent opinion target words in Chinese product reviews. We apply three clustering methods based on distributional similarity and use four different co-occurrence matrices in the experiments. According to experimental results on a large number of reviews, our proposed heuristic k-means clustering method, using the opinion target word co-occurrence matrix, achieves the best clustering result with lower time complexity and less memory. In addition, its accuracy is more stable across different choices of initial centroids. For some kinds of co-occurrence matrices, we also find that small (low-dimensional) matrices achieve higher average clustering accuracy than large (high-dimensional) matrices. Our findings provide a time-efficient and space-efficient way to cluster opinion targets with high accuracy.
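To make the idea of clustering by distributional similarity concrete, here is a minimal sketch of k-means over rows of a co-occurrence matrix, using cosine similarity for assignment. It is plain k-means with hand-picked seed rows, not the heuristic variant proposed in the article (whose seeding rule the abstract does not describe). The function names, the English toy terms, and the counts are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans_cosine(vectors, seed_indices, iterations=10):
    """Cluster rows of a co-occurrence matrix; seeds are row indices (assumed given)."""
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each row goes to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its member rows.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy co-occurrence counts (assumed data): rows are opinion target words,
# columns are hypothetical context words.
terms = ["screen", "display", "battery", "power"]
vectors = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 3],
    [0, 1, 4, 5],
]
labels = kmeans_cosine(vectors, seed_indices=[0, 2])
```

With these seeds, "screen" and "display" fall into one cluster and "battery" and "power" into the other. Because the result depends on the chosen seeds, stability across centroid choices, which the abstract reports, is a meaningful property to measure.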