Abstract: The rapid growth of user-generated content (UGC) provides a rich source for managing the reputation of entities, products, and services. Taking online product reviews as a concrete example, customers usually give opinions on multiple attributes of a product, so the challenge is to automatically extract and cluster the attributes that are mentioned. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapping scheme that trains the extraction models from a small quantity of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a clustering by committee (CBC) approach to cluster attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
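The bootstrapping idea in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation: the abstract does not name the underlying sequence labeler, so a toy counting model over word and context features stands in for it here, and the function names (`features`, `train`, `predict`, `bootstrap`), the `ATTR`/`O` tag set, and the `threshold` and `max_rounds` parameters are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def features(tokens, i):
    """Features for token i: the word itself and its (previous, next) context."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return [("word", tokens[i].lower()), ("ctx", prev, nxt)]


def train(labeled):
    """Count how often each feature co-occurs with each tag (toy labeler)."""
    counts = defaultdict(Counter)
    for tokens, tags in labeled:
        for i, tag in enumerate(tags):
            for f in features(tokens, i):
                counts[f][tag] += 1
    return counts


def predict(model, tokens):
    """Return (tags, confidence); confidence is the weakest per-token vote share."""
    tags, confidence = [], 1.0
    for i in range(len(tokens)):
        votes = Counter()
        for f in features(tokens, i):
            votes.update(model.get(f, {}))
        if not votes:
            return None, 0.0  # no evidence at all for this token
        tag, n = votes.most_common(1)[0]
        tags.append(tag)
        confidence = min(confidence, n / sum(votes.values()))
    return tags, confidence


def bootstrap(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: repeatedly move confidently labeled sentences into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        scored = [(s, *predict(model, s)) for s in pool]
        confident = [(s, t) for s, t, c in scored if c >= threshold]
        if not confident:
            break  # nothing new was learned in this round
        labeled.extend(confident)
        pool = [s for s, t, c in scored if c < threshold]
    return train(labeled)


# Hypothetical toy data: one labeled review sentence, two unlabeled ones.
seed = [(["the", "battery", "is", "great"], ["O", "ATTR", "O", "O"])]
pool = [["screen", "is", "dim"], ["the", "screen", "is", "great"]]
model = bootstrap(seed, pool)
print(predict(model, ["the", "camera", "is", "dim"]))
```

In this toy run, the seed model cannot label "screen is dim" at all, but once "the screen is great" has been absorbed through the shared context, "screen" is known as an attribute and the remaining sentence can be labeled in the next round. A real system would replace the counting model with a proper sequence labeler and would likely need a more careful confidence estimate.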