Funding: This work was partially supported by the National Science Foundation of China (Grant Nos. 61103039, 61232002, 61472345), the National Basic Research Program of China (2010CB731402), and the Wuhan Key Lab Research Foundation (SKLSE2012-09-16).
Abstract: Currently, there are many online review web sites where consumers can freely write comments about different kinds of products and services. These comments are quite useful for other potential consumers. However, the number of online comments is often large and continues to grow as more and more consumers contribute. In addition, a single comment may mention more than one product and contain both positive and negative opinions about them, yet it carries only one overall score. It is therefore not easy to judge the quality of an individual product from these comments. This paper presents a novel approach to generating review summaries that include scores and description snippets for each individual product. From the large number of comments, we first extract the context (snippet) that includes a description of the products and choose those snippets that express consumer opinions on them. We then propose several methods to predict the rating (from 1 to 5 stars) of the snippets. Finally, we derive a generic framework for generating summaries from the snippets. We design a new snippet selection algorithm, based on a standard seat allocation algorithm, to ensure that the returned results preserve the opinion-aspect statistical properties and the attribute-aspect coverage. Through experiments we demonstrate empirically that our methods are effective. We also quantitatively evaluate each step of our approach.
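The seat-allocation idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which allocation method the authors use, so the Sainte-Lague highest-averages rule below is an assumption, and the function name `allocate_seats` and the category labels are hypothetical. The sketch only shows how k summary slots could be divided among opinion categories in proportion to how often each category occurs among the snippets.

```python
import heapq


def allocate_seats(counts, k):
    """Divide k summary slots among categories with the Sainte-Lague rule.

    counts: dict mapping a category label to its number of snippets.
    Assumption: the paper's actual allocation method is not specified;
    Sainte-Lague is used here purely for illustration.
    """
    seats = {name: 0 for name in counts}
    # heapq is a min-heap, so quotients are negated to pop the largest first.
    heap = [(-float(c), name) for name, c in counts.items() if c > 0]
    heapq.heapify(heap)
    for _ in range(k):
        if not heap:
            break
        _, name = heapq.heappop(heap)
        seats[name] += 1
        # Sainte-Lague divisors are 1, 3, 5, ...: count / (2 * seats + 1).
        quotient = counts[name] / (2 * seats[name] + 1)
        heapq.heappush(heap, (-quotient, name))
    return seats


if __name__ == "__main__":
    # Hypothetical opinion distribution over the snippets of one product.
    print(allocate_seats({"positive": 60, "neutral": 25, "negative": 15}, 5))
```

The sketch does not cap a category's slots at its number of available snippets, and it says nothing about which snippets fill each slot; both would need to be handled by the surrounding selection procedure.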
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61572135 and 61170085), the 973 project (2010CB328106), and the Program for New Century Excellent Talents in China (NCET-10-0388).
Abstract: The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user's query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of the objective function, which provides a theoretical approximation guarantee of factor 1 - 1/e for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy that first obtains a high-quality initial set of top-k documents via greedy search and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy outperforms existing approaches in both effectiveness and efficiency.
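The two-stage strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain facility-location objective (each document is credited with its similarity to the closest selected document), a hypothetical non-negative similarity matrix `sim`, and a simple single-swap local search. The paper's actual objective, which also accounts for query relevance, is not given in the abstract.

```python
def objective(sim, selected):
    """Facility-location value: each document's best similarity to the set."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy(sim, k):
    """Stage 1: repeatedly add the document with the largest objective value."""
    n = len(sim)
    selected = []
    for _ in range(min(k, n)):
        candidates = [j for j in range(n) if j not in selected]
        best = max(candidates, key=lambda j: objective(sim, selected + [j]))
        selected.append(best)
    return selected


def local_search(sim, selected):
    """Stage 2: swap one selected document for an unselected one while
    doing so strictly improves the objective."""
    n = len(sim)
    current = list(selected)
    improved = True
    while improved:
        improved = False
        base = objective(sim, current)
        for pos in range(len(current)):
            for j in range(n):
                if j in current:
                    continue
                candidate = current[:pos] + [j] + current[pos + 1:]
                if objective(sim, candidate) > base + 1e-12:
                    current = candidate
                    improved = True
                    break
            if improved:
                break
    return current


if __name__ == "__main__":
    # Hypothetical similarities: documents 0-1 and 2-3 form two clusters.
    sim = [
        [1.0, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.8],
        [0.1, 0.1, 0.8, 1.0],
    ]
    initial = greedy(sim, 2)
    print(initial, local_search(sim, initial))
```

Because every accepted swap strictly increases the objective and there are finitely many size-k sets, the local search terminates. The sketch recomputes the objective from scratch for clarity; a practical version would cache each document's current best similarity to evaluate marginal gains incrementally.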