Abstract: This paper presents an unsupervised approach that clusters product reviews collected from Amazon and then generates labels for each cluster. Instead of treating a complete review as a unit, the approach splits each review into sentences and uses all sentences from the reviews as input for clustering. Hierarchical Agglomerative Clustering (HAC) is used to cluster the sentences. Cluster labeling is also unsupervised: three different methods are used to find a limited number of essential words for each cluster, the extracted words are used to construct phrases, and the constructed phrases serve as the labels for that cluster. The paper compares the results of these labeling methods with a baseline labeling method; in the evaluation, all three methods outperform the baseline. The aim of this research is cluster labeling that produces a set of labels describing the content of a cluster and distinguishing it from the other clusters.
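The pipeline summarized in the abstract (split reviews into sentences, cluster the sentences with HAC, pick a few essential words per cluster) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the paper's method: the naive sentence splitter, the tiny stopword list, bag-of-words vectors with cosine similarity, average linkage, and the word score (in-cluster count minus out-of-cluster count), which stands in for the three labeling methods that the abstract does not specify. Phrase construction is omitted.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "is", "it", "and", "this", "i", "to", "of", "was"}

def split_sentences(review):
    """Naively split a review into sentences on '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]

def tokenize(sentence):
    """Lowercase a sentence and keep its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hac(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters.

    Returns a list of clusters, each a list of sentence indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[x] for j in clusters[y]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        # Merge the most similar pair of clusters.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def label_clusters(clusters, vectors, top_n=2):
    """Pick top_n words per cluster by in-cluster minus out-of-cluster count."""
    total = Counter()
    for v in vectors:
        total.update(v)
    labels = []
    for members in clusters:
        inside = Counter()
        for i in members:
            inside.update(vectors[i])
        # total[w] = inside + outside, so 2*inside - total = inside - outside.
        scores = {w: 2 * c - total[w] for w, c in inside.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels.append(ranked[:top_n])
    return labels

# Made-up example reviews (not data from the paper).
reviews = [
    "The battery life is great. The screen is too dim.",
    "Battery life lasts all day! Screen brightness is poor.",
]
sentences = [s for r in reviews for s in split_sentences(r)]
vectors = [Counter(tokenize(s)) for s in sentences]
clusters = hac(vectors, k=2)
labels = label_clusters(clusters, vectors)
print(clusters)  # [[0, 2], [1, 3]]
print(labels)    # [['battery', 'life'], ['screen', 'brightness']]
```

Working at the sentence level lets the two topics inside a single review (battery and screen) land in different clusters, which would not be possible if each review were clustered as a whole.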