Abstract
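With the rapid development of e-commerce, the number of product reviews on e-commerce websites is also growing rapidly. How to mine information that is valuable to both consumers and manufacturers from the large volume of product reviews on the Web has become a very important research area. In product reviews, users often use different words to describe the same product feature. Identifying these product feature synonyms is necessary for better opinion summarization. Based on an analysis of product reviews, this paper extracts two types of constraints, must-link and cannot-link, and uses a constrained hierarchical clustering algorithm to identify product feature synonyms. It also compares the results of several methods for computing product feature similarity. Experimental results show that the proposed method performs well on real product review datasets.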
With the rapid growth of e-commerce, product review mining has recently received considerable attention. In product reviews, people often use different words and phrases to describe the same product feature; these must be recognized as synonyms for effective opinion summarization. In this paper, we first calculate the similarity of product features. Then must-link and cannot-link constraints are extracted based on an analysis of product reviews. Finally, a constrained hierarchical clustering algorithm is applied with the extracted constraints to recognize product feature synonyms. Experiments on diverse real-life datasets show promising results.
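The general idea of clustering under must-link and cannot-link constraints can be illustrated with a minimal sketch. This is not the paper's implementation: the average-linkage criterion, the stopping threshold, the function name `constrained_clustering`, and the toy similarity scores and constraints below are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from itertools import combinations


def constrained_clustering(items, sim, must_link, cannot_link, threshold):
    """Average-link agglomerative clustering under pairwise constraints.

    Illustrative only: linkage and stopping rule are assumptions.
    """
    # Start with one cluster per item.
    clusters = [{x} for x in items]

    def find(x):
        return next(c for c in clusters if x in c)

    # Must-link pairs are merged up front, regardless of similarity.
    for a, b in must_link:
        ca, cb = find(a), find(b)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb

    def violates(c1, c2):
        # A merge is forbidden if it would join a cannot-link pair.
        return any((a in c1 and b in c2) or (a in c2 and b in c1)
                   for a, b in cannot_link)

    def avg_sim(c1, c2):
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        candidates = [(avg_sim(c1, c2), i, j)
                      for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
                      if not violates(c1, c2)]
        if not candidates:
            break
        best, i, j = max(candidates)
        if best < threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: similarity scores and constraints are invented.
scores = {
    frozenset(("screen", "display")): 0.9,
    frozenset(("display", "lcd")): 0.7,
    frozenset(("screen", "lcd")): 0.2,
    frozenset(("battery", "power")): 0.1,
    frozenset(("screen", "battery")): 0.8,
}


def toy_sim(a, b):
    return scores.get(frozenset((a, b)), 0.0)


result = constrained_clustering(
    ["screen", "display", "lcd", "battery", "power"],
    toy_sim,
    must_link=[("battery", "power")],
    cannot_link=[("screen", "battery")],
    threshold=0.4,
)
print(result)
```

In this toy run, "battery" and "power" end up together despite their low similarity score because of the must-link constraint, while "screen" and "battery" stay apart despite a high score because of the cannot-link constraint.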
Author
郗亚辉
XI Yahui (College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China)
Source
《中文信息学报》
CSCD
Peking University Core Journals list (北大核心)
2016, Issue 4, pp. 150-158 (9 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (61170039)
Keywords
product review mining (产品评论挖掘)
product feature synonyms (产品特征同义词)
similarity (相似度)
constrained hierarchical clustering algorithm (约束层次聚类算法)