Abstract
【Purpose/significance】Online reviews, negative reviews in particular, are an important basis for consumers' purchasing decisions, yet the performance of existing methods for reducing information redundancy on negative online reviews still leaves room for improvement.【Method/process】This paper proposes a spectral clustering method for negative reviews based on Gaussian LDA. First, a Gaussian LDA model is used to obtain the topic distribution of the negative reviews. Next, the Pearson similarity between reviews is computed from these topic distributions, and a spectral clustering algorithm is applied to cluster the negative reviews. Finally, the m reviews closest to each cluster center are extracted as the representative reviews of that cluster.【Result/conclusion】The superiority of the proposed method is verified by cross-comparison: Gaussian LDA, LDA, TF-IDF and Doc2Vec are each combined with spectral clustering, and Gaussian LDA is combined with K-means, DBSCAN and spectral clustering. The negative-review categories extracted in this way are clearly distinguishable from one another and highly representative, which largely resolves the information-redundancy problem.【Innovation/limitation】The multi-model integrated clustering approach, which first extracts topics and then clusters, offers a new method and line of thinking for solving the review information-redundancy problem, and also provides a new reference for research on text mining and text clustering.
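The pipeline described in the abstract (topic distributions, then pairwise Pearson similarity, then spectral clustering, then the m reviews nearest each cluster center) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Gaussian LDA itself is not implemented: the per-review topic matrix `theta` is assumed to be given. The linear shift of Pearson correlation from [-1, 1] to [0, 1] (so it can act as a non-negative affinity), the normalized spectral clustering variant of Ng, Jordan and Weiss, and measuring "distance to the cluster center" in topic space are all assumptions. The helper names `pearson_affinity`, `spectral_clustering`, `kmeans` and `representatives` are hypothetical.

```python
import numpy as np


def pearson_affinity(theta: np.ndarray) -> np.ndarray:
    """Pearson correlation between the topic vectors of every pair of reviews,
    shifted to [0, 1] so it can serve as a spectral-clustering affinity."""
    corr = np.corrcoef(theta)        # rows are reviews, columns are topics
    corr = np.nan_to_num(corr)       # a constant row has undefined correlation; treat as 0
    return (corr + 1.0) / 2.0        # assumption: linear shift to non-negative weights


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # squared distance of each point to its nearest already-chosen center
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(w: np.ndarray, k: int) -> np.ndarray:
    """Normalized spectral clustering on a precomputed affinity matrix w."""
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    a = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(a)                          # eigenvalues in ascending order
    u = vecs[:, -k:]                                     # k leading eigenvectors
    u = u / np.linalg.norm(u, axis=1, keepdims=True)     # row-normalize the embedding
    return kmeans(u, k)


def representatives(theta: np.ndarray, labels: np.ndarray, m: int) -> dict:
    """For each cluster, indices of the m reviews closest (Euclidean, in topic
    space) to the cluster's mean topic vector."""
    reps = {}
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        center = theta[idx].mean(axis=0)
        dist = np.linalg.norm(theta[idx] - center, axis=1)
        reps[int(j)] = idx[np.argsort(dist)[:m]].tolist()
    return reps


if __name__ == "__main__":
    # Hand-made stand-in for Gaussian LDA output: 8 reviews over 4 topics,
    # the first 4 dominated by topic 0 and the last 4 by topic 3.
    theta = np.array([
        [0.70, 0.10, 0.10, 0.10], [0.60, 0.20, 0.10, 0.10],
        [0.70, 0.10, 0.15, 0.05], [0.65, 0.15, 0.10, 0.10],
        [0.10, 0.10, 0.10, 0.70], [0.10, 0.10, 0.20, 0.60],
        [0.05, 0.15, 0.10, 0.70], [0.10, 0.10, 0.15, 0.65],
    ])
    labels = spectral_clustering(pearson_affinity(theta), k=2)
    print(labels)
    print(representatives(theta, labels, m=2))
```

Any other similarity-to-affinity mapping (for example clipping negative correlations to zero) can be substituted in `pearson_affinity` without changing the rest of the sketch; the abstract does not say which one the authors used.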
Authors
WU Yin-hao (吴银昊)
ZHAO Narisa (那日萨)
LI Hui (李慧)
Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
Source
Information Science (《情报科学》)
CSSCI
Peking University Core Journals (北大核心)
2021, No. 3, pp. 136-142 (7 pages)
Funding
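National Natural Science Foundation of China, General Program: "Research on Intelligent Technologies for Predicting Online Consumer Group Behavior Based on Online Reviews" (61471083)
Ministry of Education Humanities and Social Sciences Research Planning Fund: "Mechanisms and Prediction of Online Consumer Group Behavior Based on Online Reviews" (14YJA630044)
Dalian Science and Technology Innovation Fund: "Research on Big-Data-Based Intelligent Decision Theories, Methods and Supporting Technologies for Dalian Smart City Construction" (2018J11CY009)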