Abstract
[Objective] This paper proposes a deep-learning-based method for aspect word extraction, aiming to support differentiated and fine-grained analysis of user demands. [Methods] We designed a Context Window Self-Attention (CWSA) model to extract aspect words. While retaining the overall information of the full text, the model focuses on the semantics within the context window and of adjacent text, so that fine-grained product features can be mined from reviews. On this basis, we applied aspect-level sentiment analysis to examine user demands. [Results] We constructed a Chinese dataset for aspect word extraction and aspect-level sentiment analysis from JD.com smartphone reviews. The CWSA model reached an F1 score of 89.65% on this dataset, outperforming the baseline aspect word extraction models. [Limitations] Publicly available Chinese datasets for aspect word extraction are scarce. Future work will build Chinese datasets covering multiple products to enable richer experimental analysis, and will extend the model's cross-language adaptability on English datasets. [Conclusions] Applying the model to nearly 900,000 JD.com smartphone reviews shows that it can provide enterprises with differentiated and fine-grained mining and analysis.
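The abstract does not describe the CWSA architecture in detail, so the following is only a generic sketch of the idea of restricting self-attention to a local context window, not the authors' model. The function name `window_self_attention`, the `window` parameter, and the use of raw token vectors as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import math


def window_self_attention(x, window):
    """Illustrative windowed self-attention (NOT the paper's CWSA model).

    Position i attends only to positions j with |i - j| <= window.
    x is a list of token vectors (lists of floats) that serve directly
    as queries, keys, and values; a real model would use learned projections.
    Returns (outputs, attention_weights).
    """
    n = len(x)
    d = len(x[0])
    outputs, weights = [], []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores against tokens inside the window only.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        row = [0.0] * n  # tokens outside the window keep weight 0
        for k, j in enumerate(range(lo, hi)):
            row[j] = exps[k] / z
        weights.append(row)
        # Weighted sum of the value vectors inside the window.
        outputs.append(
            [sum(row[j] * x[j][c] for j in range(lo, hi)) for c in range(d)]
        )
    return outputs, weights
```

Calling `window_self_attention(tokens, 1)` on a sequence of four token vectors gives each position nonzero attention weight only on itself and its immediate neighbours. A model that also needs the global information mentioned in the abstract would have to combine such a local view with a full-sequence representation; how the paper does this is not stated here.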
Authors
Xiao Yuhan (肖宇晗)
Lin Huiping (林慧苹)
School of Software & Microelectronics, Peking University, Beijing 102600, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 1, pp. 63-75 (13 pages)
Funding
This work is one of the research outcomes of the National Key Research and Development Program of China (Grant No. 2018YFB1702900).
Keywords
Deep Learning
Aspect Word Extraction
Sentiment Analysis
Differentiated Demand Mining