Abstract
Product reviews are short texts written in informal, non-standard wording. This study explores how to classify product review texts automatically by product category and how to improve the classification performance. A core word set is built from the training set using TF-IDF and LDA; the short texts are then expanded with features selected by Word2Vec similarity, and the expanded product reviews are classified with the Bidirectional Encoder Representations from Transformers (BERT) model. Comparative experiments are designed to verify the effectiveness of the method. When the expanded product reviews are classified with BERT, the F1 value of the proposed method is 2.1% higher than without expansion and 0.9% higher than with expansion based on HowNet similarity. The reasons for the method's effectiveness are analyzed in terms of basic principles, the different word similarity calculation methods, and word usage. The proposed method effectively improves classification performance when product reviews are organized by product category, and it can be applied to the organization of e-commerce information and to research on related theories and methods.
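The feature expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the core word set here is ranked by TF-IDF only (the LDA step is omitted), the small hand-written `vectors` dictionary stands in for a trained Word2Vec model, the similarity threshold of 0.8 is an arbitrary assumption, and the function names `tfidf_core_words` and `expand` are hypothetical. The BERT classification stage is not shown.

```python
import math
from collections import Counter


def tfidf_core_words(docs, top_k):
    """Rank words by TF-IDF summed over all tokenized docs; return the top_k."""
    n_docs = len(docs)
    # Document frequency: number of docs in which each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    scores = Counter()
    for doc in docs:
        for word, count in Counter(doc).items():
            idf = math.log(n_docs / df[word])  # 0 for words found in every doc
            scores[word] += (count / len(doc)) * idf
    return [word for word, _ in scores.most_common(top_k)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(tokens, core_words, vectors, threshold=0.8):
    """Append each core word that is similar enough to some token of the text."""
    expanded = list(tokens)
    for core in core_words:
        if core in expanded or core not in vectors:
            continue
        if any(t in vectors and cosine(vectors[t], vectors[core]) >= threshold
               for t in tokens):
            expanded.append(core)
    return expanded


# Toy 2-dimensional embeddings (assumption: stand-ins for Word2Vec vectors).
vectors = {
    "phone": (0.9, 0.1), "screen": (0.8, 0.2), "battery": (0.85, 0.15),
    "shirt": (0.1, 0.9), "cotton": (0.2, 0.8),
}
train = [
    ["phone", "screen", "good"],
    ["shirt", "cotton", "good"],
    ["phone", "battery", "good"],
]
core = tfidf_core_words(train, top_k=5)
print(expand(["screen", "nice"], core, vectors))
```

In this sketch a short review such as `["screen", "nice"]` gains the core words whose vectors lie close to "screen", while clothing-related core words are left out. The expanded token list would then be joined back into text and passed to the classifier.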
Authors
李湘东
孙倩茹
石健
Li Xiangdong; Sun Qianru; Shi Jian (School of Information Management, Wuhan University; Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072)
Source
《信息资源管理学报》
CSSCI
2023, No. 1, pp. 129-139 (11 pages)
Journal of Information Resources Management