Abstract
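By mining product reviews, merchants can better understand consumer needs and promptly improve product design. Currently, most approaches to review mining extract effective sentiment features and then classify them with a classifier. However, e-commerce review text is expressed in diverse ways and is often informally written and colloquial. It also suffers from data sparsity, very high document feature dimensionality, imbalanced samples, and the domain dependence of sentiment lexicons. All of these make sentiment feature extraction increasingly difficult. To address these problems, this paper proposes a complete method for mining e-commerce reviews. The method builds an e-commerce domain sentiment lexicon by combining multiple strategies, uses text length as a feature, optimizes the stop-word list using the corpus, and combines document frequency with the TF-IDF algorithm for feature selection and feature weighting. The method is validated on a corpus of water heater reviews with a support vector machine (SVM) as the core classifier. Experimental results show that it reduces text dimensionality while substantially improving sentiment classification accuracy.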
Sentiment mining of product reviews can help manufacturers fully understand customer needs. So far, most review-mining approaches extract effective sentiment features and classify them with a classifier. However, extracting sentiment features is difficult because of diverse and colloquial expression, non-standard writing, data sparsity, unbalanced samples, high feature dimensionality, and the domain dependence of sentiment lexicons. Thus, a novel SVM-based model for review mining is proposed. It builds a sentiment dictionary for the e-commerce domain, optimizes the stop list using the corpus, and adds text length as a feature. It also combines document frequency with TF-IDF for feature selection and weighting to reduce feature dimensionality, which helps overcome the drawbacks above. Empirical analysis on a corpus of water heater reviews shows that the model significantly improves sentiment classification accuracy while also reducing text dimensionality.
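The feature pipeline described in the abstract can be sketched as follows. It covers stop-word filtering, document-frequency (DF) selection, TF-IDF weighting, and text length as an extra feature. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the following:

- Reviews are already segmented into tokens; Chinese text would first need a word segmenter.
- The stop list is supplied rather than optimized from the corpus.
- `min_df` is a hypothetical DF threshold.
- Text length is taken as the token count after stop-word removal.

The function name `build_features` is illustrative only.

```python
import math
from collections import Counter


def build_features(docs, stop_words=frozenset(), min_df=2):
    """Return (vocab, rows) for pre-tokenized reviews (hypothetical helper).

    Steps: drop stop words, keep terms whose document frequency is at
    least min_df, weight the kept terms with TF-IDF, then append the
    review length as one extra feature.
    """
    # Assumption: the stop list is given, not derived from the corpus.
    cleaned = [[t for t in doc if t not in stop_words] for doc in docs]

    # Document frequency: number of reviews that contain each term.
    df = Counter()
    for doc in cleaned:
        df.update(set(doc))

    # DF-based feature selection: rare terms are discarded.
    vocab = sorted(t for t, n in df.items() if n >= min_df)

    # Classic IDF = log(N / df); a term present in every review gets weight 0.
    n_docs = len(cleaned)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    rows = []
    for doc in cleaned:
        counts = Counter(doc)
        length = len(doc)
        # Term frequency normalised by review length, multiplied by IDF.
        row = [counts[t] / length * idf[t] if length else 0.0 for t in vocab]
        # Assumption: length = token count after stop-word removal;
        # the paper may measure text length differently (e.g. in characters).
        row.append(float(length))
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    reviews = [["the", "heater", "good", "fast"],
               ["heater", "bad"],
               ["install", "slow", "bad"]]
    vocab, X = build_features(reviews, stop_words={"the"}, min_df=2)
    print(vocab)
    for row in X:
        print(row)
```

With `min_df=2`, terms that occur in only one review are dropped. In this toy example, only `bad` and `heater` survive, which shows how DF filtering shrinks the feature space before weighting. The raw length feature is on a different scale from the TF-IDF weights, so the rows would normally be standardized before training an SVM such as scikit-learn's `LinearSVC`. The e-commerce sentiment-lexicon features described in the paper are not modeled here.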
Authors
熊乐
饶泓
XIONG Le, RAO Hong (Department of Information Engineering, Nanchang University, Nanchang 330031, China)
Source
《南昌大学学报(理科版)》
CAS
PKU Core (Peking University list of core Chinese journals)
2018, No. 1, pp. 88-94 (7 pages)
Journal of Nanchang University (Natural Science)
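Funding
National Natural Science Foundation of China (61262047)
Key Research and Development Program of Jiangxi Province (20171BBE50063)
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ14141)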