Abstract
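The statistical features of words are widely used in natural language processing. To examine how statistical features affect the accuracy of keyword extraction and text classification, this paper analyzes eight common statistical features and studies their role in sentiment analysis through sentiment word extraction and product review classification. Building the text vector space model from the eight statistical features, instead of from the words themselves, reduces the dimensionality of document vectors and gives a compression effect similar to that of latent semantic analysis (LSA/SVD). This lowers algorithmic complexity while preserving classification accuracy, so the approach can replace the traditional vector space model. Sentiment word extraction experiments show that combining statistical features with part-of-speech tags reaches an accuracy of 76.4%, significantly higher than algorithms based on statistical features alone or on part of speech alone. Product review classification tests show that, compared with traditional word-based text sentiment classification, classification based on statistical features improves accuracy by 10.8%.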
The statistical features of words are widely used in natural language processing. This paper summarizes eight types of statistical features and studies their role in extracting sentiment words and classifying product reviews. Unlike vector space models (VSM) whose many dimensions correspond to lexical items, this paper uses only these eight types of statistical features to represent words or documents. This lowers the dimensionality of the VSM and effectively derives a latent semantic space without the high time and space complexity of an SVD computation. Sentiment word extraction results show that combining these statistical features with the part-of-speech (PoS) tags of words achieves much higher extraction accuracy than other methods, with a precision of 76.4%. Product review classification results show that, compared with using sentiment words to construct the feature space, using only these eight kinds of statistical features improves classification precision by 10.8%.
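The abstract does not name the eight statistical features, so the following is only a minimal Python sketch of the general idea: each document is represented by a fixed number of aggregated word statistics rather than by one dimension per vocabulary word. The three statistics used here (relative document frequency, IDF, and a Laplace-smoothed positive/negative log-odds), the averaging step, and the names `word_statistics` and `doc_vector` are all hypothetical stand-ins, not the paper's features or method, and the sketch does not perform LSA/SVD.

```python
import math
from collections import Counter

def word_statistics(docs, labels):
    """Per-word statistics from a labeled corpus (labels: 1 = positive, 0 = negative).

    Hypothetical illustrative statistics, NOT the eight features from the paper:
    (relative document frequency, IDF, smoothed class log-odds).
    """
    n_docs = len(docs)
    df, pos_df, neg_df = Counter(), Counter(), Counter()
    for tokens, label in zip(docs, labels):
        for w in set(tokens):  # count each word at most once per document
            df[w] += 1
            if label == 1:
                pos_df[w] += 1
            else:
                neg_df[w] += 1
    n_pos = sum(labels)
    n_neg = n_docs - n_pos
    stats = {}
    for w in df:
        idf = math.log(n_docs / df[w])
        # Laplace-smoothed log-odds of occurring in positive vs. negative documents
        log_odds = (math.log((pos_df[w] + 1) / (n_pos + 2))
                    - math.log((neg_df[w] + 1) / (n_neg + 2)))
        stats[w] = (df[w] / n_docs, idf, log_odds)
    return stats

def doc_vector(tokens, stats, k=3):
    """Average the statistics of known tokens into a k-dim vector.

    The dimension is k regardless of vocabulary size (an assumed aggregation).
    """
    known = [stats[w] for w in tokens if w in stats]
    if not known:
        return [0.0] * k
    return [sum(col) / len(known) for col in zip(*known)]

# Toy corpus of tokenized product reviews (made up for illustration)
train = [["great", "phone", "fast"], ["bad", "battery", "slow"],
         ["great", "battery"], ["slow", "bad", "screen"]]
labels = [1, 0, 1, 0]
stats = word_statistics(train, labels)
vec = doc_vector(["great", "fast", "screen"], stats)
print(len(stats), len(vec))  # 7 3: word-based VSM size vs. feature-based size
```

On this toy corpus a word-based VSM would need 7 dimensions, one per vocabulary word, while the feature-based vector always has 3. With a real vocabulary of tens of thousands of words, that gap is the dimensionality reduction the abstract refers to. Any standard classifier could then be trained on these short vectors.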
Authors
Han Tonghui (韩彤晖)
Yang Dongqiang (杨东强)
Ma Hongwei (马宏伟)
School of Computer Science & Technology, Shandong Jianzhu University, Jinan 250100, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2019, No. 3, pp. 866-872 (7 pages)
Funding
Supported by the General Project of Humanities and Social Sciences Research, Ministry of Education of China (15YJA740054)
Keywords
statistical features
sentiment word extraction
product review classification