Abstract
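[Objective] To address the high dimensionality of text feature vectors in machine-learning-based text sentiment classification, this paper proposes a selective ensemble algorithm that combines BPSO with the random subspace method. [Methods] Building on an analysis of the principles of BPSO and random subspace, we present the model framework and algorithm flow of BPSO random subspace. After converting a Chinese review corpus into feature representations, we use BPSO random subspace for experimental validation and analysis. [Results] By varying the subspace rate of the random subspace method, we study how standard random subspace and BPSO random subspace selective ensemble affect classification accuracy and system diversity. The results show that BPSO random subspace scores higher than standard random subspace on both classification accuracy and system diversity. [Limitations] The method has not yet been validated on English data. [Conclusions] Applying BPSO to the random subspace method yields a novel selective ensemble model that not only addresses the high dimensionality of the feature vector space but also improves classification accuracy and generalization ability, providing an effective method for Chinese text sentiment classification.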
[Objective] This paper aims to address the high dimensionality of feature representations in Chinese sentiment analysis by means of RS_BPSO, a selective ensemble algorithm. [Methods] First, we developed the framework and algorithm of the proposed RS_BPSO model based on the theory of Random Subspace (RS) and Binary Particle Swarm Optimization (BPSO). Then, we transformed the Chinese review corpus into structured feature vectors and evaluated the new model. [Results] We found that both the diversity and the accuracy of the RS_BPSO model were higher than those of the standard RS model. [Limitations] We did not test the proposed model on English-language corpora. [Conclusions] The RS_BPSO model could be an effective method for sentiment classification of Chinese text.
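The selective ensemble idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the base classifier, the fitness function, or any BPSO parameters, so everything below is an assumption made for illustration. In particular, the nearest-centroid base learner, the use of majority-vote accuracy on held-out data as fitness, the inertia and acceleration values, and the seeding of one particle with the full ensemble are all hypothetical choices. Only the sigmoid-based bit update is the standard BPSO rule.

```python
import math
import random


def make_subspaces(n_features, n_learners, rate, rng):
    """Draw one random feature subset per base learner (rate = subspace rate)."""
    k = max(1, int(round(rate * n_features)))
    return [sorted(rng.sample(range(n_features), k)) for _ in range(n_learners)]


def train_centroid(X, y, feats):
    """Hypothetical stand-in base learner: class centroids over the chosen features."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_centroid(model, feats, row):
    """Predict the class whose centroid is nearest in the subspace."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, model[c]))
    return min(sorted(model), key=dist)


def vote_accuracy(mask, preds, y):
    """Majority-vote accuracy of the base learners switched on by the bit mask."""
    chosen = [p for bit, p in zip(mask, preds) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for i, label in enumerate(y):
        votes = [p[i] for p in chosen]
        winner = max(sorted(set(votes)), key=votes.count)
        correct += winner == label
    return correct / len(y)


def bpso_select(preds, y, rng, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO over learner-selection masks; fitness is vote accuracy (assumed)."""
    n = len(preds)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    pos[0] = [1] * n  # assumption: seed one particle with the full ensemble
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pfit = [vote_accuracy(p, preds, y) for p in pos]
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to keep sigmoid stable
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = vote_accuracy(pos[i], preds, y)
            if fit > pfit[i]:
                pbest[i], pfit[i] = pos[i][:], fit
                if fit > gfit:
                    gbest, gfit = pos[i][:], fit
    return gbest, gfit
```

To use the sketch, train one base learner per subspace on a training split, collect each learner's predictions on a held-out split as `preds`, and pass them with the held-out labels to `bpso_select`. The returned mask marks which learners stay in the ensemble. Because one particle starts as the full ensemble, the selected subset cannot score below the unpruned ensemble on that held-out split; this says nothing about accuracy on unseen data.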
Source
《数据分析与知识发现》
CSSCI
CSCD
2017, Issue 5, pp. 71-81 (11 pages)
Data Analysis and Knowledge Discovery