Abstract
As research on imbalanced classification deepens, how to split data into training and test sets has become a question worth considering. Addressing dataset design for imbalanced text sentiment classification, this paper uses under-sampling to design two test-set schemes, one balanced and one imbalanced. It explains which scheme to choose under different task requirements and discusses the evaluation metrics used to verify classifier performance. Experiments on two real-world online review datasets confirm that the schemes are reasonable and applicable.
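The idea of building a balanced test set by under-sampling, and of checking how the choice of test set interacts with the choice of metric, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the "pos"/"neg" labels, the toy data, and the use of accuracy and G-mean as the two metrics (the abstract does not name the metrics the authors discuss).

```python
import math
import random


def undersample(samples, majority_label, minority_label, seed=0):
    """Randomly drop majority-class samples until both classes are equal in size.

    `samples` is a list of (text, label) pairs. Assumes the majority class
    really is at least as large as the minority class.
    """
    rng = random.Random(seed)
    majority = [s for s in samples if s[1] == majority_label]
    minority = [s for s in samples if s[1] == minority_label]
    kept = rng.sample(majority, len(minority))  # sample without replacement
    balanced = kept + minority
    rng.shuffle(balanced)
    return balanced


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples with true class `label` that were predicted as `label`."""
    total = sum(t == label for t in y_true)
    hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    return hits / total if total else 0.0


def g_mean(y_true, y_pred, labels=("pos", "neg")):
    """Geometric mean of the two per-class recalls."""
    return math.sqrt(
        recall(y_true, y_pred, labels[0]) * recall(y_true, y_pred, labels[1])
    )


if __name__ == "__main__":
    # Hypothetical imbalanced test pool: 90 positive and 10 negative reviews.
    pool = [(f"review {i}", "pos") for i in range(90)]
    pool += [(f"review {i}", "neg") for i in range(90, 100)]

    # A deliberately trivial "classifier" that always predicts the majority class.
    def predict(texts):
        return ["pos" for _ in texts]

    for name, test_set in [("imbalanced", pool),
                           ("balanced", undersample(pool, "pos", "neg"))]:
        y_true = [label for _, label in test_set]
        y_pred = predict([text for text, _ in test_set])
        print(name, accuracy(y_true, y_pred), g_mean(y_true, y_pred))
```

With this toy setup, the majority-class predictor scores an accuracy of 0.9 on the imbalanced test set but 0.5 on the balanced one, while G-mean is 0.0 on both. This is one way to see why the test-set scheme and the evaluation metric need to be chosen together.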
Source
Computer Development & Applications (《电脑开发与应用》)
2013, No. 5, pp. 1-4 (4 pages)
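Funding
National Natural Science Foundation of China (60970014, 61272095)
Natural Science Foundation of Shanxi Province (2010011021-1)
Shanxi Province Science and Technology Key Project (20110321027-02)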
Keywords
imbalanced data
sentiment classification
experimental design