Abstract
When sentiment classification is performed on massive data sets, the scalability of traditional stand-alone sentiment classification algorithms becomes the system bottleneck. We implemented MapReduce versions of the feature extraction, feature-vector weighting, and classification algorithms used in sentiment classification on the Hadoop cloud computing platform. On a sentiment corpus, we compared the precision of sentiment classification under different combinations of these sub-steps, as well as the time cost of each algorithm. The experimental results confirm the effectiveness of the parallelized sentiment classification algorithms and provide a valuable reference for users choosing suitable algorithms for a sentiment classification task.
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2013, No. 6, pp. 206-210 (5 pages)
Computer Science
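Funding
National Natural Science Foundation of China (61035003)
International Science and Technology Cooperation Program of the Ministry of Science and Technology of China (2010DFA11030)
Natural Science Foundation of Jiangsu Province (BK2010054)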