Abstract
Text classification is a core problem in many big data applications. This paper proposes an incremental text classification algorithm that combines the Batch SVM incremental algorithm with the Bagging algorithm. The proposed algorithm is integrated into Storm, a distributed processing framework for cloud computing, to build an efficient online incremental text classification mechanism on the Storm cloud platform. Experiments on a real-world dataset verify the accuracy and efficiency of the mechanism: while maintaining an accuracy of 90%, its processing delay is more than 50% lower than that of existing algorithms, so it can effectively support online text classification.
Authors
韩耀廷
许志伟
刘利民
HAN Yao-ting; XU Zhi-wei; LIU Li-min (College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China)
Source
《内蒙古工业大学学报(自然科学版)》
2018, No. 4, pp. 279-286 (8 pages)
Journal of Inner Mongolia University of Technology:Natural Science Edition
Funding
Natural Science Foundation of Inner Mongolia Autonomous Region (2018MS06003)