Abstract
Network public opinion (NPO) information is high in volume, fast-moving, and largely unstructured, which makes fast and accurate classification difficult. The SVM and naive Bayes algorithms are both traditional classification algorithms with excellent performance, but they cannot process massive data quickly enough. Exploiting the Hadoop platform's ability to process distributed stored data in parallel, this paper proposes the HSVM_WNB classification algorithm: the collected public opinion documents are stored locally in accordance with the HDFS architecture, and classification is carried out in parallel by MapReduce processes. Experiments show that the proposed algorithm effectively improves both the capability and the efficiency of NPO classification.
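The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of splitting classifier training into a map step and a reduce step. It uses a plain (unweighted) multinomial naive Bayes with add-one smoothing, runs in a single Python process, and does not touch Hadoop or HDFS. It is not the paper's HSVM_WNB algorithm, and every function name and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def map_doc(label, text):
    """Map step: emit one count per document and one count per term occurrence."""
    yield ("doc", label, None), 1
    for term in text.lower().split():
        yield ("term", label, term), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def train(docs):
    """Run map and reduce over (label, text) pairs and build the count tables."""
    pairs = (pair for label, text in docs for pair in map_doc(label, text))
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    for (kind, label, term), n in reduce_counts(pairs).items():
        if kind == "doc":
            doc_counts[label] = n
        else:
            term_counts[label][term] = n
    return doc_counts, term_counts


def classify(text, doc_counts, term_counts):
    """Return the label with the highest smoothed log-posterior score."""
    vocab = {t for counts in term_counts.values() for t in counts}
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_terms = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for term in text.lower().split():
            # Add-one (Laplace) smoothing so unseen terms do not zero the score.
            score += math.log(
                (term_counts[label][term] + 1) / (total_terms + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In a real Hadoop job the map and reduce functions would run on separate nodes over HDFS blocks. Here they are ordinary Python functions chained in memory, which is enough to show why per-class counting parallelizes naturally: each document contributes its counts independently of the others.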
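Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journals (北大核心)
2017, No. 14, pp. 45-48 (4 pages)
Funding
Supported by the Fundamental Research Funds for the Central Universities, Sichuan University (2014SCU11054)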