Research on Parallel Text Classification System Based on Non-Balanced LSH
Abstract: To address the low efficiency of the K-Nearest Neighbors (KNN) classification algorithm on massive text collections, a hyperplane-based non-balanced locality-sensitive hashing (NBLSH) classification algorithm is proposed. Compared with traditional locality-sensitive hashing, it significantly improves classification accuracy and real-time performance. To further reduce execution time and improve classification efficiency, the algorithm is combined with the Spark parallel computing model, and an efficient parallel text classification system is implemented on the Hadoop big-data processing platform. Experimental results show that the system achieves high classification speed while maintaining high classification accuracy.
Source: Microelectronics & Computer (《微电子学与计算机》), indexed in CSCD and the Peking University Core Journals list, 2017, No. 12, pp. 67-73 (7 pages)
Keywords: KNN; non-balanced locality-sensitive hashing (NBLSH); Hadoop; Spark
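
For orientation, the sketch below shows the general idea behind hyperplane-based LSH as a candidate filter for KNN classification. It is the standard random-hyperplane scheme, i.e. the kind of traditional baseline the abstract compares against. It is not the paper's non-balanced variant, whose construction the abstract does not describe. The class name `HyperplaneLshKnn`, its parameters, and the toy term-count vectors are all illustrative assumptions, and the Spark/Hadoop parallelization is omitted.

```python
import random
from collections import Counter, defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw one random Gaussian normal vector per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(vec, planes):
    """One bit per hyperplane: the side of the plane on which vec falls."""
    return tuple(
        int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes
    )


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HyperplaneLshKnn:
    """Hypothetical helper: KNN vote restricted to LSH bucket candidates."""

    def __init__(self, dim, num_planes=4, num_tables=8, k=3, seed=0):
        self.k = k
        # Each table has its own set of hyperplanes and its own buckets.
        self.tables = [
            (make_hyperplanes(num_planes, dim, seed + t), defaultdict(list))
            for t in range(num_tables)
        ]
        self.samples = []

    def fit(self, vectors, labels):
        self.samples = list(zip(vectors, labels))
        for planes, buckets in self.tables:
            buckets.clear()
            for idx, (vec, _) in enumerate(self.samples):
                buckets[signature(vec, planes)].append(idx)
        return self

    def predict(self, vec):
        # Candidates: training samples sharing a bucket with vec in any table.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vec, planes), []))
        if not candidates:
            # No collision at all: fall back to an exact scan.
            candidates = range(len(self.samples))
        ranked = sorted(
            candidates, key=lambda i: cosine(vec, self.samples[i][0]), reverse=True
        )
        votes = Counter(self.samples[i][1] for i in ranked[: self.k])
        return votes.most_common(1)[0][0]
```

A classifier built this way compares a query only against training documents that land in the same bucket in at least one table, which is where the speed-up over exhaustive KNN comes from. More hyperplanes per table make buckets smaller and faster to scan but raise the chance of missing true neighbors, which more tables can offset.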