Abstract
This paper introduces, analyzes, and compares several of the better-performing classification methods currently in use. Building on this, and drawing on the fast classification that decision tree methods offer, it proposes a classification algorithm for massive data based on database sampling. It describes the algorithm's design rationale and implementation principles, and discusses how to optimize the algorithm in a multiprocessor environment. Experiments show that the algorithm noticeably improves classification efficiency on massive databases.
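The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It draws a uniform random sample from the table in one pass (reservoir sampling is our choice), builds an ID3-style decision tree on categorical attributes from the sample alone (also our choice; the paper's tree-construction details are not given here), and then classifies the full table in parallel chunks. All function names (`reservoir_sample`, `build_tree`, `classify`, `classify_all`) and parameters are hypothetical.

```python
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def reservoir_sample(rows, k, seed=0):
    """Algorithm R: uniform sample of k rows in a single pass (assumed sampling scheme)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive bounds: keep row with probability k/(i+1)
            if j < k:
                sample[j] = row
    return sample


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, attrs, label="class", max_depth=5):
    """ID3-style tree on categorical attributes; leaves are class labels (strings)."""
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs or max_depth == 0:
        return majority

    def gain(attr):
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r[label])
        rest = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - rest

    best = max(attrs, key=gain)
    if gain(best) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining, label, max_depth - 1)
    # Internal node: (attribute, branches, fallback label for unseen values)
    return (best, branches, majority)


def classify(tree, row):
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if row.get(attr) not in branches:
            return default
        tree = branches[row[attr]]
    return tree


def classify_all(tree, rows, workers=4, chunk=1000):
    """Classify the full table in chunks. Threads keep the sketch simple; in CPython a
    ProcessPoolExecutor (with picklable functions) would be needed for real CPU parallelism."""
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: [classify(tree, r) for r in c], chunks))
    return [p for part in parts for p in part]


if __name__ == "__main__":
    # Synthetic table: 6 attribute combinations, each repeated 1000 times.
    data = [{"outlook": o, "windy": w,
             "class": "play" if o != "rain" and w == "no" else "stay"}
            for o in ("sunny", "overcast", "rain") for w in ("yes", "no")] * 1000
    sample = reservoir_sample(data, 200, seed=42)
    tree = build_tree(sample, ["outlook", "windy"])
    predicted = classify_all(tree, data)
    accuracy = sum(p == r["class"] for p, r in zip(predicted, data)) / len(data)
    print(f"accuracy on full table: {accuracy:.3f}")
```

The tree is trained on 200 of 6000 rows, yet it is applied to every row; the saving on a real database depends on how representative the sample is, which this toy example does not measure.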
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2008, No. 6, pp. 299-300, F0003 (3 pages in total)
Keywords
Classification, Algorithm, Mass data, Database