Abstract
The ID3 algorithm comes from machine learning and integrates poorly with databases. This paper proposes an improved implementation of ID3 based on SQL statements. The conditional entropy of each test attribute is computed by sending grouping queries directly to the data table stored in the database, and recursive algorithms are given for generating subtrees in depth-first and breadth-first order. Experiments show that the improved ID3 algorithm exploits both the efficiency of SQL and the flexibility of C++. It makes the algorithm easier to implement and classifies large amounts of data efficiently.
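The grouping-query idea can be illustrated with a minimal sketch. It is not the authors' code: the paper works in C++ and does not give its schema, queries or DBMS here. The sketch assumes SQLite through Python's standard `sqlite3` module. It also uses a hypothetical table `samples` with categorical attribute columns and a class column `label`. A node of the tree is described by the attribute values fixed along its path, which become a WHERE clause. One `GROUP BY attr, label` query then returns the counts n(v, c), from which H(C|A) = Σ_v (n_v/n) · (−Σ_c (n(v,c)/n_v) log2(n(v,c)/n_v)). H(C) is the same for every candidate at a node, so the attribute with the smallest conditional entropy has the largest information gain. `build_dfs` expands subtrees recursively in depth-first order. `build_bfs` is an iterative, queue-based breadth-first variant rather than the recursive form the paper describes. All function names and the toy rows are hypothetical.

```python
import math
import sqlite3
from collections import deque

# Hypothetical toy data: (outlook, humidity, label).
ROWS = [
    ("sunny", "high", "no"),
    ("sunny", "normal", "yes"),
    ("rain", "high", "yes"),
    ("rain", "normal", "yes"),
    ("overcast", "high", "yes"),
    ("sunny", "high", "no"),
]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (outlook TEXT, humidity TEXT, label TEXT)")
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)
    return conn

def where_clause(path):
    # path: (attribute, value) pairs fixed by the node's ancestors.
    # Attribute names come from a trusted fixed list; values are bound parameters.
    if not path:
        return "", []
    return " WHERE " + " AND ".join(f"{a} = ?" for a, _ in path), [v for _, v in path]

def class_counts(conn, path):
    where, params = where_clause(path)
    sql = f"SELECT label, COUNT(*) FROM samples{where} GROUP BY label"
    return dict(conn.execute(sql, params).fetchall())

def attr_values(conn, attr, path):
    where, params = where_clause(path)
    sql = f"SELECT DISTINCT {attr} FROM samples{where} ORDER BY {attr}"
    return [row[0] for row in conn.execute(sql, params)]

def conditional_entropy(conn, attr, path):
    # One grouping query yields n(v, c) for every (attribute value, class) pair.
    where, params = where_clause(path)
    sql = f"SELECT {attr}, label, COUNT(*) FROM samples{where} GROUP BY {attr}, label"
    groups = {}
    for value, _label, n in conn.execute(sql, params):
        groups.setdefault(value, []).append(n)
    total = sum(sum(ns) for ns in groups.values())
    h = 0.0
    for ns in groups.values():
        n_v = sum(ns)
        h_v = -sum((n / n_v) * math.log2(n / n_v) for n in ns)
        h += (n_v / total) * h_v
    return h

def best_attribute(conn, attrs, path):
    # Lowest H(C|A) == highest information gain, since H(C) is fixed at a node.
    return min(attrs, key=lambda a: conditional_entropy(conn, a, path))

def build_dfs(conn, attrs, path=()):
    # Recursive depth-first construction: each subtree is finished before its sibling.
    counts = class_counts(conn, path)
    if len(counts) == 1 or not attrs:
        return max(counts, key=counts.get)  # leaf: (majority) class
    best = best_attribute(conn, attrs, path)
    rest = [a for a in attrs if a != best]
    return {best: {v: build_dfs(conn, rest, path + ((best, v),))
                   for v in attr_values(conn, best, path)}}

def build_bfs(conn, attrs):
    # Iterative breadth-first construction: a FIFO queue of pending nodes.
    root = {}
    queue = deque([(root, "tree", list(attrs), ())])
    while queue:
        parent, key, remaining, path = queue.popleft()
        counts = class_counts(conn, path)
        if len(counts) == 1 or not remaining:
            parent[key] = max(counts, key=counts.get)
            continue
        best = best_attribute(conn, remaining, path)
        children = {}
        parent[key] = {best: children}
        rest = [a for a in remaining if a != best]
        for v in attr_values(conn, best, path):
            queue.append((children, v, rest, path + ((best, v),)))
    return root["tree"]

conn = make_db()
print(build_dfs(conn, ["outlook", "humidity"]))
# {'outlook': {'overcast': 'yes', 'rain': 'yes', 'sunny': {'humidity': {'high': 'no', 'normal': 'yes'}}}}
```

Attribute names are interpolated into the SQL text because SQLite cannot bind identifiers as parameters. This is acceptable here only because they come from a fixed, trusted list, while the attribute values are always passed as bound parameters. Only counts leave the database, so the rows themselves never have to be loaded into application memory.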
Source
Science Technology and Engineering
Peking University core journal
2012, Issue 34, pp. 9370-9373 (4 pages)
Funding
Supported by the Suihua University Science and Technology Project (KQ1201003)
Keywords
ID3
decision tree
information entropy
SQL statement