Abstract
This paper proposes CIN, a pruning-optimization algorithm for constructing classifiers based on the IN (information-theoretic network) algorithm. The IN algorithm performs hypothesis testing with the log-likelihood ratio statistic without making the statistical significance of the test clear. To address this, CIN computes a threshold on the number of records and a threshold on the number of attribute values at every node of a given layer, so that the test remains valid. The theoretical basis of the algorithm is given, and the conditions under which the formula for the log-likelihood ratio statistic holds are derived. Experiments show that CIN can reduce data dimensionality and extract concise rules from large-scale datasets.
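As a rough illustration of the quantity the abstract refers to, the sketch below computes the log-likelihood ratio statistic G for a contingency table of attribute values against class labels, and checks its standard relation to mutual information (G = 2 N ln 2 · MI, with MI in bits). It is not the CIN algorithm itself: the function names are hypothetical, and the validity check uses the conventional "every expected count at least 5" rule of thumb as a stand-in, since the abstract does not state the thresholds the paper derives.

```python
import math

def g_statistic(table):
    """Return (G, smallest expected count) for a non-empty table of counts.

    `table` is a list of rows: rows are attribute values, columns are classes.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            min_expected = min(min_expected, expected)
            if obs > 0:  # empty cells contribute nothing to the sum
                g += 2.0 * obs * math.log(obs / expected)
    return g, min_expected

def mutual_information_bits(table):
    """Empirical mutual information (in bits) between rows and columns."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                mi += (obs / n) * math.log2(obs * n / (row_tot[i] * col_tot[j]))
    return mi

def is_testable(table, min_expected_count=5.0):
    """ASSUMPTION: rule-of-thumb validity check, not the paper's thresholds."""
    _, min_expected = g_statistic(table)
    return min_expected >= min_expected_count
```

With a node holding 80 records split as `[[30, 10], [10, 30]]`, every expected count is 20, so the test is considered usable under the assumed rule. A node with only 8 records, such as `[[3, 1], [1, 3]]`, is rejected because its expected counts fall to 2.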
Source
Journal of Xinyang Normal University (Natural Science Edition)
CAS
PKU Core (北大核心)
2007, No. 2, pp. 237-240 (4 pages)
Keywords
entropy
mutual information
log-likelihood ratio statistic