Abstract
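In learning from examples, traditional extension matrix theory and its heuristic algorithms assume that the positive and negative example sets are consistent and free of noise. Noisy data in real application domains, however, causes many rules with poor generalization ability to be produced. This paper extends the definitions of extension matrix theory from a statistical viewpoint and uses information entropy and the Laplace error estimate function to construct ECA, a heuristic extension matrix algorithm. The algorithm is applied to learning problems in several real-world domains and compared with learning-from-examples systems such as AE5 and AQ15. The experimental results show that the rules generated by ECA are simple and generalize well, and that ECA handles the noise problem in practical applications fairly effectively.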
Traditional extension matrix theory and the corresponding heuristic algorithms assume that the positive and negative example sets are consistent. In real-world domains, however, noisy data causes many overfitting rules to be produced. In this paper, the basic definitions of extension matrix theory are extended from a statistical point of view, and a probability-based rule description method is given. Under this method, the concept induced from the training examples classifies them correctly with high probability (though not necessarily with complete accuracy) and gives a high predictive accuracy on new examples. The information-theoretic entropy measure and the Laplace error rate evaluation function are applied to the path search in the extension matrix, and an entropy-based heuristic algorithm, ECA, is presented. ECA is applied to several real-world domains, such as sleep examples and handwritten digit recognition, and is compared with AE5 and AQ15. The experimental results show that ECA produces simpler and more efficient rules and handles the noise problem in practical applications effectively.
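The abstract names two evaluation measures, information entropy and the Laplace error estimate, without giving their formulas. The sketch below shows the standard textbook forms of both, computed from the per-class counts of examples covered by a candidate rule. It is only an illustration: the function names `entropy` and `laplace_error` are hypothetical, and whether ECA uses exactly these forms during path search is an assumption not confirmed by this record.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts.

    Assumption: the standard definition H = -sum(p_i * log2(p_i)).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def laplace_error(counts):
    """Laplace error estimate for a rule covering the given class counts.

    Assumption: the common form (n - n_c + k - 1) / (n + k), where n is the
    number of covered examples, n_c the count of the majority class, and
    k the number of classes.
    """
    n = sum(counts)
    k = len(counts)
    n_c = max(counts)
    return (n - n_c + k - 1) / (n + k)


if __name__ == "__main__":
    # A hypothetical rule covering 8 positive and 2 negative examples.
    covered = [8, 2]
    print("entropy:", entropy(covered))
    print("Laplace error:", laplace_error(covered))
```

Lower values of either measure indicate a purer rule. Unlike the raw error rate, the Laplace estimate does not drop to zero for a rule that covers only a handful of examples, which is one reason it is commonly used when training data is noisy.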
Source
《计算机学报》
EI
CSCD
PKU Core Journal
1998, No. 7, pp. 619-626 (8 pages)
Chinese Journal of Computers
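Funding
National Natural Science Foundation of China
International cooperation project: Color Matching
Harbin Institute of Technology Science and Technology Fund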