Stratified sampling is often used in opinion polls to reduce standard errors, and it is known as a variance reduction technique in sampling theory. The most common resampling method is based on bootstrapping the dataset with replacement. The main purpose of this work is to investigate extensions of resampling methods in classification problems, specifically with decision trees, using a family of stratification models to improve prediction accuracy by aggregating classifiers built on a perturbed dataset. We use bagging as a method of estimating a good decision boundary according to a family of stratification models. The overall conclusion is that, for decision trees, un-stratified bootstrapping with bagging can yield lower error rates than other sampling strategies on simulated datasets. Based on the results of these experiments, a possible explanation for why un-stratified sampling performs best is that bagging is itself a method of stratification.
Funding: We would like to acknowledge the Research and Consulting Centre (RCC), University of Benghazi, Libya, for funding this work.
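The abstract contrasts un-stratified and stratified bootstrapping as the resampling step inside bagging. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes the strata are simply the class labels, which the abstract does not specify, and the function names `bootstrap_indices`, `stratified_bootstrap_indices`, and `majority_vote` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def bootstrap_indices(n, rng):
    """Plain (un-stratified) bootstrap: draw n indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def stratified_bootstrap_indices(strata, rng):
    """Stratified bootstrap: resample with replacement within each stratum,
    so every stratum keeps its original size in the resample."""
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    sample = []
    for members in groups.values():
        sample.extend(rng.choice(members) for _ in range(len(members)))
    return sample


def majority_vote(predictions):
    """Aggregate the votes of the bagged classifiers for one observation."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
labels = ["a"] * 8 + ["b"] * 2  # assumed strata: class labels (illustrative)

plain = bootstrap_indices(len(labels), rng)
strat = stratified_bootstrap_indices(labels, rng)

# Class counts vary from draw to draw in the plain bootstrap ...
plain_counts = Counter(labels[i] for i in plain)
# ... but are fixed at the original stratum sizes in the stratified one.
strat_counts = Counter(labels[i] for i in strat)
```

In a bagging loop, one classifier (for example a decision tree) would be fitted to each resample and the per-observation predictions combined with `majority_vote`. Only the resampling function changes between the two strategies being compared.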