Abstract
This paper presents a new approach to selecting the reference set for the nearest neighbor classifier. A Tabu search algorithm is used to find the optimal reference set, defined as the one with the fewest samples that still satisfies a given error rate threshold. When the threshold is set to zero, the algorithm obtains a near-minimal consistent subset of the given training set. When the threshold is set to a small, appropriate value, the resulting reference set may compensate for the bias of the nearest neighbor estimate. An aspiration criterion for Tabu search is introduced to keep the search from wandering inefficiently between the feasible and infeasible regions of the search space and to speed up convergence. Experimental results on a number of typical data sets are presented and analyzed to illustrate the benefits of the proposed method. Compared with conventional methods such as CNN (condensed nearest neighbor) and Dasarathy's algorithm, the reduced reference sets are much smaller and nearest neighbor classification performance is better, especially when the error rate threshold is set to an appropriate nonzero value. The experiments also show that the MCS (minimal consistent set) produced by Dasarathy's algorithm is not minimal and that its candidate consistent set is not guaranteed to shrink monotonically. A counterexample is given to confirm this claim.
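To make the search described in the abstract concrete, the following is a minimal Python sketch of a Tabu search over subsets of the training set. It is an illustration under stated assumptions, not the authors' algorithm: the function names `nn_error_rate` and `tabu_reference_set`, the single-sample flip move, the fixed tabu tenure, the cost ordering, and the simple "better than the best feasible solution so far" aspiration rule are all hypothetical choices. The abstract does not specify the paper's own move set or aspiration criterion.

```python
import random

def nn_error_rate(points, labels, mask):
    """Fraction of training samples misclassified by 1-NN over the selected references."""
    refs = [i for i, keep in enumerate(mask) if keep]
    if not refs:
        return 1.0  # an empty reference set cannot classify anything
    errors = 0
    for x, y in zip(points, labels):
        # Nearest selected reference by squared Euclidean distance.
        nearest = min(refs, key=lambda j: sum((a - b) ** 2 for a, b in zip(x, points[j])))
        if labels[nearest] != y:
            errors += 1
    return errors / len(points)

def tabu_reference_set(points, labels, threshold=0.0, iters=100, tenure=5, seed=0):
    """Search for a small reference subset whose 1-NN error rate is <= threshold.

    Returns a boolean mask, or None if no feasible subset was encountered.
    """
    rng = random.Random(seed)
    n = len(points)
    current = [True] * n  # assumed starting point: the full training set
    best, best_size = None, n + 1
    if nn_error_rate(points, labels, current) <= threshold:
        best, best_size = current[:], n
    tabu_until = [0] * n  # iteration until which flipping sample i stays tabu
    for it in range(1, iters + 1):
        candidates = []
        for i in range(n):
            trial = current[:]
            trial[i] = not trial[i]  # move: add or drop one sample
            err = nn_error_rate(points, labels, trial)
            feasible = err <= threshold
            size = sum(trial)
            # Assumed aspiration rule: a tabu move is allowed if it beats the best feasible set.
            aspirated = feasible and size < best_size
            if tabu_until[i] > it and not aspirated:
                continue
            # Feasible moves rank first (by size); infeasible ones rank by error rate.
            cost = (0, size, rng.random()) if feasible else (1, err, rng.random())
            candidates.append((cost, i, trial, feasible, size))
        if not candidates:
            continue  # every move is tabu; wait for tenures to expire
        _, i, current, feasible, size = min(candidates, key=lambda c: c[0])
        tabu_until[i] = it + tenure
        if feasible and size < best_size:
            best, best_size = current[:], size
    return best
```

With `threshold=0.0` the returned mask marks a consistent subset in the sense used above: every training sample is classified correctly by its nearest selected reference. A small positive `threshold` lets the search trade a few resubstitution errors for a smaller reference set. Each iteration re-evaluates the error rate for every candidate move, so the sketch is only practical for small data sets.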
Funding
The National Natural Science Foundation of China (No. 69675007) and the Beijing Municipal Natural Science Foundation (No. 4972008).