There are four serious problems in discriminant analysis. We developed an optimal linear discriminant function (optimal LDF) based on the minimum number of misclassifications (minimum NM, or MNM) using integer programming (IP). We call this LDF Revised IP-OLDF. Only this LDF can discriminate the cases on the discriminant hyperplane (Problem 1). This LDF and a hard-margin SVM (H-SVM) can discriminate linearly separable data (LSD) exactly, whereas the other LDFs may not discriminate LSD theoretically (Problem 2). When Revised IP-OLDF discriminates the Swiss banknote data with six variables, we find that the MNM of the two-variable model (X4, X6) is zero. Because MNMk decreases monotonically (MNMk >= MNM(k+1)), the sixteen models that include (X4, X6) all have MNM = 0. Because there has been no research on LSD until now, we surveyed three other linearly separable data sets: 18 exam score data sets, the Japanese 44 cars data, and six microarray data sets. When we discriminate the exam scores with MNM = 0, we find that the generalized inverse matrix technique causes the serious Problem 3, and we confirm this fact with the cars data. Finally, we claim that discriminant analysis is not inferential statistics because there are no standard errors (SEs) of the error rates and discriminant coefficients (Problem 4). Therefore, we proposed the "100-fold cross-validation for small samples" method (the Method). With this breakthrough, we can choose the best model, the one with the minimum mean error rate (M2) in the validation samples, and obtain 95% confidence intervals (CIs) of the error rate and the discriminant coefficients. When we discriminate the exam scores by this new method, we obtain the surprising result that the seven LDFs other than Fisher's LDF are almost the same as the trivial LDFs. In this research, we discriminate the Japanese 44 cars data because it lets us discuss all four problems. There are six independent variables to discriminate 29 regular cars from 15 small cars. These data are linearly separable by the emission rate (X1) and the number of seats (X3). We examine the validity of the new model selection procedure for discriminant analysis: we proposed that the model with the minimum mean error rate (M2) in the validation samples is the best model. We had already examined this procedure with the exam scores and obtained good results. Moreover, the 95% CIs of the eight LDFs give us a real perception of discriminant theory. However, the exam scores differ from ordinary data. Therefore, we apply our theory and procedure to the Japanese 44 cars data and confirm the same conclusion.
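The core idea behind Revised IP-OLDF is a mixed integer program that minimizes the number of misclassified cases directly rather than a surrogate loss. The following is a minimal sketch of such an MNM-minimizing LDF with a big-M formulation; it assumes the PuLP package and +1/-1 class labels, and the function name mnm_ldf and the big-M value are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: minimizing the number of misclassifications (MNM) with a
# big-M mixed integer program, in the spirit of Revised IP-OLDF.
# Requires PuLP (pip install pulp); names and the big-M value are illustrative.
import numpy as np
import pulp

def mnm_ldf(X, y, big_m=10000.0):
    """X: (n, p) feature array; y: array of +1/-1 labels.
    Returns (intercept, coefficients, MNM)."""
    n, p = X.shape
    prob = pulp.LpProblem("MNM_LDF_sketch", pulp.LpMinimize)
    b0 = pulp.LpVariable("b0")
    b = [pulp.LpVariable(f"b{j}") for j in range(p)]
    e = [pulp.LpVariable(f"e{i}", cat=pulp.LpBinary) for i in range(n)]
    # Objective: the number of misclassified cases.
    prob += pulp.lpSum(e)
    # If e[i] = 0, case i must lie strictly on its correct side of the
    # hyperplane (y_i * (b0 + b'x_i) >= 1); if e[i] = 1, big M relaxes it.
    for i in range(n):
        prob += y[i] * (b0 + pulp.lpSum(b[j] * X[i, j] for j in range(p))) >= 1 - big_m * e[i]
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    coef = np.array([v.value() for v in b])
    return b0.value(), coef, int(round(sum(v.value() for v in e)))
```

Because every case lying on or beyond the unit margin forces its indicator to zero, the optimal objective value equals the MNM, and a solution with objective 0 certifies that the data are linearly separable for that variable set.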
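The "100-fold cross-validation for small samples" rests on repeatedly refitting an LDF on resampled data, averaging the validation error rates to get M2, and reading off 95% CIs of the error rate and coefficients from the replications. Below is a hedged sketch of that resampling loop; the bootstrap training sample with out-of-bag validation cases, the stand-in linear classifier, and the function name method_sketch are illustrative assumptions and not necessarily the paper's exact scheme.

```python
# Hedged sketch: 100-replication resampling loop that computes the mean
# validation error rate (M2) and 95% percentile CIs of the error rate and
# the coefficients. A logistic regression stands in for a generic LDF.
import numpy as np
from sklearn.linear_model import LogisticRegression

def method_sketch(X, y, n_rep=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    err_rates, coefs = [], []
    for _ in range(n_rep):
        train = rng.integers(0, n, size=n)           # resample with replacement
        valid = np.setdiff1d(np.arange(n), train)    # out-of-bag validation cases
        if valid.size == 0:
            continue
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        err_rates.append(1.0 - clf.score(X[valid], y[valid]))
        coefs.append(clf.coef_.ravel())
    err_rates, coefs = np.array(err_rates), np.array(coefs)
    m2 = err_rates.mean()                                # mean validation error rate
    ci_err = np.percentile(err_rates, [2.5, 97.5])       # 95% CI of the error rate
    ci_coef = np.percentile(coefs, [2.5, 97.5], axis=0)  # 95% CI of each coefficient
    return m2, ci_err, ci_coef
```

Under the proposed model selection rule, the sketch would be run once per candidate variable set, and the set with the smallest M2 would be chosen as the best model.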