Discriminant Analysis of the Linear Separable Data - Japanese 44 Cars

Abstract: There are four serious problems in discriminant analysis. We developed an optimal linear discriminant function (optimal LDF) based on the minimum number of misclassifications (minimum NM, MNM) using integer programming (IP). We call this LDF Revised IP-OLDF. Only this LDF can correctly discriminate cases that lie on the discriminant hyperplane (Problem 1). This LDF and the hard-margin SVM (H-SVM) can discriminate linearly separable data (LSD) exactly; other LDFs are not theoretically guaranteed to discriminate LSD (Problem 2). When Revised IP-OLDF discriminates the Swiss banknote data with six variables, we find that the MNM of a two-variable model, (X4, X6), is zero. Because MNM decreases monotonically as variables are added (MNM_k >= MNM_(k+1)), the MNMs of the sixteen models that include (X4, X6) are all zero. Because there has been no research on LSD until now, we surveyed three further linearly separable data sets: 18 exam-score data sets, the Japanese 44 cars data, and six microarray data sets. When we discriminated the exam scores with MNM = 0, we found that the generalized inverse matrix technique causes a serious problem (Problem 3), and we confirmed this with the cars data. Finally, we claim that discriminant analysis is not inferential statistics, because there are no standard errors (SEs) of the error rates and discriminant coefficients (Problem 4). Therefore, we proposed the "100-fold cross validation for the small sample" method (the method). With this breakthrough, we can choose the best model, the one with the minimum mean error rate (M2) in the validation samples, and obtain two 95% confidence intervals (CIs), for the error rate and for the discriminant coefficients. When we discriminated the exam scores with this new method, we obtained the surprising result that seven LDFs, all except Fisher's LDF, are almost the same as the trivial LDFs. In this research, we discriminate the Japanese 44 cars data because they let us discuss all four problems. There are six independent variables to discriminate 29 regular cars from 15 small cars.
These data are linearly separable by the emission rate (X1) and the number of seats (X3). We examine the validity of the new model selection procedure for discriminant analysis, which proposes that the model with the minimum mean error rate (M2) in the validation samples is the best model. We had examined this procedure with the exam scores and obtained good results. Moreover, the 95% CIs of eight LDFs offer a real perception of discriminant theory. However, the exam scores differ from ordinal data. Therefore, we applied our theory and procedure to the Japanese 44 cars data and confirmed the same conclusion.
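The model selection idea in the abstract (fit an LDF on many resampled training sets, score each fit on validation data, then compare models by the mean error rate and a 95% interval) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the author's implementation: a plain perceptron stands in for Revised IP-OLDF, bootstrap resampling stands in for the paper's "100-fold cross validation for the small sample" scheme, and `make_toy_data` generates hypothetical two-variable data that only borrows the 29 versus 15 class sizes from the abstract.

```python
import random
import statistics


def make_toy_data(seed=1):
    """Hypothetical separable data: 29 cases labelled +1 and 15 labelled -1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(29):
        rows.append(((rng.uniform(2.0, 4.0), rng.uniform(4.0, 8.0)), 1))
    for _ in range(15):
        rows.append(((rng.uniform(0.0, 1.0), rng.uniform(2.0, 5.0)), -1))
    return rows


def score(model, x):
    """Value of the linear discriminant function w.x + b for one case."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(rows, epochs=200):
    """Fit w and b with the perceptron rule (a stand-in, not Revised IP-OLDF)."""
    w, b = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in rows:
            if y * score((w, b), x) <= 0:  # wrong side of, or on, the hyperplane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training case is classified correctly
            break
    return w, b


def error_rate(model, rows):
    """Share of cases misclassified; cases on the hyperplane count as errors."""
    return sum(1 for x, y in rows if y * score(model, x) <= 0) / len(rows)


def resampled_error_rates(rows, n_rounds=100, seed=0):
    """Train on n_rounds bootstrap resamples, validate each fit on all rows."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_rounds):
        train = [rng.choice(rows) for _ in rows]
        rates.append(error_rate(train_perceptron(train), rows))
    return rates


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


if __name__ == "__main__":
    rates = resampled_error_rates(make_toy_data())
    print("mean validation error rate:", statistics.mean(rates))
    print("95% percentile interval:", percentile_interval(rates))
```

To compare candidate variable subsets in the spirit of the M2 criterion, one would run `resampled_error_rates` once per subset and keep the subset with the smallest mean; the interval shows how stable that estimate is across resamples.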

Author: Shuichi Shinmura

Affiliation: Faculty of Economics

Source: Journal of Statistical Science and Application (统计科学与应用（英文版）), 2016, No. 4, pp. 165-178 (14 pages)

Keywords: Model Selection Procedure; Means of Error Rates; Fisher's LDF; Logistic Regression; Support Vector Machine (SVM); Minimum Number of Misclassifications (minimum NM, MNM); Revised IP-OLDF based on MNM criterion; Revised IPLP-OLDF; Revised LP-OLDF; Linear Separable Data and Model; K-fold Cross-validation

Classification: F [Economics and Management]

