Abstract: We used two probabilistic methods, Gaussian Naïve Bayes and Logistic Regression, to predict the genotypes of the offspring of two maize strains, the BLC and the JNE genotypes, based on the phenotypic traits of the parents. We assessed the prediction performance of the two models with the overall accuracy and the area under the receiver operating characteristic curve (AUC). The overall accuracy for both models ranged between 82% and 87%. The AUC values were 0.90 or higher for the Logistic Regression models and 0.85 or higher for the Gaussian Naïve Bayes models. These statistics indicated that the two models were very effective in predicting the genotypes of the offspring. Furthermore, both models predicted the BLC genotype with higher accuracy than the JNE genotype; the BLC genotype appeared more homogeneous and more predictable. A chi-square test for the homogeneity of the confusion matrices showed that in all cases the two models produced similar prediction results. That finding was in line with the assertion by Mitchell (2010), who showed theoretically that the two models are essentially the same. With logistic regression, each subset of the original data and its corresponding principal components produced exactly the same prediction results. The AUC value may be viewed as a criterion for parent-offspring resemblance for each set of phenotypic traits considered in the analysis.
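The two evaluation measures named in the abstract, overall accuracy and AUC, can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the data are synthetic two-feature samples generated here, the class centres and sample sizes are arbitrary assumptions, and only the class labels "BLC" and "JNE" are borrowed from the abstract. The helper names (`fit_gnb`, `predict_proba`, `auc`, `make_synthetic`) are hypothetical. The sketch fits a Gaussian Naïve Bayes classifier from per-class means and variances and computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
import math
import random


def fit_gnb(X, y):
    """Estimate the prior, feature means, and feature variances per class."""
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        params[label] = (n / len(X), means, variances)
    return params


def log_joint(params, x, label):
    """log P(label) + sum of per-feature Gaussian log densities."""
    prior, means, variances = params[label]
    total = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total


def predict_proba(params, x, positive):
    """Posterior probability of the `positive` class for one sample."""
    logs = {label: log_joint(params, x, label) for label in params}
    top = max(logs.values())  # subtract the max for numerical stability
    weights = {label: math.exp(v - top) for label, v in logs.items()}
    return weights[positive] / sum(weights.values())


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores, positive):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n_per_class, seed=0):
    """Synthetic stand-in data; the class centres are arbitrary assumptions."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in (("BLC", [0.0, 0.0]), ("JNE", [2.0, 1.5])):
        for _ in range(n_per_class):
            X.append([rng.gauss(c, 1.0) for c in centre])
            y.append(label)
    return X, y


X_train, y_train = make_synthetic(200, seed=0)
X_test, y_test = make_synthetic(100, seed=1)

model = fit_gnb(X_train, y_train)
scores = [predict_proba(model, x, "JNE") for x in X_test]
preds = ["JNE" if s >= 0.5 else "BLC" for s in scores]

print("accuracy:", round(accuracy(y_test, preds), 3))
print("AUC:", round(auc(y_test, scores, "JNE"), 3))
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a sketch; a rank-based formula would be the usual choice for larger data sets. The same `accuracy` and `auc` helpers could score a logistic regression model in order to compare the two classifiers on equal terms.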