Abstract: Today, mammography is the best method for early detection of breast cancer. However, radiologists fail to detect evident cancerous signs in approximately 20% of mammograms, producing false negatives. These false negatives are attributed to the radiologist's inability to detect the abnormalities for several reasons, such as poor image quality, image noise, or eye fatigue. This paper presents a framework for a computer-aided detection system that integrates Principal Component Analysis (PCA), Fisher Linear Discriminant (FLD), and k-Nearest Neighbor (KNN) classification for the detection of abnormalities in mammograms. Using normal and abnormal mammograms from the MIAS database, the integrated algorithm achieved 93.06% classification accuracy. We also present an analysis of the integrated algorithm's parameters and suggest selection criteria.
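As a minimal sketch of the PCA → FLD → KNN pipeline described in this abstract, the following uses scikit-learn stand-ins; the paper's own implementation details, parameter values, and preprocessing are not given here, and the data below is a synthetic placeholder, not the MIAS mammograms.

```python
# A hedged sketch of a PCA -> FLD -> KNN detection pipeline (assumed
# structure; not the authors' code). `X` stands in for vectorized
# mammogram ROIs and `y` for normal/abnormal labels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(144, 1024))   # placeholder for 32x32 ROI vectors
y = rng.integers(0, 2, size=144)   # placeholder normal/abnormal labels

pipeline = Pipeline([
    ("pca", PCA(n_components=40)),            # reduce dimensionality
    ("fld", LinearDiscriminantAnalysis()),    # project onto the discriminant axis
    ("knn", KNeighborsClassifier(n_neighbors=3)),  # classify in that space
])

# Accuracy estimated by cross-validation; the 93.06% in the abstract is
# the authors' result on MIAS, not something this sketch reproduces.
print(cross_val_score(pipeline, X, y, cv=5).mean())
```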
Abstract: There are four serious problems in discriminant analysis. We developed an optimal linear discriminant function (optimal LDF) based on the minimum number of misclassifications (MNM) using integer programming (IP); we call this LDF Revised IP-OLDF. Only this LDF can properly discriminate the cases lying on the discriminant hyperplane (Problem 1). Revised IP-OLDF and a hard-margin SVM (H-SVM) can discriminate linearly separable data (LSD) exactly, whereas other LDFs may fail to do so (Problem 2). When Revised IP-OLDF discriminates the Swiss banknote data with six variables, we find that the MNM of the two-variable model (X4, X6) is zero. Because MNM_k decreases monotonically (MNM_k >= MNM_(k+1)), the sixteen models including (X4, X6) all have MNM = 0. Because there has been no research on LSD until now, we surveyed three further groups of linearly separable data sets: 18 exam-score data sets, the Japanese 44-cars data, and six microarray data sets. When we discriminate the exam scores with MNM = 0, we find that the generalized inverse matrix technique causes the serious Problem 3, and we confirm this fact with the cars data. Finally, we claim that discriminant analysis is not inferential statistics, because there are no standard errors (SEs) of the error rates and discriminant coefficients (Problem 4). We therefore proposed the "100-fold cross-validation for small samples" method (the Method). With this breakthrough, we can choose the best model, namely the one with the minimum mean error rate (M2) in the validation samples, and obtain two 95% confidence intervals (CIs), for the error rate and for the discriminant coefficients. When we discriminate the exam scores by this new method, we obtain the surprising result that seven LDFs, all except Fisher's LDF, are almost the same as the trivial LDFs. In this research, we discriminate the Japanese 44-cars data because it lets us discuss all four problems. There are six independent variables for discriminating 29 regular cars from 15 small cars, and the data are linearly separable by the emission rate (X1) and the number of seats (X3). We examine the validity of the new model selection procedure for discriminant analysis, which proposes that the model with the minimum mean error rate (M2) in the validation samples is the best model. We had already examined this procedure with the exam scores and obtained good results, and the 95% CIs of the eight LDFs offer a real perception of discriminant theory. However, the exam scores differ from ordinary data. Therefore, we apply our theory and procedure to the Japanese 44-cars data and confirm the same conclusion.
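The MNM criterion at the heart of Revised IP-OLDF can be written as a standard big-M mixed integer program. The sketch below is a generic textbook formulation in that spirit, not the authors' code; the big-M constant, the solver, and the synthetic two-class data are all assumptions.

```python
# A hedged sketch of the minimum-number-of-misclassifications (MNM)
# criterion as a big-M integer program, in the spirit of Revised IP-OLDF.
import numpy as np
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)   # class labels in {-1, +1}
n, p = X.shape
M = 1e4                              # big-M constant (assumed value)

prob = LpProblem("MNM_LDF", LpMinimize)
w = [LpVariable(f"w{j}") for j in range(p)]                 # coefficients
b = LpVariable("b")                                          # intercept
e = [LpVariable(f"e{i}", cat=LpBinary) for i in range(n)]    # 1 iff case i misclassified

prob += lpSum(e)                     # objective: number of misclassifications
for i in range(n):
    # y_i * (w.x_i + b) >= 1 - M*e_i: each case must lie strictly on the
    # correct side of the hyperplane unless its binary e_i is switched on,
    # which also settles cases *on* the hyperplane (Problem 1).
    prob += float(y[i]) * (lpSum(w[j] * X[i, j] for j in range(p)) + b) >= 1 - M * e[i]

prob.solve(PULP_CBC_CMD(msg=False))
print("MNM =", int(sum(v.value() for v in e)))
```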
Funding: The National Natural Science Foundation of China (Grant Nos. 60472060, 60473039, and 60472061)
Abstract: Foley-Sammon linear discriminant analysis (FSLDA) and uncorrelated linear discriminant analysis (ULDA) are two well-known kinds of linear discriminant analysis. Both ULDA and FSLDA search for the kth discriminant vector in an (n-k+1)-dimensional subspace, subject to their respective constraints. A rigorous derivation shows that, in essence, ULDA vectors are the covariance-orthogonal vectors of the corresponding eigen-equation, so an algorithm that computes the covariance-orthogonal vectors is equivalent to the original ULDA algorithm, which is time-consuming. It is also revealed for the first time, by theoretical analysis, that the Fisher criterion value of each FSLDA vector is never less than that of the corresponding ULDA vector. For a discriminant vector, the larger its Fisher criterion value, the more powerful its discriminability, so the larger Fisher criterion values of FSLDA vectors are an advantage. On the other hand, in general any two feature components extracted by FSLDA vectors are statistically correlated with each other, which may put the discriminant vector set at a disadvantage; in contrast, any two feature components extracted by ULDA vectors are statistically uncorrelated. Two experiments, on the CENPARMI handwritten numeral database and the ORL database, are performed. The experimental results are consistent with the theoretical analysis of the Fisher criterion values of ULDA and FSLDA vectors. The experiments also show that the equivalent ULDA algorithm presented in this paper is much more efficient than the original ULDA algorithm, as the theoretical analysis predicts. Moreover, it appears that if there is high statistical correlation between the feature components extracted by FSLDA vectors, FSLDA will not perform well, in spite of the larger Fisher criterion value of every FSLDA vector. However, when the average correlation coefficient of the feature components extracted by FSLDA vectors is low, the performance of FSLDA is comparable with that of ULDA.
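The two quantities this abstract compares can be made concrete in a few lines: the Fisher criterion value J(w) = (w' Sb w) / (w' Sw w) of a discriminant vector, and the covariance between two extracted feature components, which is zero for ULDA vectors by the covariance-orthogonality constraint and generally nonzero for FSLDA vectors. The sketch below uses synthetic two-class data, not CENPARMI or ORL, and computes only the first discriminant vector from the standard eigen-equation as an illustration.

```python
# A hedged NumPy sketch of the Fisher criterion value and the
# component-covariance check discussed above (illustrative data only).
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (30, 5)), rng.normal(1, 1, (30, 5))])
labels = np.array([0] * 30 + [1] * 30)

mean = X.mean(axis=0)
Sb = np.zeros((5, 5))                # between-class scatter
Sw = np.zeros((5, 5))                # within-class scatter
for c in (0, 1):
    Xc = X[labels == c]
    d = (Xc.mean(axis=0) - mean)[:, None]
    Sb += len(Xc) * d @ d.T
    Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
St = Sb + Sw                         # total scatter: St = Sb + Sw

def fisher_value(w):
    """Fisher criterion J(w) = (w' Sb w) / (w' Sw w)."""
    return (w @ Sb @ w) / (w @ Sw @ w)

# First discriminant vector: leading eigenvector of Sw^{-1} Sb.
evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
w1 = np.real(evecs[:, np.argmax(np.real(evals))])
print("J(w1) =", fisher_value(w1))

# Feature components X @ w1 and X @ w2 are statistically uncorrelated
# exactly when w1' St w2 = 0 -- the ULDA covariance-orthogonality
# constraint; an arbitrary w2 will generally violate it.
w2 = rng.normal(size=5)
print("component covariance (w1' St w2):", w1 @ St @ w2)
```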