Diagnosis and treatment of breast cancer have been improved during the last decade; however, breast cancer is still a leading cause of death among women in the whole world. Early detection and accurate diagnosis of th...Diagnosis and treatment of breast cancer have been improved during the last decade; however, breast cancer is still a leading cause of death among women in the whole world. Early detection and accurate diagnosis of this disease has been demonstrated an approach to long survival of the patients. As an attempt to develop a reliable diagnosing method for breast cancer, we integrated support vector machine (SVM), k-nearest neighbor and probabilistic neural network into a complex machine learning approach to detect malignant breast tumour through a set of indicators consisting of age and ten cellular features of fine-needle aspiration of breast which were ranked according to signal-to-noise ratio to identify determinants distinguishing benign breast tumours from malignant ones. The method turned out to significantly improve the diagnosis, with a sensitivity of 94.04%, a specificity of 97.37%, and an overall accuracy up to 96.24% when SVM was adopted with the sigmoid kernel function under 5-fold cross validation. The results suggest that SVM is a promising methodology to be further developed into a practical adjunct implement to help discerning benign and malignant breast tumours and thus reduce the incidence of misdiagnosis.展开更多
Soil diagnostic horizons, which each have a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence...Soil diagnostic horizons, which each have a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence. We evaluated and compared four machine learning algorithms, namely, the classification and regression tree(CART), random forest(RF), boosted regression trees(BRT), and support vector machine(SVM), to map the occurrence of the soil mattic horizon in the northeastern Qinghai-Tibetan Plateau using readily available ancillary data. The mechanisms of resampling and ensemble techniques significantly improved prediction accuracies(measured based on area under the receiver operator characteristic curve score(AUC)) and produced more stable results for the BRT(AUC of 0.921 ± 0.012, mean ± standard deviation) and RF(0.908 ± 0.013) algorithms compared to the CART algorithm(0.784 ± 0.012), which is the most commonly used machine learning method. Although the SVM algorithm yielded a comparable AUC value(0.906 ± 0.006) to the RF and BRT algorithms, it is sensitive to parameter settings, which are extremely time-consuming.Therefore, we consider it inadequate for occurrence-distribution modeling. Considering the obvious advantages of high prediction accuracy, robustness to parameter settings, the ability to estimate uncertainty in prediction, and easy interpretation of predictor variables, BRT seems to be the most desirable method. These results provide an insight into the use of machine learning algorithms to map the mattic horizon and potentially other soil diagnostic horizons.展开更多
基金Joint Research Project Between Chongqing University and National University of Singapore (No. ARF-151-000-014-112)the Basic Research & Applied Basic Research Program of Chongqing University (No.71341103)Natural Science Foundation of Chongqing S & T Committee(No. CSTC,2006BB5240)
文摘Diagnosis and treatment of breast cancer have been improved during the last decade; however, breast cancer is still a leading cause of death among women in the whole world. Early detection and accurate diagnosis of this disease has been demonstrated an approach to long survival of the patients. As an attempt to develop a reliable diagnosing method for breast cancer, we integrated support vector machine (SVM), k-nearest neighbor and probabilistic neural network into a complex machine learning approach to detect malignant breast tumour through a set of indicators consisting of age and ten cellular features of fine-needle aspiration of breast which were ranked according to signal-to-noise ratio to identify determinants distinguishing benign breast tumours from malignant ones. The method turned out to significantly improve the diagnosis, with a sensitivity of 94.04%, a specificity of 97.37%, and an overall accuracy up to 96.24% when SVM was adopted with the sigmoid kernel function under 5-fold cross validation. The results suggest that SVM is a promising methodology to be further developed into a practical adjunct implement to help discerning benign and malignant breast tumours and thus reduce the incidence of misdiagnosis.
基金supported by the National Natural Science Foundation of China (Nos. 41501229, 41371224, 41130530, and 91325301)the China Postdoctoral Science Foundation (No. 2015M581876)
文摘Soil diagnostic horizons, which each have a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence. We evaluated and compared four machine learning algorithms, namely, the classification and regression tree(CART), random forest(RF), boosted regression trees(BRT), and support vector machine(SVM), to map the occurrence of the soil mattic horizon in the northeastern Qinghai-Tibetan Plateau using readily available ancillary data. The mechanisms of resampling and ensemble techniques significantly improved prediction accuracies(measured based on area under the receiver operator characteristic curve score(AUC)) and produced more stable results for the BRT(AUC of 0.921 ± 0.012, mean ± standard deviation) and RF(0.908 ± 0.013) algorithms compared to the CART algorithm(0.784 ± 0.012), which is the most commonly used machine learning method. Although the SVM algorithm yielded a comparable AUC value(0.906 ± 0.006) to the RF and BRT algorithms, it is sensitive to parameter settings, which are extremely time-consuming.Therefore, we consider it inadequate for occurrence-distribution modeling. Considering the obvious advantages of high prediction accuracy, robustness to parameter settings, the ability to estimate uncertainty in prediction, and easy interpretation of predictor variables, BRT seems to be the most desirable method. These results provide an insight into the use of machine learning algorithms to map the mattic horizon and potentially other soil diagnostic horizons.