We present an integrated approach based on supervised and unsupervised learning techniques to improve the accuracy of six predictive models. The models were developed to predict the outcome of a tuberculosis (TB) treatment course, and their accuracy needs to be improved because they are not as precise as necessary. The integrated supervised and unsupervised learning method (ISULM) is proposed as a new way to improve model accuracy. A dataset of 6450 Iranian TB patients under DOTS therapy was used first to select the significant predictors and then to develop six predictive models using decision tree, Bayesian network, logistic regression, multilayer perceptron, radial basis function, and support vector machine algorithms. The developed models were integrated with k-means clustering analysis to calculate a more accurate predicted outcome of the tuberculosis treatment course. The results were then evaluated to compare prediction accuracy before and after applying ISULM. Recall, precision, F-measure, and ROC area were the other criteria used to assess model validity, along with the percentage change, which shows how much the models differ before and after ISULM. ISULM improved the prediction accuracy of all applied classifiers, by between 4% and 10%. The largest and smallest improvements in prediction accuracy were shown by logistic regression and support vector machine, respectively. Pre-learning by k-means clustering, which relocates the objects and puts similar cases in the same group, can improve classification accuracy in the process of integrating supervised and unsupervised learning.
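The evaluation criteria named in this abstract can be illustrated with a minimal Python sketch. The abstract does not define its "change percentage", so the relative-change formula below is an assumption, and the function names and counts are hypothetical rather than taken from the study.

```python
def precision_recall_f_measure(tp, fp, fn):
    """Return (precision, recall, F-measure) from binary confusion counts.

    tp, fp, fn are true positives, false positives and false negatives.
    Empty denominators yield 0.0 instead of raising ZeroDivisionError.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def change_percentage(before, after):
    """Relative change from `before` to `after`, in percent.

    Assumption: the abstract does not state its formula; this is the
    usual (after - before) / before * 100 definition.
    """
    if before == 0:
        raise ValueError("relative change is undefined for before == 0")
    return (after - before) / before * 100.0


# Hypothetical counts for one classifier, not values from the paper.
precision, recall, f_measure = precision_recall_f_measure(tp=3, fp=1, fn=1)
accuracy_change = change_percentage(before=0.80, after=0.88)
```

Comparing each criterion before and after the clustering step, classifier by classifier, gives the kind of per-model change figure the abstract reports.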
NOAA-AVHRR data are increasingly used by scientists because of their high temporal resolution, wide coverage, low cost, and broad wavebands. At macro and medium scales of vegetation remote sensing, NOAA-AVHRR has an advantage over other satellites. However, because NOAA-AVHRR data also suffer from low resolution, data distortion, and geometric distortion, the accuracy of vegetation classification needs to be improved when NOAA-AVHRR is applied to large-scale vegetation mapping. This paper discusses the feasibility of integrating geographic data and remotely sensed data in a GIS (Geographical Information System). In the GIS environment, temperature, precipitation, and elevation, which are the main factors affecting vegetation growth, were processed by a mathematical model and quantified into geographic images under a certain grid system. The geographic images were overlaid on the NOAA-AVHRR data, which had been compressed and processed. To evaluate the usefulness of geographic data for vegetation classification, the study area was digitally classified by two approaches: the proposed methodology, using maximum likelihood classification assisted by the geographic database, and conventional maximum likelihood classification alone. The two results were compared using Kappa statistics. The Kappa indices for the proposed and the conventional digital classification methodologies were 0.668 (very good) and 0.563 (good), respectively. The geographic database yielded an improvement over the conventional digital classification. Some problems related to multi-source data integration are also discussed.
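The Kappa comparison mentioned in this abstract can be illustrated with a short Python sketch of Cohen's kappa computed from a confusion matrix. The abstract does not say which Kappa variant was used, so the standard unweighted form is assumed here, and the function name and example matrix are hypothetical.

```python
def cohen_kappa(confusion):
    """Unweighted Cohen's kappa for a square confusion matrix.

    confusion[i][j] counts samples whose reference class is i and whose
    assigned (classified) class is j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total

    # Chance agreement: sum over classes of (row share * column share).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size))
                  for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class matrix, not data from the paper.
example = [[20, 5],
           [10, 15]]
kappa = cohen_kappa(example)  # observed 0.7, chance 0.5
```

Because kappa discounts agreement expected by chance, it is a stricter basis than raw accuracy for comparing two classifications of the same area.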