Abstract: Decision forests are a well-established machine learning technique for detection and prediction problems involving clinical data. However, traditional decision forest (DF) algorithms have lower classification accuracy and cannot handle high-dimensional feature spaces effectively. In this work, we propose a bootstrap decision forest using penalizing attributes (BFPA) algorithm to predict heart disease with higher accuracy. This work integrates a significance-based attribute selection (SAS) algorithm with the BFPA classifier to improve the performance of the diagnostic system in identifying cardiac illness. The proposed SAS algorithm determines the correlation among attributes and selects the optimal feature subset for the learning and testing processes. BFPA selects the optimal number of learning and testing data points as well as the density of trees in the forest to achieve higher prediction accuracy when classifying imbalanced datasets. The effectiveness of the developed classifier is carefully verified on a real-world database (the Heart Disease dataset from the UCI repository) by comparing its performance with many advanced approaches in terms of accuracy, sensitivity, specificity, precision, and intersection over union (IoU). The empirical results demonstrate that the proposed classification approach outperforms the other approaches, with accuracy, precision, sensitivity, specificity, and IoU of 94.7%, 99.2%, 90.1%, 91.1%, and 90.4%, respectively. Additionally, we carry out Wilcoxon's rank-sum test to determine whether the proposed classifier with the feature selection method yields a significant improvement over other classifiers. From the experimental results, we conclude that the integration of SAS and BFPA outperforms other classifiers recently reported in the literature.
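The five evaluation measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `classification_metrics` is hypothetical, and IoU is assumed here to be the Jaccard index of the positive class, since the abstract does not define it.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary-classification measures from confusion-matrix counts.

    tp, fp, fn, tn: counts of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),   # recall / true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
        # Assumption: IoU taken as the Jaccard index of the positive class.
        "iou": ratio(tp, tp + fp + fn),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    for name, value in classification_metrics(tp=8, fp=2, fn=2, tn=8).items():
        print(f"{name}: {value:.3f}")
```

Returning 0.0 for an undefined ratio is a design choice for this sketch; a stricter implementation might raise an error or return NaN instead.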
Funding: Supported by the research project "Sustainable, climate-neutral and resource-efficient forest-based bioeconomy", funded by the Strategic Research Council at the Academy of Finland (Decision No. 293380).
Abstract: The sustainable use of renewable resources has become an important issue worldwide in the move towards a less fossil-fuel-intensive future. The mainstream method for fulfilling this aim is to increase the share of renewable energy and materials to substitute for fossil fuels and to become fully independent of fossil fuels over the long term. However, the environmental sustainability of this endeavor has been questioned. In addition, economic and social sustainability issues are much-debated topics in this context. Forest resources are often thought to contribute partially to achieving a so-called "carbon-neutral society". In this review, we discuss sustainability issues of using forest biomass. We present several sustainability indicators for the ecological, economic and social dimensions and discuss the issues in applying them in sustainability impact assessments (SIAs). We also present a number of tools and methods previously used in conducting SIAs. We approach our study from the perspective of Finnish forestry; in addition, various aspects regarding the application of SIAs in a broader context are presented. One of the key conclusions of the study is that although sufficient data are available to measure many indicators accurately, the impacts may be very difficult to assess (e.g. the impact of greenhouse gases on biodiversity) when conducting a holistic SIA. Furthermore, some indicators, such as "biodiversity", are difficult to quantify in the first place. Therefore, a mix of different methods, such as Multi-criteria Assessment, Life-cycle Assessment or Cost-Benefit Analysis, as well as different approaches (e.g. thresholds and strong/weak sustainability), is needed in aggregating the results of the impacts. SIAs are important in supporting and improving the acceptability of decision-making, but a certain degree of uncertainty will always have to be tolerated.
Funding: Under the auspices of Special Funds of State Environmental Protection Public Welfare Industry (No. 2011467032)
Abstract: Accurate, up-to-date information on the distribution of wetlands is essential for estimating net fluxes of greenhouse gases and for effectively protecting and managing wetlands. Because of their complex community structure and rich surface vegetation, deciduous broad-leaved forested swamps are considered one of the most difficult wetland types to classify. In this research, with the support of remote sensing and geographic information systems, multi-temporal L-band PALSAR radar images were first used to extract the forest hydrological layer and the phenology phase change layer as two variables through image analysis. Second, based on the environmental characteristics of forested swamps, three decision tree classifiers derived from the two variables were constructed to explore effective methods for identifying deciduous broad-leaved forested swamps. Third, the study analyzed the classification process between flat forests, which are the most severely disturbed elements, and forested swamps. Finally, the application of the decision tree model is discussed. The results showed that: 1) the L-HH band (an L band with a wavelength of 0.235 m in HH polarization mode; HH means that the synthetic aperture radar transmits pulses in horizontal polarization and receives in horizontal polarization) in the leaf-off season is capable of detecting hydrologic conditions beneath the forest; 2) the classification accuracy (forested swamp versus flat forest) was 81.5% based on hydrologic features and 83.5% when hydrologic features were combined with phenology response features, which indicates that hydrological characteristics under the forest played a key role. The HHOJ band (created by differencing the HH bands from October and July), derived from multi-temporal radar images, did improve the classification accuracy, but not significantly, and additional leaf-off radar images may be more efficient than multi-seasonal radar images for inland forested swamp mapping; 3) the low separability between forested swamps dominated by vegetated surfaces and flat forests covered with litter was the main cause of classification uncertainty, which led to misleading interpretations of the pixel-based classification. Finally, analysis with kappa coefficients showed that the value of the intersection point was an ideal choice for the variable.
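The use of kappa coefficients to judge a split value for one variable can be illustrated as follows. This is a rough sketch under stated assumptions rather than the study's procedure: the helper names, the candidate thresholds, and the rule "values at or above the threshold are class 1" (for example, forested swamp) are all hypothetical, and the abstract does not say how the intersection point was computed.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def kappa_for_threshold(values, labels, threshold):
    """Score a one-variable split: predict class 1 when value >= threshold, else class 0."""
    confusion = [[0, 0], [0, 0]]
    for value, label in zip(values, labels):
        predicted = 1 if value >= threshold else 0
        confusion[label][predicted] += 1
    return cohen_kappa(confusion)


def best_threshold(values, labels, candidates):
    """Return the candidate threshold with the highest kappa."""
    return max(candidates, key=lambda t: kappa_for_threshold(values, labels, t))


if __name__ == "__main__":
    # Made-up values and labels for illustration; not data from the study.
    values = [-14.0, -13.0, -12.0, -9.0, -8.0, -7.0]
    labels = [0, 0, 0, 1, 1, 1]
    print(best_threshold(values, labels, candidates=[-13.5, -10.0, -7.5]))
```

A threshold sweep like this is only one way to pick a split value; it is shown here because kappa, unlike raw accuracy, discounts agreement that would occur by chance.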
Funding: Supported by the National Natural Science Foundation of China (Grant No. 21938009).
Abstract: Solubility is widely regarded as a fundamental property of small-molecule drugs and drug candidates, as it has a profound impact on the crystallization process. Solubility prediction, as an alternative to experiments that can reduce waste and improve crystallization process efficiency, has attracted increasing attention. However, many pressing challenges remain. Herein, we used seven descriptors based on an understanding of dissolution behavior to establish two solubility prediction models using machine learning (ML) algorithms. The solubility data of 120 active pharmaceutical ingredients (APIs) in ethanol were considered in the prediction models, which were constructed with random decision forests and an artificial neural network with optimized data structure and model accuracy. Furthermore, a comparison was carried out with traditional prediction methods, including the modified solubility equation and the quantitative structure-property relationships model. The ML models achieved the highest accuracy on the testing set, indicating the best solubility prediction ability among the methods compared. Multiple linear regression and stepwise regression were used to further investigate the critical factors determining the solubility value. The results revealed that both the API properties and the solute-solvent interaction make a non-negligible contribution to the solubility value.
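The multiple linear regression step mentioned in this abstract can be sketched in plain Python as an ordinary least-squares fit. This is an illustrative sketch only: the function names are hypothetical, the two-descriptor toy data are invented, and nothing here reproduces the paper's seven descriptors or its results.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of descriptor vectors; targets: list of observed values.
    Returns [intercept, coef_1, ..., coef_p].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    p = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(design, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(coefficients, row):
    """Evaluate the fitted linear model on one descriptor vector."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


def r_squared(actual, predicted):
    """Coefficient of determination of predictions against observed values."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data generated from y = 1 + 2*x1 - 3*x2; not solubility measurements.
    rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    targets = [1, 3, -2, 0, 2]
    coefficients = fit_linear_regression(rows, targets)
    print(coefficients, r_squared(targets, [predict(coefficients, r) for r in rows]))
```

The magnitude and sign of each fitted coefficient give a first indication of how strongly a descriptor contributes, which is the kind of question the regression analysis in the abstract addresses; for descriptors on different scales, standardizing them first makes the coefficients comparable.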