According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the chang...According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the change of groundwater level, the influential factors of groundwater level were selected. Then the classification and regression tree(CART) model was constructed by the subset and used to predict the groundwater level. Through the verification, the predictive results of the test sample were consistent with the actually measured values, and the mean absolute error and relative error is 0.28 m and 1.15%respectively. To compare the support vector machine(SVM) model constructed using the same set of factors, the mean absolute error and relative error of predicted results is 1.53 m and 6.11% respectively. It is indicated that CART model has not only better fitting and generalization ability, but also strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of ground water level in landslides.展开更多
Understanding an underlying structure for phylogenetic trees is very important as it informs on the methods that should be employed during phylogenetic inference. The methods used under a structured population differ ...Understanding an underlying structure for phylogenetic trees is very important as it informs on the methods that should be employed during phylogenetic inference. The methods used under a structured population differ from those needed when a population is not structured. In this paper, we compared two supervised machine learning techniques, that is artificial neural network (ANN) and logistic regression models for prediction of an underlying structure for phylogenetic trees. We carried out parameter tuning for the models to identify optimal models. We then performed 10-fold cross-validation on the optimal models for both logistic regression?and ANN. We also performed a non-supervised technique called clustering to identify the number of clusters that could be identified from simulated phylogenetic trees. The trees were from?both structured?and non-structured populations. Clustering and prediction using classification techniques were?done using tree statistics such as Colless, Sackin and cophenetic indices, among others. Results from 10-fold cross-validation revealed that both logistic regression and ANN models had comparable results, with both models having average accuracy rates of over 0.75. Most of the clustering indices used resulted in 2 or 3 as the optimal number of clusters.展开更多
Imbalanced data is one type of datasets that are frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For this type of datasets, improving the accuracy to identify their minority cl...Imbalanced data is one type of datasets that are frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For this type of datasets, improving the accuracy to identify their minority class is a critically important issue.Feature selection is one method to address this issue. An effective feature selection method can choose a subset of features that favor in the accurate determination of the minority class. A decision tree is a classifier that can be built up by using different splitting criteria. Its advantage is the ease of detecting which feature is used as a splitting node. Thus, it is possible to use a decision tree splitting criterion as a feature selection method. In this paper, an embedded feature selection method using our proposed weighted Gini index(WGI) is proposed. Its comparison results with Chi2, F-statistic and Gini index feature selection methods show that F-statistic and Chi2 reach the best performance when only a few features are selected. As the number of selected features increases, our proposed method has the highest probability of achieving the best performance. The area under a receiver operating characteristic curve(ROC AUC) and F-measure are used as evaluation criteria. Experimental results with two datasets show that ROC AUC performance can be high, even if only a few features are selected and used, and only changes slightly as more and more features are selected. However, the performance of Fmeasure achieves excellent performance only if 20% or more of features are chosen. The results are helpful for practitioners to select a proper feature selection method when facing a practical problem.展开更多
Obstructive Sleep Apnea(OSA)is a respiratory syndrome that occurs due to insufficient airflow through the respiratory or respiratory arrest while sleeping and sometimes due to the reduced oxygen saturation.The aim of ...Obstructive Sleep Apnea(OSA)is a respiratory syndrome that occurs due to insufficient airflow through the respiratory or respiratory arrest while sleeping and sometimes due to the reduced oxygen saturation.The aim of this paper is to analyze the respiratory signal of a person to detect the Normal Breathing Activity and the Sleep Apnea(SA)activity.In the proposed method,the time domain and frequency domain features of respiration signal obtained from the PPG device are extracted.These features are applied to the Classification and Regression Tree(CART)-Particle Swarm Optimization(PSO)classifier which classifies the signal into normal breathing signal and sleep apnea signal.The proposed method is validated to measure the performance metrics like sensitivity,specificity,accuracy and F1 score by applying time domain and frequency domain features separately.Additionally,the performance of the CART-PSO(CPSO)classification algorithm is evaluated through comparing its measures with existing classification algorithms.Concurrently,the effect of the PSO algorithm in the classifier is validated by varying the parameters of PSO.展开更多
为提高植被分类的精度,在利用高光谱图像提取植被信息时需要考虑训练样本和地形等其他因素的影响。以长白山为研究背景,基于CART(Classification And Regression Tree)算法构建决策树模型,对高光谱图像进行植被分类。由于混合像元的影响...为提高植被分类的精度,在利用高光谱图像提取植被信息时需要考虑训练样本和地形等其他因素的影响。以长白山为研究背景,基于CART(Classification And Regression Tree)算法构建决策树模型,对高光谱图像进行植被分类。由于混合像元的影响,以采用PPI(Pixel Purity Index)提取的纯净像元作为训练样本,提取植被指数、纹理和地形等分类特征变量。基于这些变量构建CART决策树对植被分类,并将结果与最大似然法分类结果进行比较。结果表明,CART决策树分类法可实现光谱、纹理和地形特征的有效组合,有较好的分类效果。展开更多
基金supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246)
文摘According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the change of groundwater level, the influential factors of groundwater level were selected. Then the classification and regression tree(CART) model was constructed by the subset and used to predict the groundwater level. Through the verification, the predictive results of the test sample were consistent with the actually measured values, and the mean absolute error and relative error is 0.28 m and 1.15%respectively. To compare the support vector machine(SVM) model constructed using the same set of factors, the mean absolute error and relative error of predicted results is 1.53 m and 6.11% respectively. It is indicated that CART model has not only better fitting and generalization ability, but also strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of ground water level in landslides.
文摘Understanding an underlying structure for phylogenetic trees is very important as it informs on the methods that should be employed during phylogenetic inference. The methods used under a structured population differ from those needed when a population is not structured. In this paper, we compared two supervised machine learning techniques, that is artificial neural network (ANN) and logistic regression models for prediction of an underlying structure for phylogenetic trees. We carried out parameter tuning for the models to identify optimal models. We then performed 10-fold cross-validation on the optimal models for both logistic regression?and ANN. We also performed a non-supervised technique called clustering to identify the number of clusters that could be identified from simulated phylogenetic trees. The trees were from?both structured?and non-structured populations. Clustering and prediction using classification techniques were?done using tree statistics such as Colless, Sackin and cophenetic indices, among others. Results from 10-fold cross-validation revealed that both logistic regression and ANN models had comparable results, with both models having average accuracy rates of over 0.75. Most of the clustering indices used resulted in 2 or 3 as the optimal number of clusters.
基金supported in part by the National Science Foundation of USA(CMMI-1162482)
文摘Imbalanced data is one type of datasets that are frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For this type of datasets, improving the accuracy to identify their minority class is a critically important issue.Feature selection is one method to address this issue. An effective feature selection method can choose a subset of features that favor in the accurate determination of the minority class. A decision tree is a classifier that can be built up by using different splitting criteria. Its advantage is the ease of detecting which feature is used as a splitting node. Thus, it is possible to use a decision tree splitting criterion as a feature selection method. In this paper, an embedded feature selection method using our proposed weighted Gini index(WGI) is proposed. Its comparison results with Chi2, F-statistic and Gini index feature selection methods show that F-statistic and Chi2 reach the best performance when only a few features are selected. As the number of selected features increases, our proposed method has the highest probability of achieving the best performance. The area under a receiver operating characteristic curve(ROC AUC) and F-measure are used as evaluation criteria. Experimental results with two datasets show that ROC AUC performance can be high, even if only a few features are selected and used, and only changes slightly as more and more features are selected. However, the performance of Fmeasure achieves excellent performance only if 20% or more of features are chosen. The results are helpful for practitioners to select a proper feature selection method when facing a practical problem.
文摘Obstructive Sleep Apnea(OSA)is a respiratory syndrome that occurs due to insufficient airflow through the respiratory or respiratory arrest while sleeping and sometimes due to the reduced oxygen saturation.The aim of this paper is to analyze the respiratory signal of a person to detect the Normal Breathing Activity and the Sleep Apnea(SA)activity.In the proposed method,the time domain and frequency domain features of respiration signal obtained from the PPG device are extracted.These features are applied to the Classification and Regression Tree(CART)-Particle Swarm Optimization(PSO)classifier which classifies the signal into normal breathing signal and sleep apnea signal.The proposed method is validated to measure the performance metrics like sensitivity,specificity,accuracy and F1 score by applying time domain and frequency domain features separately.Additionally,the performance of the CART-PSO(CPSO)classification algorithm is evaluated through comparing its measures with existing classification algorithms.Concurrently,the effect of the PSO algorithm in the classifier is validated by varying the parameters of PSO.