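Abstract: To address the tendency of chi-squared automatic interaction detection (CHAID) decision trees to overfit, a CHAID random forest method (CHAID Random Forest, CHAID-RF) is proposed. The method combines random sampling, random feature selection, and ensembling, using CHAID decision trees as base classifiers to form CHAID-RF. To verify the effectiveness of CHAID-RF, CART, CHAID, SVM, and RF are selected as comparison algorithms. Accuracy, weighted precision, weighted recall, and weighted F-score serve as evaluation metrics for the classification models, and root mean square error serves as the evaluation metric for the regression models. Validation uses 10 classification datasets and 7 regression datasets. The experimental results show that CHAID-RF is feasible and effective.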
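The three ingredients named in the abstract (random sampling, random feature selection, ensembling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learner `fit_stump` is a hypothetical one-level stand-in rather than a CHAID tree, the feature subset is drawn once per tree rather than per split, and all function and parameter names are invented for the example.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Stand-in base learner (NOT CHAID): choose the candidate feature whose
    per-value majority label fits the training rows best."""
    best = None
    for f in features:
        by_value = {}
        for row, label in zip(X, y):
            by_value.setdefault(row[f], Counter())[label] += 1
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        if best is None or correct > best[0]:
            table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            best = (correct, f, table)
    _, f, table = best
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen values
    return lambda row: table.get(row[f], default)

def fit_forest(X, y, n_trees=25, n_features=2, base_fit=fit_stump, seed=0):
    """Train n_trees base learners, each on a bootstrap sample of the rows
    and a random subset of the feature indices."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feats = rng.sample(range(d), n_features)     # random feature subset
        trees.append(base_fit([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """Ensemble step: majority vote over the base learners."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]

# Toy data: features 0 and 1 determine the label, feature 2 is noise.
X = [("a", "hi", "x"), ("a", "hi", "y"), ("b", "lo", "x"), ("b", "lo", "y")] * 5
y = ["pos" if row[0] == "a" else "neg" for row in X]
forest = fit_forest(X, y)
```

Any learner with the same `base_fit(X, y, features)` shape could be passed in place of the stump, which is where a CHAID tree would go in a setup like the one the abstract describes.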
Funding: Funded by the National Science Foundation of China (42177164), the Distinguished Youth Science Foundation of Hunan Province of China (2022JJ10073), and the Innovation-Driven Project of Central South University (2020CX040).
Abstract: After a roadway is excavated, the original stress balance is destroyed, which redistributes the stress and forms an excavation damaged zone (EDZ) around the roadway. The thickness of the EDZ is the key basis for assessing roadway stability and designing the support structure, so accurately predicting it is of great engineering significance. Considering the advantages of machine learning (ML) in dealing with high-dimensional, nonlinear problems, this paper develops a hybrid prediction model based on the random forest (RF) algorithm. The model uses the dragonfly algorithm (DA) to optimize two RF hyperparameters, namely mtry and ntree, and uses mean absolute error (MAE), root mean square error (RMSE), determination coefficient (R^2), and variance accounted for (VAF) to evaluate prediction performance. A database containing 217 sets of data was collected, with embedding depth (ED), drift span (DS), surrounding rock mass strength (RMS), and joint index (JI) as input variables, and the excavation damaged zone thickness (EDZT) as the output variable. In addition, four classic models, namely back propagation neural network (BPNN), extreme learning machine (ELM), radial basis function network (RBF), and RF, were compared with the DA-RF model. The results showed that the DA-RF model had the best prediction performance (training set: MAE = 0.1036, RMSE = 0.1514, R^2 = 0.9577, VAF = 94.2645; test set: MAE = 0.1115, RMSE = 0.1417, R^2 = 0.9423, VAF = 94.0836). The sensitivity analysis showed that the relative importance of the input variables, from low to high, was DS, ED, RMS, and JI.
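The four evaluation metrics named in the abstract can be computed as in the sketch below. The abstract does not give the formulas, so this assumes the commonly used definitions: R^2 = 1 - SS_res / SS_tot and VAF = (1 - Var(y - y_hat) / Var(y)) x 100. The function name `regression_metrics` is illustrative only.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R^2, and VAF (in percent) for paired observations.

    Assumes the usual definitions; the source abstract does not state them.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # VAF compares the variance of the residuals with the variance of the
    # targets, so a constant offset in the predictions does not lower it.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_true = ss_tot / n
    vaf = (1.0 - var_err / var_true) * 100.0

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "VAF": vaf}

# Predictions that are uniformly one unit too high.
biased = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this toy case MAE and RMSE are both 1 and R^2 is about 0.2, yet VAF stays at 100 because the residuals have zero variance. That is one reason to report VAF alongside the error-based metrics rather than on its own.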
Funding: Supported by grants from the Natural Science Foundation of Hubei Province (No. 2020CFB780) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ020).
Abstract: Objective: Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied to the identification of human body fluids and have exhibited excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods: In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented a specific methylation profile across the 10 markers. Results: Significant differences in methylation levels were observed between the mixtures and the single body fluids. For all kinds of mixtures, Spearman’s correlation analysis revealed a significantly strong correlation between the methylation levels and the component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1, and 20:1). Two random forest classification models were trained on the methylation levels of the 10 markers: one for predicting the mixture type and one for predicting the proportion of the 2 components in a mixture. For mixture type prediction, Model-1 reached an accuracy of 99.3% in 427 training samples and 100% in 243 independent test samples. For mixture proportion prediction, Model-2 reached an accuracy of 98.8% in 252 training samples and 98.2% in 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for the mixture proportions. Conclusion: These results indicate the excellent capability and value of the multiplex methylation system in the identification of forensic body fluid mixtures.