This paper proposes an adaptive and diverse hybrid ensemble method to improve the performance of binary classification. The method combines base models non-linearly and adaptively selects the most suitable model for each data instance. Ensemble methods, an important machine learning technique, use multiple single models to construct a hybrid model, which generally performs better than a single individual model. On a given dataset, diverse single models trained with different machine learning algorithms differ in their ability to recognize patterns in the training sample. The proposed approach has been validated on the Repeat Buyers Prediction dataset and the Census Income Prediction dataset. The experimental results show up to an 18.5% improvement in F1 score on the Repeat Buyers dataset compared with the best individual model. This improvement also indicates that the proposed ensemble method handles imbalanced datasets well. In addition, the proposed method outperforms two other commonly used ensemble methods (Averaging and Stacking) in terms of F1 score. Finally, our results yield an AUC of 0.718, slightly higher than the previous result of 0.712 in the Repeat Buyers competition. This roughly 1% increase in AUC is significant for a dataset as large as Repeat Buyers.
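The ensemble abstract does not say how the most suitable model is chosen for each instance. As an illustration only, the sketch below assumes one common approach, dynamic selection by local accuracy: for a query point, the base model that is most accurate on the nearest validation points makes the prediction. The function `select_and_predict`, the toy models `model_a` and `model_b`, and the validation data are all hypothetical and are not taken from the paper.

```python
from math import dist

def select_and_predict(x, models, val_X, val_y, k=3):
    """Return the prediction of the base model that is most accurate
    on the k validation points nearest to x (an assumed selection rule)."""
    # Indices of the k validation points closest to x (Euclidean distance).
    nearest = sorted(range(len(val_X)), key=lambda i: dist(x, val_X[i]))[:k]

    def local_accuracy(model):
        hits = sum(model(val_X[i]) == val_y[i] for i in nearest)
        return hits / len(nearest)

    best = max(models, key=local_accuracy)  # ties go to the earlier model
    return best(x)

# Two hypothetical base models, each reliable in only one region.
def model_a(p):
    return int(p[0] > 0.5)  # thresholds the first feature

def model_b(p):
    return int(p[1] > 0.5)  # thresholds the second feature

# Toy validation set: model_a is right on the left cluster,
# model_b is right on the right cluster.
val_X = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (5.1, 0.9), (5.2, 0.1), (5.3, 0.8)]
val_y = [0, 0, 0, 1, 0, 1]

left = select_and_predict((0.15, 0.85), [model_a, model_b], val_X, val_y)
right = select_and_predict((5.2, 0.9), [model_a, model_b], val_X, val_y)
```

Here the left query is answered by `model_a` and the right query by `model_b`, so the combination is non-linear in the sense that no fixed weighting of the two models is applied across all instances.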
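Objective: To handle class-imbalanced data with adaptive synthetic sampling (ADASYN) and inverse class-proportion weighting, and to combine these with classifiers to build models that classify and predict disease progression in patients with Alzheimer's disease (AD). Methods: Data were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI). After missing values were imputed with random forest and a feature subset was selected with elastic net, class imbalance was handled with ADASYN and inverse class-proportion weighting. These were combined with random forest (RF) and support vector machine (SVM) to build four models: ADASYN-RF, ADASYN-SVM, weighted random forest (WRF), and weighted support vector machine (WSVM), whose classification performance was compared with that of RF and SVM. The evaluation metrics were macro-averaged precision (macro-P), macro-averaged recall (macro-R), macro-averaged F1 score (macro-F1), accuracy (ACC), the Kappa value, and the area under the ROC curve (AUC). Results: ADASYN-RF achieved the best classification performance (Kappa 0.938, AUC 0.980), followed by ADASYN-SVM. The important classification features identified by ADASYN-RF were CDRSB, LDELTOTAL, and MMSE, all of which can be confirmed clinically. Conclusion: Both ADASYN and inverse class-proportion weighting can help improve classifier performance, but ADASYN performs better.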
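The Alzheimer's disease abstract does not give the exact formula used for inverse class-proportion weighting. A minimal sketch, assuming the widely used form w_c = N / (K * n_c) (N samples, K classes, n_c samples in class c, the same heuristic as scikit-learn's `class_weight="balanced"`), is shown below. The function name and the label counts are hypothetical and are not ADNI data.

```python
from collections import Counter

def inverse_proportion_weights(labels):
    """Weight each class by N / (K * n_c), so rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}

# Hypothetical three-class label distribution (50 / 30 / 20 samples).
labels = ["stage_0"] * 50 + ["stage_1"] * 30 + ["stage_2"] * 20
weights = inverse_proportion_weights(labels)

# With these weights every class contributes the same total weight.
totals = {c: weights[c] * labels.count(c) for c in weights}
```

Under this assumed formula the rarest class receives the largest weight, and the weighted class totals are equal, which is what lets a weighted classifier such as WRF or WSVM treat the classes evenly without resampling.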
A rain-type adaptive pyramid Kanade-Lucas-Tomasi (A-PKLT) optical flow method for radar echo extrapolation is proposed. This method introduces a rain-type classification algorithm that can classify radar echoes into six types: convective, stratiform, surrounding convective, isolated convective core, isolated convective fringe, and weak echoes. Then, new schemes are designed to optimize specific parameters of the PKLT optical flow based on the rain type of the echo. At the same time, the gradients of radar reflectivity in the fringe positions corresponding to all types of rain echoes are increased. As a result, corner points that are characteristic points used for PKLT optical flow tracking in the surrounding area will be increased. Therefore, more motion vectors are purposefully obtained in the whole radar echo area. This helps to describe the motion characteristics of the precipitation more precisely. Then, the motion vectors corresponding to each type of rain echo are merged, and a denser motion vector field is generated by an interpolation algorithm on the basis of merged motion vectors. Finally, the dense motion vectors are used to extrapolate rain echoes into 0-60-min nowcasts by a semi-Lagrangian scheme. Compared with other nowcasting methods for four landfalling typhoons in or near Shanghai, the new optical flow method is found to be more accurate than the traditional cross-correlation and optical flow methods, particularly showing a clear improvement in the nowcasting of convective echoes on the spiral rainbands of typhoons.
Funding: This work was supported by the National Key Research and Development Program of China (No. 2018YFC1507601), the National Natural Science Foundation of China (Grant No. 41775049), the Scientific Research Project of Shanghai Science and Technology Commission (No. 18DZ12000403), and the Severe Convection S&T Innovation Team of Shanghai Meteorological Service.
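For the A-PKLT radar echo abstract, the final semi-Lagrangian extrapolation step can be illustrated with a deliberately simplified sketch. It assumes a single constant motion vector for the whole grid and nearest-neighbour sampling, whereas the abstract describes a dense, spatially varying motion field. The function `extrapolate`, the grid, and the reflectivity value are hypothetical.

```python
def extrapolate(field, u, v, steps):
    """Advect a 2-D field by a constant motion of (u, v) grid cells per step.

    Backward semi-Lagrangian idea: for each target cell, trace back along
    the motion to the departure cell and copy its value (nearest neighbour).
    Cells whose departure point lies outside the grid are set to 0.0.
    """
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            src_r = round(r - v * steps)  # v: rows per step
            src_c = round(c - u * steps)  # u: columns per step
            if 0 <= src_r < rows and 0 <= src_c < cols:
                out[r][c] = field[src_r][src_c]
    return out

# Toy 3 x 4 reflectivity grid with one echo cell (value is illustrative).
echo = [[0.0] * 4 for _ in range(3)]
echo[1][0] = 40.0

# Move the echo one column per step for two steps.
forecast = extrapolate(echo, u=1, v=0, steps=2)
```

Tracing backward from each target cell, rather than pushing source cells forward, guarantees that every forecast cell receives exactly one value, which is the usual reason for choosing a semi-Lagrangian formulation.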