In ultra-high-dimensional data, the response variable is often multi-class. This paper therefore proposes a model-free feature screening method for multi-class responses that uses the Jensen-Shannon divergence to measure the importance of covariates. The method computes the Jensen-Shannon divergence between the conditional distribution of a covariate given each response class and the unconditional distribution of that covariate, then uses the class probabilities of the response as weights to obtain a weighted Jensen-Shannon divergence; a larger weighted divergence indicates a more important covariate. We also investigate an adapted version for the case where the number of categories varies across covariates, in which the weighted Jensen-Shannon divergence is adjusted by a logarithmic factor of the number of categories. Theoretical analysis and simulation experiments demonstrate that the proposed methods have the sure screening and ranking consistency properties. Finally, simulation and real-data experiments show that, in feature screening, the proposed methods are robust in performance and computationally faster than an existing method.
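The weighted Jensen-Shannon screening statistic described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: covariates and responses are taken to be categorical, probabilities are estimated by sample frequencies, and the adjusted version is assumed to divide by the logarithm of the number of covariate categories. The names `js_divergence` and `weighted_js` are hypothetical.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here,
        # because b is always the mixture of the two inputs.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def weighted_js(x, y, adjust=False):
    """Screening score of one categorical covariate x against a multi-class response y.

    For each class r, compare the empirical distribution of x given y == r with
    the empirical marginal distribution of x, and weight by the class frequency.
    """
    n = len(x)
    levels = sorted(set(x))
    marginal_counts = Counter(x)
    marginal = [marginal_counts[v] / n for v in levels]

    score = 0.0
    for cls, n_cls in Counter(y).items():
        cond_counts = Counter(xi for xi, yi in zip(x, y) if yi == cls)
        conditional = [cond_counts[v] / n_cls for v in levels]
        score += (n_cls / n) * js_divergence(conditional, marginal)

    # ASSUMPTION: the "logarithmic factor" adjustment is modeled as division by
    # log(number of categories of x); the abstract does not give the exact form.
    if adjust and len(levels) > 1:
        score /= math.log(len(levels))
    return score


# Toy data: x_informative determines the class, x_noise is unrelated to it.
y = [0, 0, 1, 1, 2, 2]
x_informative = [0, 0, 1, 1, 2, 2]
x_noise = [0, 1, 0, 1, 0, 1]
scores = {
    "informative": weighted_js(x_informative, y),
    "noise": weighted_js(x_noise, y),
}
```

Ranking covariates by this score in decreasing order and keeping the top few would give the screened set; how many to keep is a separate choice that this sketch does not address.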
By combining rough set theory with the support vector machine (SVM), a gear fault diagnosis method is proposed. First, the diagnostic decision table is reduced using rough set theory, which removes noise and redundancy from the samples. Then, based on the chosen reduct, an SVM multi-classifier is designed for gear fault diagnosis. As a result, the SVM training data are reduced and the running speed is improved. Tests demonstrate the accuracy and efficiency of the method for gear fault diagnosis.
Algorithms based on combination learning are usually superior to a single classification algorithm on the task of protein secondary structure prediction. However, the assignment of weights to the base classifiers usually lacks decision-making evidence. In this paper, we propose a protein secondary structure prediction method with a dynamic self-adaptive combination strategy based on entropy, in which the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers. A higher entropy value means a lower weight for the base classifier. The final structure prediction is decided by the weighted combination of the posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and effectively improves prediction performance.
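The entropy-based combination described in the abstract above can be sketched as follows. This is a hedged illustration, not the authors' method: the abstract states only that higher entropy means lower weight, so the specific weight formula here (one minus normalized entropy, rescaled to sum to one) is an assumption, and the names `entropy` and `entropy_weighted_vote` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def entropy_weighted_vote(posteriors):
    """Combine per-classifier posterior vectors for one sample.

    posteriors: list of probability vectors, one per base classifier,
    all over the same k classes (k >= 2).
    Returns (predicted class index, classifier weights, combined posterior).
    """
    k = len(posteriors[0])
    max_h = math.log(k)  # entropy of the uniform distribution over k classes

    # ASSUMPTION: weight = 1 - H(p) / log(k), so a confident classifier (low
    # entropy) gets a weight near 1 and an uncertain one gets a weight near 0.
    raw = [max(0.0, 1.0 - entropy(p) / max_h) for p in posteriors]
    total = sum(raw)
    if total > 0:
        weights = [w / total for w in raw]
    else:
        # Every classifier is maximally uncertain: fall back to equal weights.
        weights = [1.0 / len(posteriors)] * len(posteriors)

    combined = [sum(w * p[c] for w, p in zip(weights, posteriors)) for c in range(k)]
    best = max(range(k), key=combined.__getitem__)
    return best, weights, combined


# Three hypothetical base classifiers scoring one residue over three classes.
posteriors = [
    [0.90, 0.05, 0.05],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform, so it should get almost no weight
    [0.20, 0.60, 0.20],  # moderately confident
]
best, weights, combined = entropy_weighted_vote(posteriors)
```

Because the weights are recomputed from each sample's posteriors, the same base classifier can dominate on one residue and be nearly ignored on another, which is one reading of the "dynamic" strategy in the abstract.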
The local climate zone (LCZ) scheme has been widely utilized in regional climate modeling, urban planning, and thermal comfort investigations. However, existing LCZ classification methods face challenges in characterizing complex urban structures and human activities involving local climatic environments. In this study, we propose a novel LCZ mapping method that fully uses space-borne multi-view and diurnal observations, i.e., daytime Ziyuan-3 stereo imagery (2.1 m) and Luojia-1 nighttime light (NTL) data (130 m). First, we performed land cover classification using multiple machine learning methods (i.e., random forest (RF) and XGBoost algorithms) and various features (i.e., spectral, textural, and multi-view features, 3D urban structure parameters (USPs), and NTL). In addition, we developed a set of new cumulative elevation indexes to improve building roughness assessments. The indexes can estimate building roughness directly from fused point clouds generated by both along- and across-track modes. Finally, based on the land cover and building roughness results, we extracted 2D and 3D USPs for different land covers and used multi-classifiers to perform LCZ mapping. The results for Beijing, China, show that our method yielded satisfactory accuracy for LCZ mapping, with an overall accuracy (OA) of 90.46%. The overall accuracy of land cover classification using 3D USPs generated from both along- and across-track modes increased by 4.66% compared with that of using the single along-track mode. Additionally, LCZ mapping using both 2D and 3D USPs achieved a higher OA (88.18%) than using only 2D USPs (83.83%). The use of NTL data increased the classification accuracy of LCZs E (bare rock or paved) and F (bare soil or sand) by 6.54% and 3.94%, respectively. The refined LCZ classification achieved through this study will not only contribute to more accurate regional climate modeling but also provide valuable guidance for urban planning initiatives aimed at enhancing thermal comfort and overall livability in urban areas. Ultimately, this study paves the way for more comprehensive and effective strategies for addressing the challenges posed by urban microclimates.
Funding: Supported by the National Natural Science Foundation of China [grant number 41930650], the Scientific Research Project of Beijing Municipal Education Commission [grant number KM202110016004], and the Beijing Key Laboratory of Urban Spatial Information Engineering [grant number 20220111].