Abstract: In ultra-high-dimensional data, the response variable is often multi-class. This paper therefore proposes a model-free feature screening method for multi-class responses that uses the Jensen-Shannon divergence to measure covariate importance. The method computes the Jensen-Shannon divergence between the conditional distribution of a covariate given each response class and the unconditional distribution of that covariate, then weights these divergences by the response-class probabilities to obtain a weighted Jensen-Shannon divergence; a larger value indicates a more important covariate. We also investigate an adapted version for covariates with differing numbers of categories, in which the weighted Jensen-Shannon divergence is adjusted by a logarithmic factor of the number of categories. Theoretical analysis and simulation experiments demonstrate that the proposed methods have the sure screening and ranking consistency properties. Finally, simulation and real-data experiments show that, in feature screening, the proposed methods are robust in performance and computationally faster than an existing method.
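The screening statistic described in this abstract can be illustrated with a minimal sketch. It assumes a categorical covariate and estimates all probabilities by sample frequencies. The function names `js_divergence` and `weighted_js_score` are hypothetical, and the `adjust` option (dividing by the logarithm of the number of covariate categories) is only one plausible reading of the "logarithmic factor", not necessarily the form used in the paper.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 because b is the mixture.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def weighted_js_score(x, y, adjust=False):
    """Weighted JS divergence between P(x | y = r) and P(x), weighted by P(y = r)."""
    x, y = list(x), list(y)
    n = len(x)
    levels = sorted(set(x))
    marginal = [x.count(v) / n for v in levels]
    score = 0.0
    for label in sorted(set(y)):
        members = [xi for xi, yi in zip(x, y) if yi == label]
        conditional = [members.count(v) / len(members) for v in levels]
        score += (len(members) / n) * js_divergence(conditional, marginal)
    if adjust and len(levels) > 1:
        # Assumed adjustment: normalise by log(number of categories of the covariate).
        score /= math.log(len(levels))
    return score


# Toy data: one covariate that tracks the response, one that is independent of it.
y = [0, 0, 1, 1]
x_informative = [0, 0, 1, 1]
x_noise = [0, 1, 0, 1]
scores = {
    "informative": weighted_js_score(x_informative, y),
    "noise": weighted_js_score(x_noise, y),
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Covariates would then be ranked by score and the top-ranked ones retained; how many to keep is a separate choice that this sketch does not address.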
Abstract: A gear fault diagnosis method based on rough set theory and the support vector machine (SVM) is proposed. First, the diagnostic decision-making information is reduced using rough set theory, removing noise and redundancy from the samples. Then, based on the chosen reduction, an SVM multi-classifier is designed for gear fault diagnosis. This reduces the SVM training data and increases running speed. Tests show that the method diagnoses gear faults accurately and efficiently.
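The reduction step can be illustrated with a small sketch of a greedy attribute reduction driven by the rough-set dependency degree (a QuickReduct-style heuristic). This is an assumption about how such a reduction might be carried out, not the procedure from the paper; the function names and the toy decision table are hypothetical, and the SVM stage is omitted.

```python
from collections import defaultdict


def dependency(rows, decisions, attrs):
    """Fraction of samples whose decision is fully determined by the chosen attributes."""
    seen_decisions = defaultdict(set)
    counts = defaultdict(int)
    for row, decision in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        seen_decisions[key].add(decision)
        counts[key] += 1
    # Positive region: equivalence classes that map to exactly one decision.
    consistent = sum(counts[k] for k, ds in seen_decisions.items() if len(ds) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, decisions):
    """Add attributes one at a time until the full-table dependency is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, decisions, all_attrs)
    chosen = []
    while dependency(rows, decisions, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, decisions, chosen + [a]),
        )
        chosen.append(best)
    return sorted(chosen)


# Hypothetical discretised features: column 0 determines the fault class,
# column 1 is noise, and column 2 is constant (redundant).
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1), (2, 0, 1), (2, 1, 1)]
decisions = ["normal", "normal", "wear", "wear", "broken", "broken"]
kept = greedy_reduct(rows, decisions)
reduced_rows = [tuple(row[a] for a in kept) for row in rows]
```

Only the retained columns in `reduced_rows` would then be passed to the SVM multi-classifier, which is where the reduction in training data would come from.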
Funding: Project (61203021) supported by the National Natural Science Foundation of China; Project (2011216011) supported by the Key Science and Technology Program of Liaoning Province, China; Project (2013020024) supported by the Natural Science Foundation of Liaoning Province, China; Project (LJQ2015061) supported by the Program for Liaoning Excellent Talents in Universities, China.
Abstract: Fault diagnosis plays an important role in complicated industrial processes. Detecting, identifying, and locating faults quickly and accurately in large-scale process systems is a challenging task. To solve this problem, a novel MultiBoost-based integrated extension neural network (ENN) fault diagnosis method is proposed. Fault data from complicated chemical processes have difficult-to-handle characteristics, such as high dimensionality, nonlinearity, and non-Gaussian distribution, so the margin discriminant projection (MDP) algorithm is used to reduce dimensions and extract the main features. Then, the affinity propagation (AP) clustering method is used to select core data and boundary data as training samples, reducing memory consumption and shortening learning time. Afterwards, an integrated ENN classifier based on the MultiBoost strategy is constructed to identify fault types. Artificial data sets are tested to verify the effectiveness of the proposed method and to carry out a detailed sensitivity analysis of the key parameters. Finally, a real industrial system, the Tennessee Eastman (TE) process, is employed to evaluate the performance of the proposed method. The results show that the proposed method is efficient and capable of diagnosing various types of faults in complicated chemical processes.
Abstract: Algorithms based on combination learning are usually superior to a single classification algorithm for protein secondary structure prediction. However, the assignment of weights to the base classifiers usually lacks supporting evidence. In this paper, we propose a protein secondary structure prediction method with a dynamic self-adaptive combination strategy based on entropy, in which the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers: the higher the entropy, the lower the weight of the base classifier. The final structure prediction is decided by the weighted combination of the posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and effectively improves prediction performance.
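The entropy-based weighting can be sketched as follows. The abstract states only that higher entropy means lower weight; the specific mapping used here (one minus the entropy normalised by its maximum, then rescaled so the weights sum to one) is an assumption for illustration, as are the function names and the example posteriors.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(v * math.log(v) for v in p if v > 0)


def entropy_weighted_combination(posteriors):
    """Combine per-classifier posteriors, down-weighting high-entropy (uncertain) ones."""
    n_classes = len(posteriors[0])
    max_entropy = math.log(n_classes)
    # Assumed weighting: confidence = 1 - normalised entropy, so higher entropy gives lower weight.
    confidence = [max(0.0, 1.0 - entropy(p) / max_entropy) for p in posteriors]
    total = sum(confidence)
    if total == 0:
        # Every classifier is maximally uncertain: fall back to equal weights.
        weights = [1.0 / len(posteriors)] * len(posteriors)
    else:
        weights = [c / total for c in confidence]
    combined = [
        sum(w * p[k] for w, p in zip(weights, posteriors)) for k in range(n_classes)
    ]
    return weights, combined


# Hypothetical posteriors from three base classifiers for one residue,
# over the three secondary structure states (helix, strand, coil).
posteriors = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.30, 0.30],  # nearly uniform, so close to zero weight
    [0.10, 0.80, 0.10],  # fairly confident, but disagrees with the first
]
weights, combined = entropy_weighted_combination(posteriors)
predicted_state = max(range(len(combined)), key=combined.__getitem__)
```

Because the weights are recomputed from each residue's posteriors, the combination adapts per prediction rather than using one fixed weight per classifier.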
Funding: Supported by the National Natural Science Foundation of China [grant number: 41930650], the Scientific Research Project of Beijing Municipal Education Commission [grant number: KM202110016004], and the Beijing Key Laboratory of Urban Spatial Information Engineering [grant number: 20220111].
Abstract: The local climate zone (LCZ) scheme has been widely utilized in regional climate modeling, urban planning, and thermal comfort investigations. However, existing LCZ classification methods face challenges in characterizing complex urban structures and human activities involving local climatic environments. In this study, we propose a novel LCZ mapping method that fully uses space-borne multi-view and diurnal observations, i.e., daytime Ziyuan-3 stereo imagery (2.1 m) and Luojia-1 nighttime light (NTL) data (130 m). First, we performed land cover classification using multiple machine learning methods (i.e., random forest (RF) and XGBoost algorithms) and various features (i.e., spectral, textural, and multi-view features, 3D urban structure parameters (USPs), and NTL). In addition, we developed a set of new cumulative elevation indexes to improve building roughness assessments. The indexes can estimate building roughness directly from fused point clouds generated by both along- and across-track modes. Finally, based on the land cover and building roughness results, we extracted 2D and 3D USPs for different land covers and used multi-classifiers to perform LCZ mapping. The results for Beijing, China, show that our method yielded satisfactory accuracy for LCZ mapping, with an overall accuracy (OA) of 90.46%. The overall accuracy of land cover classification using 3D USPs generated from both along- and across-track modes increased by 4.66% compared with using the single along-track mode. Additionally, LCZ mapping using 2D and 3D USPs achieved a higher OA (88.18%) than using only 2D USPs (83.83%). The use of NTL data increased the classification accuracy of LCZs E (bare rock or paved) and F (bare soil or sand) by 6.54% and 3.94%, respectively. The refined LCZ classification achieved through this study will not only contribute to more accurate regional climate modeling but also provide valuable guidance for urban planning initiatives aimed at enhancing thermal comfort and overall livability in urban areas. Ultimately, this study paves the way for more comprehensive and effective strategies in addressing the challenges posed by urban microclimates.