Funding: Sponsored by the Beijing Municipal Natural Science Foundation (4082027).
Abstract: To address the unbalanced-data problem in learning models for semantic concepts, an optimized modeling method based on the posterior probability support vector machine (PPSVM) is presented. A neighbor-based posterior probability estimator for visual concepts is provided. The proposed method has been applied in a high-level visual semantic concept classification system, and the experimental results show that it improves both performance and robustness over the baseline SVM models.
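The idea of a neighbor-based posterior estimate can be illustrated with a small sketch. The abstract does not specify the estimator, so the following assumes the simplest variant: the posterior of each label is the fraction of that label among the k nearest training points under Euclidean distance. The function name `knn_posterior`, the value of `k`, and the toy data are all hypothetical.

```python
import math

def knn_posterior(train, query, k=3):
    """Estimate P(label | query) as label fractions among the k nearest points.

    train: list of (feature_vector, label) pairs (hypothetical toy format).
    """
    # Sort training points by Euclidean distance to the query and keep k of them.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = {}
    for _, label in nearest:
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(nearest) for label, n in counts.items()}

# Toy imbalanced set: four negatives, two positives (made-up values).
train = [
    ((0.0, 0.0), "neg"), ((0.1, 0.2), "neg"),
    ((0.2, 0.1), "neg"), ((0.3, 0.3), "neg"),
    ((1.0, 1.0), "pos"), ((0.9, 1.1), "pos"),
]
posterior = knn_posterior(train, (0.8, 0.9), k=3)
```

In a PPSVM-style setup, such soft posteriors would replace the hard class labels during training; how the paper does this is not stated in the abstract.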
Abstract: The hyper-parameters of existing random forest-based classification prediction models depend on empirical settings, which leads to unsatisfactory model performance. To solve this problem, we propose a random forest model based on an adaptive particle swarm optimization algorithm, in which the adaptive particle swarm algorithm optimizes the hyper-parameters of the random forest so that the model can better predict unbalanced data. To address premature convergence in the particle swarm optimization algorithm, the population is adaptively divided according to its fitness, and an adaptive update strategy is introduced to enhance the ability of particles to jump out of local optima. The main steps of the model are as follows: normalize the data set, initialize the model on the training set, and then use the particle swarm optimization algorithm to optimize the modeling process and establish a classification model. Experimental results show that the proposed algorithm outperforms traditional algorithms, especially in terms of the F1-measure and accuracy (ACC). The results on six KEEL imbalanced data sets demonstrate the advantages of the proposed algorithm.
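The optimization step described above can be sketched as follows. This is a plain global-best particle swarm, not the adaptive variant with population division that the abstract proposes. The objective `surrogate_error` is a made-up stand-in for the validation error of a random forest, and the two hyper-parameters, their bounds, and all coefficients are assumptions chosen for illustration.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clamp to the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def surrogate_error(params):
    """Hypothetical smooth error surface with its minimum at (120 trees, depth 8)."""
    n_trees, max_depth = params
    return ((n_trees - 120.0) / 100.0) ** 2 + ((max_depth - 8.0) / 10.0) ** 2

best_params, best_error = pso_minimize(surrogate_error, bounds=[(10, 300), (2, 30)])
```

In a real pipeline the objective would train a random forest on normalized training data with rounded integer hyper-parameters and return, for example, one minus the cross-validated F1-measure.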
Abstract: The expected mean squares for an unbalanced mixed-effects interactive model were derived using the brute force method. The expected mean squares show that there are no obvious denominators for testing the main effects when the factors are mixed. An expression for an F-test of the main effects was derived and proved to be unbiased.
Funding: Gansu Province Higher Education Innovation Fund Project (No. 2020B-104); “Innovation Star” Project for Outstanding Postgraduates of Gansu Province (No. 2021CXZX-606).
Abstract: Rapid and precise location of faults in the on-board equipment of a train control system is a significant factor in ensuring reliable train operation. Text data from the fault tracking table of on-board equipment are taken as samples, and an on-board equipment fault diagnosis model is designed based on the combination of a convolutional neural network (CNN) and particle swarm optimization-support vector machines (PSO-SVM). Because fault text data are high-dimensional and sparse, the CNN is used for feature extraction. To reduce the influence of class imbalance in the fault samples on classification accuracy, the PSO-SVM algorithm is introduced. The fully connected classification part of the CNN is replaced by PSO-SVM, which classifies the extracted features and thereby implements intelligent diagnosis of on-board equipment faults. Tests on fault text data of on-board equipment recorded by a railway bureau, together with a comparison with other models, indicate that this model clearly improves the evaluation indexes and can serve as an effective model for on-board equipment fault diagnosis.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60934003 and 60974123), the National Basic Research Program (973) of China (No. 2010CB731800), and the Science and Technology Commission of Shanghai Municipality, China (Nos. 09PJ1406100, 10XD1402100, and 09CG06).
Abstract: This paper is concerned with routing protocol design for large-scale wireless sensor and actor networks (WSANs). The actor-sensor-actor communication (ASAC) strategy is first proposed to guarantee the reliability of persistent actor-actor communication. To maintain network connectivity and prolong network lifetime, we propose a dynamic gradient-based routing (DGR) protocol that balances the energy consumption of the network. Exploiting the different communication ranges of sensors and actors, the DGR protocol uses a data load expansion strategy to significantly prolong the network lifetime. A balance coefficient and a routing re-establishment threshold are also introduced to trade off network lifetime against routing efficiency. Simulation results show the effectiveness of the proposed DGR protocol for unbalanced and persistent data transmission.
Abstract: This paper investigates the effects of various factors on the capital structure decisions of Chinese firms through an empirical analysis of Chinese listed retail companies. An unbalanced panel dataset was formed from a sample of 110 companies observed quarterly over 12 years (2010-2021). Traditional explanatory variables are adopted, including profitability, company size, tangibility of assets, internal financing ability, tax ratio, growth opportunities, and volatility. Using the Fama-MacBeth approach, the regression results are interpreted to determine the impact of the independent variables on the leverage a company takes on. To address reverse causality, we include the lagged debt-to-equity ratio (last quarter's value) as a control variable. Consistent with previous theoretical and empirical studies, firms' leverage ratio is positively related to size, tangibility, tax ratio, and last quarter's debt level. Profitability and internal financing ability are negatively correlated with the debt-to-equity ratio. Earnings volatility and growth opportunities show no significant relationship with the debt-to-equity ratio. The study provides further empirical evidence on capital structure theories in emerging financial markets.
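The Fama-MacBeth procedure mentioned above runs one cross-sectional regression per period and then averages the estimated coefficients over time, which is why it copes naturally with an unbalanced panel. The sketch below assumes a single regressor with an intercept; the function names and the three-quarter toy panel are hypothetical and are not the paper's data.

```python
import math
from statistics import mean, stdev

def ols_slope(xs, ys):
    """Slope of a one-regressor OLS fit with intercept (xs must not all be equal)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def fama_macbeth(panel):
    """panel maps period -> list of (x, y) firm observations.

    Periods may contain different numbers of firms (unbalanced panel).
    Returns the time-series mean slope, its standard error, and the t-statistic.
    """
    # Step 1: one cross-sectional regression per period.
    slopes = [ols_slope([x for x, _ in obs], [y for _, y in obs])
              for obs in panel.values()]
    # Step 2: average the per-period slopes and test the mean against zero.
    avg = mean(slopes)
    se = stdev(slopes) / math.sqrt(len(slopes))
    return avg, se, avg / se

# Toy unbalanced panel: x could be firm size, y the debt-to-equity ratio.
panel = {
    "2010Q1": [(1.0, 0.9), (2.0, 2.1), (3.0, 3.0)],
    "2010Q2": [(1.0, 1.2), (2.0, 1.9), (3.0, 3.4), (4.0, 4.1)],
    "2010Q3": [(2.0, 2.2), (3.0, 2.8), (5.0, 5.3)],
}
avg_slope, std_err, t_stat = fama_macbeth(panel)
```

With several regressors, step 1 would become a multiple regression per quarter, and each coefficient would be averaged and tested separately.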
Funding: The Natural Science Foundation of Liaoning Province, China (20180550067); Liaoning Province Ministry of Education Scientific Study Project (2020LNZD06 and 2017LNQN11); University of Science and Technology Liaoning Talent Project Grants (601011507-20 and 601013360-17).
Abstract: The least squares support vector machine (LS-SVM) plays an important role in steel surface defect classification because of its high speed. However, the defect samples obtained from a real production line may be noisy, and LS-SVM classifies poorly when noisy samples are present. It is therefore necessary to design an effective algorithm for processing defect datasets obtained from real production lines. To this end, an adaptive weight function was employed to reduce the adverse effect of noisy samples. Moreover, although LS-SVM is fast, it still suffers from high computational complexity when the number of training samples is large, whereas the time for steel surface defect classification should be as short as possible. Therefore, a sparse strategy was adopted to prune the training samples. Finally, because steel surface defect classification is an unbalanced data classification problem, the standard LS-SVM algorithm is not directly applicable. Hence, unbalanced data information was introduced to improve classification performance. Taking all of these factors into account, an improved LS-SVM classification model, termed ILS-SVM, was proposed. Experimental results show that the new algorithm offers high speed and strong noise resistance.
Funding: Supported by the Science and Technology Research and Development Plan Contract of China National Railway Group Co., Ltd (Grant No. N2022G012) and the Railway Science and Technology Research and Development Center Project (Project No. SYF2022SJ004).
Abstract: The reliability of train control on-board equipment is inextricably linked to the safe operation of a high-speed train. A fault diagnosis model for on-board equipment is built using the ensemble learning algorithm XGBoost (eXtreme Gradient Boosting) to help technicians assess the fault category of high-speed train control on-board equipment accurately and rapidly. The XGBoost algorithm iterates over multiple decision tree models and improves the accuracy of fault diagnosis by boosting on the prediction residuals and adding regularization terms. First, text features were extracted using an improved TF-IDF (Term Frequency-Inverse Document Frequency) approach, and 24 fault feature words were chosen and converted into weighted word vectors. Second, considering the imbalanced fault categories in the data set, the ADASYN (Adaptive Synthetic sampling) oversampling technique was used to synthesize samples of the minority fault categories. Finally, the data samples were split into training and test sets based on the fault text data of CTCS-3 train control on-board equipment recorded by Guangzhou Railway Group maintenance personnel. After parameter tuning through grid search, the XGBoost model was used to locate faults in the test set automatically. Compared with other methods, the evaluation indexes of the XGBoost model improved significantly. The diagnostic accuracy reached 95.43%, which verifies the effectiveness of the method in text fault diagnosis.
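The feature-extraction step can be illustrated with the textbook form of TF-IDF. The abstract refers to an improved variant whose details are not given, so the sketch below uses the standard definition only (term frequency times the natural log of the inverse document frequency). The function name `tf_idf` and the tokenized fault records are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (standard TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized fault records.
docs = [
    ["balise", "message", "lost"],
    ["radio", "timeout", "message"],
    ["balise", "balise", "fault"],
]
weights = tf_idf(docs)
```

Terms that occur in few records, such as "lost" here, receive larger weights than terms shared across records, which is the property that makes the weighted vectors useful as classifier input.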
Funding: Tianjin Science and Technology Project under Grant Nos. 18YFCZZC00060 and 18ZXZNGX00100; Hebei Provincial Natural Science Foundation Project under Grant No. F2019202062.
Abstract: This study proposes an equipment fault diagnosis classification model based on an integrated incremental learning mechanism, designed around the characteristics of industrial equipment status data. First, a dynamic weight combination classification model based on long short-term memory (LSTM) and support vector machine (SVM) is proposed, which solves the problem of fault feature extraction and classification in high-noise equipment state data. Then an integrated incremental learning mechanism and unbalanced data processing technology are introduced to handle feature extraction and classification for massive, unbalanced new data and the class imbalance of samples in equipment status data. Finally, an equipment fault diagnosis classification model based on integrated incremental dynamic weight combination is formed. Experiments show that the model can effectively overcome the problems of excessive data volume, class imbalance, high noise, and the inability to correlate data samples in equipment fault diagnosis.
Funding: The CERNET Innovation Project (No. NGII20190315); the Foundation of A Hundred Youth Talents Training Program of Lanzhou Jiaotong University.
Abstract: When the traditional random forest algorithm works with unbalanced data, it cannot achieve satisfactory prediction results for the minority class and suffers from the parameter selection dilemma. To address this problem, this paper proposes an unbalanced accuracy weighted random forest algorithm (UAW_RF) based on adaptive step size artificial bee colony optimization. It combines the ideas of decision tree optimization, sampling selection, and weighted voting to improve the ability of the random forest algorithm to classify biased data. An adaptive step size and the optimal solution were introduced to improve the position updating formula of the artificial bee colony algorithm, and the improved algorithm was then used to iteratively optimize the parameter combination of the random forest algorithm. Experimental results show satisfactory accuracies and indicate that the method can effectively improve the classification accuracy of the random forest algorithm.
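The weighted-voting idea can be sketched in a few lines. The abstract does not say how UAW_RF computes its tree weights, so the sketch assumes each tree carries a single weight, for example its accuracy on minority-class validation samples. The function name `weighted_vote`, the labels, and the weight values are hypothetical.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across all trees."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Five trees: three vote "majority", two vote "minority".
tree_predictions = ["majority", "majority", "minority", "minority", "majority"]
# Assumed per-tree weights; the two "minority" voters are the most reliable trees.
tree_weights = [0.55, 0.50, 0.90, 0.85, 0.30]
label = weighted_vote(tree_predictions, tree_weights)
```

With equal weights the same votes would yield "majority", which shows how weighting can shift the forest's decision toward trees that handle the minority class well.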
Funding: Research funding from the City University of Hong Kong under Strategic Research Grant (Project No. 700233) and the China National Natural Science Foundation (Grant No. 7097113).
Abstract: This paper studies the relationship between income inequality and economic development using unbalanced panel data of OECD and non-OECD countries (regions) for the period 1962-2003. The nonparametric estimation results show that income inequality in OECD countries lies almost entirely on the back side of the inverted-U relationship, while non-OECD countries are approximately on the front side, except that the relationship in both country groups shows an upturn at a high level of development. Development has an indirect effect on inequality through control variables, but the modes differ between the two country groups. The model specification tests show that the relationship is not necessarily captured by the conventional quadratic function. The cubic and fourth-degree polynomials, respectively, fit the OECD and non-OECD country groups best. Our finding is robust regardless of whether the specification uses control variables. Development plays a dominant role in mitigating inequality.