Abstract: Traditional classification algorithms perform poorly on imbalanced data sets and small sample sizes. To address this problem, a novel method is proposed that changes the class distribution by adding virtual samples generated by the windowed regression over-sampling (WRO) method. WRO reflects not only the additive effects but also the multiplicative effects between samples. A comparative study of the proposed method and other over-sampling methods, such as the synthetic minority over-sampling technique (SMOTE) and borderline over-sampling (BOS), is provided on UCI data sets and a Fourier transform infrared spectroscopy (FTIR) data set. Experimental results show that the WRO method can achieve better performance than the other methods.
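The abstract does not specify how WRO builds its virtual samples, so the following is only a minimal sketch of the SMOTE-style baseline it is compared against: each synthetic point is a random interpolation between a minority sample and one of its nearest minority neighbours. The function name `smote_like`, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Illustrative only; this is not the WRO method from the abstract."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class.
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote_like(minority, n_new=4)
```

Because every synthetic point is a convex combination of two existing samples, interpolation of this kind captures only additive structure, which is the limitation the abstract says WRO aims to go beyond.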
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51175028), the Great Scholars Training Project (Grant No. CIT&TCD20150312), and the Beijing Recognized Talent Project (Grant No. 2014018).
Abstract: Reliability assessment of the braking system in a high-speed train under small sample size and zero-failure data is very important for safe operation. Traditional reliability assessment methods perform well only with large sample sizes and complete failure data, and they produce large deviations under small sample sizes and zero-failure data. To address this problem, a new Bayesian method is proposed. Based on the characteristics of the solenoid valve in the braking system of a high-speed train, the modified Weibull distribution is selected to describe the failure rate over the entire lifetime. Based on the assumption of a binomial distribution for the failure probability at the censored time, a concave method is employed to obtain the relationships between the cumulative failure probabilities. A numerical simulation compares the results of the proposed method with those obtained from maximum likelihood estimation and illustrates that the proposed Bayesian model gives a more accurate expectation value when the sample size is less than 12. Finally, the robustness of the model is demonstrated by obtaining the reliability indicators for a numerical case involving the solenoid valve of the braking system, which shows that the change in the reliability and failure rate among the different hyperparameters is small. The method avoids being misled by subjective information and improves the accuracy of reliability assessment under conditions of small sample size and zero-failure data.
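To illustrate why a Bayesian treatment helps with zero-failure data, here is a minimal sketch that assumes a conjugate Beta(a, b) prior on the failure probability at a single censoring time. This is a textbook simplification, not the paper's modified Weibull model or its concave method; the function name and the default hyperparameters are assumptions.

```python
def zero_failure_posterior(n_units, a=1.0, b=1.0):
    """Beta(a, b) prior plus a binomial likelihood with 0 failures among
    n_units gives a Beta(a, b + n_units) posterior for the failure
    probability. Returns the posterior parameters and the posterior mean."""
    post_a, post_b = a, b + n_units
    mean = post_a / (post_a + post_b)
    return post_a, post_b, mean

# Posterior mean failure probability for a few small sample sizes.
for n in (3, 6, 12):
    _, _, mean = zero_failure_posterior(n)
    print(n, round(mean, 4))
```

With zero observed failures the maximum likelihood estimate of the failure probability is exactly 0 for every sample size, whereas the posterior mean stays positive and shrinks as more units survive.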
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12101335 and 12271271), the Natural Science Foundation of Tianjin (Grant No. 21JCQNJC00020), and the Fundamental Research Funds for the Central Universities, Nankai University (Grant Nos. 63211088 and 63221050); by the National Natural Science Foundation of China (Grant No. 12101332); and by Shenzhen Wukong Investment Company, the Fundamental Research Funds for the Central Universities (Grant No. ZB22000105), the China National Key R&D Program (Grant Nos. 2019YFC1908502, 2022YFA1003703, 2022YFA1003802, 2022YFA1003803), and the National Natural Science Foundation of China (Grant Nos. 12271271, 11925106, 12231011, 11931001 and 11971247).
Abstract: This paper establishes the asymptotic independence between the quadratic form $z^{T}Az$ and the maximum $\max_{1\le i\le p}|z_i|$ of a sequence of independent sub-Gaussian random variables $z=(z_1,\ldots,z_p)^{T}$. Based on this theoretical result, we find the asymptotic joint distribution of the quadratic form and the maximum, which can be applied to high-dimensional testing problems. By combining the sum-type test and the max-type test, we propose Fisher's combination tests for the one-sample and two-sample mean tests. Under this novel general framework, several strong assumptions in the existing literature are relaxed. Monte Carlo simulations show that the proposed tests are strongly robust to both sparse and dense data.
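The combination step can be sketched as follows, assuming the sum-type and max-type tests each yield a p-value and that the two are independent under the null, as the asymptotic result suggests. This is the standard Fisher's method for two p-values, not necessarily the paper's exact procedure; the function name and the example values are hypothetical.

```python
import math

def fisher_combine(p_sum, p_max):
    """Fisher's method for two independent p-values in (0, 1].
    Under the null, t = -2 * (ln p_sum + ln p_max) follows a chi-square
    distribution with 4 degrees of freedom, whose survival function has
    the closed form exp(-t / 2) * (1 + t / 2)."""
    t = -2.0 * (math.log(p_sum) + math.log(p_max))
    return math.exp(-t / 2.0) * (1.0 + t / 2.0)

# Hypothetical case: a sparse signal makes the max-type test significant
# while the sum-type test is not.
combined = fisher_combine(p_sum=0.5, p_max=0.04)
```

The combined p-value is small whenever either component is small, which is why a combined test of this kind can keep power under both sparse and dense alternatives.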
Funding: Supported by the National University of Singapore Academic Research Grant (Grant No. R-155-000-085-112).
Abstract: For several decades, much attention has been paid to the two-sample Behrens-Fisher (BF) problem, which tests the equality of the means or mean vectors of two normal populations with unequal variance/covariance structures. Little work, however, has been done on the k-sample BF problem for high-dimensional data, which tests the equality of the mean vectors of several high-dimensional normal populations with unequal covariance structures. In this paper we study this challenging problem by extending the famous Scheffe's transformation method, which reduces the k-sample BF problem to a one-sample problem. The induced one-sample problem can be easily tested by the classical Hotelling's $T^2$ test when the size of the resulting sample is very large relative to its dimensionality. For high-dimensional data, however, the dimensionality of the resulting sample is often very large, and even much larger than its sample size, which makes the classical Hotelling's $T^2$ test not powerful or not even well defined. To overcome this difficulty, we propose and study an $L^2$-norm based test. The asymptotic powers of the proposed $L^2$-norm based test and Hotelling's $T^2$ test are derived and theoretically compared. Methods for implementing the $L^2$-norm based test are described. Simulation studies are conducted to compare the $L^2$-norm based test and Hotelling's $T^2$ test when the latter is well defined, and to compare the proposed implementation methods for the $L^2$-norm based test otherwise. The methodologies are motivated and illustrated by a real data example.
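The reason Hotelling's test can fail to be defined is visible from the generic one-sample form below. The notation (sample $x_1,\ldots,x_n \in \mathbb{R}^p$, hypothesized mean $\mu_0$) is assumed here for illustration, and the paper's exact statistic and normalization may differ.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\bar{x})(x_i-\bar{x})^{T}, \\
T^2 &= n\,(\bar{x}-\mu_0)^{T} S^{-1} (\bar{x}-\mu_0), \\
T_{L^2} &= n\,\lVert \bar{x}-\mu_0 \rVert^{2}
         = n\,(\bar{x}-\mu_0)^{T}(\bar{x}-\mu_0).
\end{align*}
```

The $p \times p$ matrix $S$ has rank at most $n-1$, so $S^{-1}$ does not exist when $p \ge n$ and $T^2$ is undefined. An $L^2$-norm statistic of this kind needs no matrix inverse, so it remains computable for any $p$.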