In this article, the zero-inflated non-central negative binomial (ZINNB) distribution is introduced and some of its basic properties are obtained. In addition, we use the maximum likelihood method to estimate the parameters of the ZINNB distribution, and illustrate its application by fitting it to real data sets.
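The zero-inflation construction mentioned in this abstract can be illustrated with a minimal sketch. It assumes a generic mixture form (extra probability mass `pi` at zero, the remainder spread over a base count distribution) and swaps in the standard central negative binomial as the base, since the abstract does not give the non-central pmf. The function names `nb_pmf` and `zero_inflated_pmf` are hypothetical.

```python
from math import comb


def nb_pmf(k, r, p):
    """Standard (central) negative binomial pmf: probability of k failures
    before the r-th success, with integer r and success probability p.
    Used here only as a stand-in for the non-central version."""
    return comb(k + r - 1, k) * (p ** r) * ((1 - p) ** k)


def zero_inflated_pmf(k, pi, base_pmf):
    """P(Y = k) for a zero-inflated count variable: with probability pi the
    outcome is a structural zero, otherwise it is drawn from base_pmf."""
    extra_zero_mass = pi if k == 0 else 0.0
    return extra_zero_mass + (1 - pi) * base_pmf(k)


# Example: 30% structural zeros on top of a negative binomial with r=2, p=0.5.
base = lambda k: nb_pmf(k, 2, 0.5)
probabilities = [zero_inflated_pmf(k, 0.3, base) for k in range(200)]
```

A log-likelihood for maximum likelihood fitting would sum the logarithm of `zero_inflated_pmf` over the observed counts; the optimizer and the actual ZINNB base pmf are left out because the abstract does not specify them.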
We describe two new derivations of the chi-square distribution. The first derivation uses induction and requires only a single integral to be calculated. The second derivation uses the Laplace transform and requires minimal assumptions. The new derivations are compared with the established ones, such as those by convolution, moment generating function, and Bayesian inference. Chi-square testing has seen many applications in physics and other fields. We describe a unique version of the chi-square test in which both the variance and the location are tested, and apply it to environmental data. The chi-square test is used to judge whether a laboratory method is capable of detecting gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect public health. A case in which the chi-square test fails, and its amelioration, are described. The chi-square test is compared with and supplemented by the t-test.
Zero-inflated distributions are common in statistical problems where there is interest in testing the homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparing the groups in terms of their respective percentiles may be appropriate, as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes, but it may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with a comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size necessary to achieve suitable test performance in specific examples.
Traditional distribution network planning relies on the professional knowledge of planners, especially when analyzing the correlations between the problems existing in the network and the crucial influencing factors. The inherent laws reflected by the historical data of the distribution network are ignored, which affects the objectivity of the planning scheme. In this study, to improve the efficiency and accuracy of distribution network planning, the characteristics of distribution network data were extracted using a data-mining technique, and correlation knowledge of existing problems in the network was obtained. A data-mining model based on correlation rules was established. The inputs of the model were the electrical characteristic indices screened using the gray correlation method. The Apriori algorithm was used to extract correlation knowledge from the operational data of the distribution network and obtain strong correlation rules. The degree of promotion (lift) and chi-square tests were used to verify the rationality of the strong correlation rules output by the model. The correlation between heavy-load or overload problems of distribution network feeders in different regions and the related characteristic indices was determined, and the confidence of the correlation rules was obtained. These results can provide an effective basis for the formulation of a distribution network planning scheme.
This research paper deals with an extension of the non-central Wishart distribution introduced in 1944 by Anderson and Girshick, namely the non-central Riesz distribution when the scale parameter is derived from a discrete vector. It is related to the matrix of normal samples with monotone missing data. We characterize this distribution by means of its Laplace transform and give an algorithm for generating it. We then investigate the estimation of the parameters of the proposed model based on the method of moments. The performance of the proposed estimators is evaluated by a numerical study.
Maximum likelihood (ML) estimation for the generalized asymmetric Laplace (GAL) distribution, also known as the variance gamma distribution, is investigated using simplex direct search algorithms. In this paper, we use numerical direct search techniques to maximize the log-likelihood and obtain ML estimators instead of using the traditional EM algorithm. The density function of the GAL is continuous but not differentiable with respect to the parameters, and the appearance of the Bessel function in the density makes it difficult to obtain the asymptotic covariance matrix for the entire GAL family. Using M-estimation theory, the properties of the ML estimators are investigated. The ML estimators are shown to be consistent for the GAL family, but their asymptotic normality can only be guaranteed for the asymmetric Laplace (AL) family. The asymptotic covariance matrix is obtained for the AL family, which completes the results obtained previously in the literature. For the general GAL model, alternative methods of inference based on quadratic distances (QD) are proposed. The QD methods appear to be more efficient overall than likelihood methods in finite samples, for sample sizes n ≤ 5000 and the range of parameters often encountered for financial data. The proposed methods only require that the moment generating function of the parametric model exists and has a closed-form expression, and they can be used for other models.
To explore the seed drop characteristics of aerial seeding equipment, taking aerial seeding of Pinus tabulaeformis as an example, Gaussian curve fitting and a chi-square goodness-of-fit test were carried out on the fallen seed distribution data, and seed distribution models of the domestic FB-85 and imported PZLM-18 equipment were established. The seeding performance indexes of the two kinds of equipment were calculated and compared using the models, the existing problems of the domestic equipment and their causes were analyzed, and finally some suggestions for equipment optimization were put forward. The results indicated that the seed drop of both kinds of equipment was dense in the middle and sparse on both sides, and followed the Gaussian distribution as a whole. Compared with PZLM-18, FB-85 had better seeding performance, but it also had the problem of uneven seed distribution. In addition to the influence of the aircraft flow field, the fishtail structure design of the diffuser is another important reason for the uneven seed distribution of the domestic equipment. Without changing the fishtail structure design, it is suggested that, following the principle of cross-superposition of two seeding belts, a single large-size diffuser be replaced with two small-size diffusers, which can reduce the number of seeds in the middle and increase the number of seeds on both sides, so as to improve the uniformity of seed distribution.
The Internet service provider (ISP) is the heart of any country's Internet infrastructure and plays an important role in connecting to the World Wide Web. An Internet exchange point (IXP) allows the interconnection of two or more separate network infrastructures. All Internet traffic entering a country should pass through its IXP. Thus, it is an ideal location for performing malicious traffic analysis. Distributed denial of service (DDoS) attacks are becoming a more serious daily threat. Malicious actors in DDoS attacks control numerous infected machines known as botnets. Botnets are used to send numerous fake requests to overwhelm the resources of victims and make them unavailable for some periods. To date, such attacks present a major, devastating security threat on the Internet. This paper proposes an effective and efficient machine learning (ML)-based DDoS detection approach for the early warning and protection of the Saudi Arabia Internet exchange point (SAIXP) platform. The effectiveness and efficiency of the proposed approach are verified by selecting an accurate ML method with a small number of input features. A chi-square method is used for feature selection because it is easier to compute than other methods and does not require any assumption about the distribution of feature values. Several ML methods are assessed using holdout and 10-fold tests on a large public dataset. The experiments showed that the decision tree (DT) classifier achieved high accuracy (99.98%) with a small number of features (10 features). The experimental results confirm the applicability of using DT and chi-square for DDoS detection and early warning in SAIXP.
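Chi-square feature selection, as named in this abstract, scores each feature by how strongly its values depend on the class label. The sketch below is not the paper's pipeline; it only computes the Pearson chi-square statistic from a contingency table of counts (rows are feature values, columns are classes) and ranks features by it. The function names and the toy tables are hypothetical, and every row and column total is assumed to be positive.

```python
def chi2_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts.
    Assumes every row total and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(tables):
    """Return feature names sorted from highest to lowest chi-square score."""
    return sorted(tables, key=lambda name: chi2_statistic(tables[name]), reverse=True)


# Toy example with two hypothetical binary features against a binary label.
toy_tables = {
    "feature_a": [[20, 0], [0, 20]],    # perfectly associated with the class
    "feature_b": [[10, 10], [10, 10]],  # independent of the class
}
ranking = rank_features(toy_tables)
```

Keeping the top-scoring features and passing them to a classifier would complete the selection step; how the paper discretizes continuous traffic features is not stated in the abstract.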
Let $X_1, \ldots, X_n$ be independent and identically distributed random variables and $W_n = W_n(X_1, \ldots, X_n)$ be an estimator of a parameter $\theta$. Denote $T_n = (W_n - \theta_0)/s_n$, where $s_n^2$ is a variance estimator of $W_n$. In this paper, a general result on the limiting distributions of the non-central studentized statistic $T_n$ is given. In particular, when $s_n^2$ is the jackknife estimate of variance, it is shown that the limit can be normal, a weighted $\chi^2$ distribution, a stable distribution, or a mixture of normal and stable distributions. Applications to the power of the studentized U- and L-tests are also discussed.
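The studentized statistic $T_n = (W_n - \theta_0)/s_n$ with a jackknife variance estimate can be sketched numerically. This is a minimal illustration assuming the usual delete-one jackknife formula $s_n^2 = \frac{n-1}{n}\sum_i (W_{(-i)} - \bar{W}_{(\cdot)})^2$; the abstract does not spell out the formula, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def jackknife_variance(xs, estimator):
    """Delete-one jackknife estimate of the variance of estimator(xs)."""
    n = len(xs)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    return (n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out)


def studentized_statistic(xs, estimator, theta0):
    """T_n = (W_n - theta0) / s_n, with s_n^2 the jackknife variance estimate."""
    s_n = math.sqrt(jackknife_variance(xs, estimator))
    return (estimator(xs) - theta0) / s_n


# Example with the sample mean as W_n and a hypothesized value theta0 = 3.0.
data = [1.0, 2.0, 4.0, 7.0]
t_n = studentized_statistic(data, mean, 3.0)
```

For the sample mean, the jackknife variance reduces to the sample variance divided by n, so `t_n` coincides with the ordinary one-sample t statistic; other estimators (for example U- or L-statistics) can be passed in as `estimator`.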
A general theorem on the limiting distribution of the generalized t-distribution is obtained, and many applications of this theorem to some subclasses of elliptically contoured distributions, including multivariate normal and multivariate t distributions, are discussed. Further, their limiting distributions in terms of the density function are derived.
In this paper, several properties of the one-way classification model with skew-normal random effects are obtained, such as the moment generating function, the density function, and the noncentral skew chi-square distribution. Based on the EM algorithm, we discuss the maximum likelihood (ML) estimation of the unknown parameters. For the problem of testing the fixed effect, a parametric bootstrap (PB) approach is developed. Finally, simulation results on the Type I error rates and powers of the PB approach are obtained, which show that the PB approach performs satisfactorily on both, even for small samples. For illustration, our main results are applied to a real data problem.
Relativistic heavy-ion collisions can produce extremely strong magnetic fields in the collision regions. The spatial variation of the magnetic fields is analyzed in detail for non-central Pb-Pb collisions at the LHC at $\sqrt{s_{NN}}$ = 900, 2760 and 7000 GeV and Au-Au collisions at RHIC at $\sqrt{s_{NN}}$ = 62.4, 130 and 200 GeV. The dependence of the magnetic field on proper time, collision energy and impact parameter is investigated. It is shown that an enormous magnetic field with a highly inhomogeneous spatial distribution can indeed be created in off-centre relativistic heavy-ion collisions in the RHIC and LHC energy regions. The enormous magnetic field is produced just after the collision, and at small proper time the magnitude of the magnetic field in the LHC energy region is larger than that in the RHIC energy region. It is found that the magnetic field in the LHC energy region decreases more quickly with increasing proper time than that in the RHIC energy region.
In this paper, we consider simulated minimum Hellinger distance (SMHD) inference for count data. We consider grouped and ungrouped data and emphasize SMHD methods. The approaches extend the methods based on the deterministic version of the Hellinger distance for count data. The methods are general: they only require that random samples can be drawn from the discrete parametric family, and they can be used as alternatives to estimation using the probability generating function (pgf) or methods based on matching moments. Although this paper focuses on count data, goodness-of-fit tests based on the simulated Hellinger distance can also be applied to continuous distributions when continuous observations are grouped into intervals, as in the case of the traditional Pearson statistic. Asymptotic properties of the SMHD methods are studied, and the methods appear to preserve the good efficiency and robustness of the deterministic version.
In goodness-of-fit testing, Pearson's chi-squared test is one of the most widely used tools of formal statistical analysis. However, Pearson's chi-squared test depends on the partition of the sample space, and different constructions of the partition may lead to different conclusions. Based on an equiprobable partition of the sample space, a modified chi-squared test is proposed, together with a method for constructing it. As an application, the proposed test is used to test whether vectorial data come from a uniform distribution defined on the hypersphere. Simulation studies show that the modified chi-squared test is robust against different alternatives.
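The idea of an equiprobable partition can be shown in a simplified one-dimensional setting, which is an assumption made here for illustration; the abstract itself concerns data on the hypersphere and does not give its construction. Under the null hypothesis with CDF `cdf`, the transformed values `cdf(x)` are uniform on [0, 1], so cutting [0, 1] into `m` equal intervals gives cells with equal expected counts. The function name `equiprobable_chi2` is hypothetical.

```python
def equiprobable_chi2(sample, cdf, m):
    """Pearson chi-squared statistic using m cells that are equiprobable
    under the null distribution with the given CDF."""
    n = len(sample)
    counts = [0] * m
    for x in sample:
        u = cdf(x)                      # uniform on [0, 1] under the null
        cell = min(int(u * m), m - 1)   # u == 1.0 goes into the last cell
        counts[cell] += 1
    expected = n / m                    # identical for every cell
    return sum((c - expected) ** 2 / expected for c in counts)


# Example: testing uniformity on [0, 1], where the CDF is the identity.
identity_cdf = lambda x: x
evenly_spread = [0.05 + 0.1 * i for i in range(10)]
clustered = [0.1] * 10
stat_even = equiprobable_chi2(evenly_spread, identity_cdf, 5)
stat_clustered = equiprobable_chi2(clustered, identity_cdf, 5)
```

Under the null hypothesis the statistic is compared with a chi-squared distribution with m - 1 degrees of freedom when no parameters are estimated; the p-value computation is omitted to keep the sketch dependency-free.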
The generative adversarial network (GAN) has shown great promise in image generation and semi-supervised learning and has evolved into Triple-GAN. However, two problems remain to be solved in Triple-GAN: because its loss is constructed from the KL divergence between distributions, gradients vanish easily and training becomes unstable; and because Triple-GAN relies on manually labeled samples, the labeling workload is too large and the labels are uneven. This article builds an improved Triple-GAN model (Improved Triple-GAN), which uses random forests to classify real samples and automate the labeling of leaf nodes, and constructs its loss function following the idea of Least Squares Generative Adversarial Networks (LSGAN) to avoid vanishing gradients. Experiments were performed on the Improved Triple-GAN model and the Triple-GAN model using the MNIST, CIFAR-10 and CIFAR-100 datasets. The experiments show that the error rate of generated samples is greatly reduced, while the classification performance on the datasets and the sharpness of the samples are greatly improved. The stability of model training and the automation of labeling are also greatly improved.
Sampling is a fundamental method for generating data subsets. As many data analysis methods are developed based on probability distributions, maintaining distributions when sampling can help to ensure good data analysis performance. However, sampling a minimum subset while maintaining probability distributions is still a problem. In this paper, we decompose a joint probability distribution into a product of conditional probabilities based on Bayesian networks and use the chi-square test to formulate a sampling problem that requires the sampled subset to pass the distribution test, so that the distribution is preserved. Furthermore, a heuristic sampling algorithm is proposed to generate the required subset by designing two scoring functions: one based on the chi-square test and the other based on likelihood functions. Experiments on four types of datasets with a size of 60000 show that when the significance level, α, is set to 0.05, the algorithm can exclude 99.9%, 99.0%, 93.1% and 96.7% of the samples based on their Bayesian networks (ASIA, ALARM, HEPAR2, and ANDES, respectively). When subsets of the same size are sampled, the subset generated by our algorithm passes all the distribution tests and the average distribution difference is approximately 0.03; by contrast, the subsets generated by random sampling pass only 83.8% of the tests, and the average distribution difference is approximately 0.24.
Heavy rainfall is one of the most frequent and widespread severe weather hazards affecting Sylhet city. In Sylhet, nine natural channels, locally called "chara", are mainly responsible for draining heavy rainfall to the Surma River. This natural drainage is currently inadequate because of unplanned land development and human interference, so water logging is common in Sylhet city, and many parts of the city experience severe inundation after heavy rainfall. Time series analysis of rainfall data (1974-2015) is important for understanding and predicting rainfall variation. Mann-Kendall analysis showed no trend in the monthly and yearly rainfall data. The rainfall intensity of Sylhet is higher than that of other districts of Bangladesh. In this study, an IDF (Intensity Duration Frequency) curve, which is commonly used in engineering planning and design, has been developed. Using the IDF curve, ArcGIS software, a DEM map, and normal discharge, the maximum discharge at different points of Mongoli chara and Bolramer chara has been calculated for a 25-year return period. Discharge through Mongoli chara and Bolramer chara has been represented in ArcMap using ordinary kriging. To keep the catchments of these chara free from inundation and water logging, the calculated discharge needs to be managed.
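The Mann-Kendall trend analysis mentioned in this abstract can be sketched as follows. This is the textbook form of the test without a correction for tied values, which is an assumption; the abstract does not say which variant the study used. The function names and the toy series are hypothetical and are not the Sylhet rainfall data.

```python
import math


def mann_kendall_s(series):
    """Mann-Kendall S: number of increasing pairs minus decreasing pairs."""
    s = 0
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    return s


def mann_kendall_z(series):
    """Standardized Mann-Kendall statistic, assuming no tied values."""
    n = len(series)
    s = mann_kendall_s(series)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


# Toy series: a strictly increasing sequence gives the maximum S for its length.
increasing = [1.0, 2.0, 3.0, 4.0, 5.0]
z_increasing = mann_kendall_z(increasing)
```

An absolute value of Z above about 1.96 would indicate a trend at the 5% significance level under the normal approximation; smaller values are consistent with the "no trend" finding reported in the abstract.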
Using field measurement data, probability models of horizontal wind load were obtained in two ways, from wind velocity statistics and from the power spectral density function of fluctuating wind velocity, through stochastic sampling and the spectrum analysis method. By comparing the two models, probability models of horizontal wind load based on probability models of fluctuating wind velocity were obtained by revising the mean and variance of the fluctuating wind velocity. Results show that the variance takes a lower value when the power spectral density function of fluctuating wind velocity is used to obtain the probability model of horizontal wind load. The quadratic term of fluctuating wind velocity contributes little to the total wind load and makes almost no contribution to the model of horizontal wind load. For practical engineering, it is convenient to obtain the models of horizontal wind load from probability models of fluctuating wind velocity.
In this paper, we generalize the proof that the Cochran statistic asymptotically follows a chi-square distribution to the case of a two-way ANOVA structure. While the construction of homogeneity test statistics usually resorts to determining the covariance matrix and its Moore-Penrose inverse, our approach avoids this step. We also show that the Cochran statistic in two-way ANOVA is equivalent to the conventional homogeneity test statistic. In particular, we show that it satisfies the invariance property. Finally, we conduct an empirical verification using a meta-analysis, which confirms our theoretical results.
This paper focuses on reducing the complexity of the K-best sphere decoding (SD) algorithm for the detection of uncoded multiple input multiple output (MIMO) systems. The proposed algorithm uses a threshold-pruning method to cut nodes whose partial Euclidean distances (PEDs) are larger than the threshold. Both the known noise value and the unknown noise value are considered in generating the threshold, which is the sum of the two values. The known noise value is the smallest PED of the signals in the detected layers. The unknown noise value is generated from the noise power, the quality of service (QoS) and the signal-to-noise ratio (SNR) bound. Simulation results show that, by considering both noise values, the proposed algorithm reduces complexity efficiently while the performance drops little.
Funding (distribution network planning study): Supported by the Science and Technology Project of China Southern Power Grid (GZHKJXM20210043-080041KK52210002).
Funding: Supported in part by Hong Kong UST (Grant No. DAG05/06.SC) and Hong Kong RGC CERG (Grant No. 602206); supported by the National Natural Science Foundation (Grant No. 10801118) and the PhD Programs Foundation of the Ministry of Education of China (Grant No. 200803351094).
Abstract: Let X_1, ..., X_n be independent and identically distributed random variables and W_n = W_n(X_1, ..., X_n) be an estimator of the parameter θ. Denote T_n = (W_n - θ_0)/s_n, where s_n^2 is a variance estimator of W_n. In this paper a general result on the limiting distributions of the non-central studentized statistic T_n is given. In particular, when s_n^2 is the jackknife estimate of variance, it is shown that the limit could be normal, a weighted χ^2 distribution, a stable distribution, or a mixture of normal and stable distributions. Applications to the power of the studentized U- and L-tests are also discussed.
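The studentized statistic with a jackknife variance estimate described in this abstract can be illustrated with a minimal sketch. The function name `jackknife_studentized`, the choice of the sample mean as the estimator W_n, and the sample values are assumptions made for illustration only; they are not taken from the paper.

```python
import math

def jackknife_studentized(xs, estimator, theta0):
    """Return T_n = (W_n - theta0) / s_n, with s_n^2 the jackknife variance estimate.

    `estimator` is any function mapping a list of observations to a number
    (a hypothetical stand-in for W_n).
    """
    n = len(xs)
    w_n = estimator(xs)
    # Leave-one-out replicates of the estimator
    loo = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    # Standard jackknife variance estimate of W_n
    s2 = (n - 1) / n * sum((v - loo_mean) ** 2 for v in loo)
    return (w_n - theta0) / math.sqrt(s2)

def sample_mean(xs):
    return sum(xs) / len(xs)

# Illustrative data; theta0 differs from the sample mean, so T_n is "non-central"
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat = jackknife_studentized(sample, sample_mean, theta0=2.0)
print(t_stat)  # about 1.4142, since the jackknife variance of the mean is 2.5 / 5
```

For the sample mean, the jackknife variance reduces to the usual sample variance divided by n, which makes the result easy to check by hand.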
Funding: This project is supported by the National Natural Science Foundation of China and by Grant DA01070 from the U.S. Public Health Service.
Abstract: A general theorem on the limiting distribution of the generalized t-distribution is obtained, and many applications of this theorem to some subclasses of elliptically contoured distributions, including multivariate normal and multivariate t distributions, are discussed. Further, their limiting distributions by density function are derived.
Funding: Supported by the Zhejiang Provincial Philosophy and Social Science Planning Zhijiang Youth Project of China (Grant No. 16ZJQN017YB), the Ministry of Education of China, Humanities and Social Science Projects (Grant No. 19YJA910006), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY20A010019), the Fundamental Research Funds for the Provincial Universities of Zhejiang (Grant No. GK199900299012-204), and the Zhejiang Provincial Statistical Science Research Base Project of China (Grant No. 19TJJD08).
Abstract: In this paper, several properties of the one-way classification model with skew-normal random effects are obtained, such as the moment generating function, the density function and the noncentral skew chi-square distribution. Based on the EM algorithm, we discuss the maximum likelihood (ML) estimation of the unknown parameters. For the problem of testing the fixed effect, a parametric bootstrap (PB) approach is developed. Finally, some simulation results on the Type I error rates and powers of the PB approach are obtained, which show that the PB approach performs satisfactorily in terms of Type I error rates and powers, even for small samples. For illustration, our main results are applied to a real data problem.
Funding: Supported by the National Natural Science Foundation of China (11375069, 11435054, 11075061, 11221504) and the Key Laboratory Foundation of Quark and Lepton Physics (Hua-Zhong Normal University) (QLPL2014P01).
Abstract: Relativistic heavy-ion collisions can produce extremely strong magnetic fields in the collision regions. The spatial variation features of the magnetic fields are analyzed in detail for non-central Pb-Pb collisions at LHC at √s_NN = 900, 2760 and 7000 GeV and Au-Au collisions at RHIC at √s_NN = 62.4, 130 and 200 GeV. The dependencies of the magnetic field on proper time, collision energy and impact parameter are investigated in this paper. It is shown that an enormous and highly inhomogeneous magnetic field can indeed be created in off-centre relativistic heavy-ion collisions in the RHIC and LHC energy regions. The enormous magnetic field is produced just after the collision, and the magnitude of the magnetic field in the LHC energy region is larger than that in the RHIC energy region at small proper time. It is found that the magnetic field in the LHC energy region decreases more quickly with increasing proper time than that in the RHIC energy region.
Abstract: In this paper, we consider simulated minimum Hellinger distance (SMHD) inferences for count data. We consider grouped and ungrouped data and emphasize SMHD methods. The approaches extend the methods based on the deterministic version of Hellinger distance for count data. The methods are general: they require only that random samples can be drawn from the discrete parametric family, and they can be used as alternatives to estimation using the probability generating function (pgf) or methods based on matching moments. Whereas this paper focuses on count data, goodness-of-fit tests based on simulated Hellinger distance can also be applied to continuous distributions when continuous observations are grouped into intervals, as in the case of the traditional Pearson's statistics. Asymptotic properties of the SMHD methods are studied, and the methods appear to preserve the good efficiency and robustness of the deterministic version.
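A rough sketch of the simulated minimum Hellinger distance idea for count data follows. It assumes a Poisson family, a grid search over the parameter, and a squared Hellinger distance without the conventional factor of 1/2; the names `hellinger_sq`, `draw_poisson`, and `smhd_estimate` are hypothetical, and the paper's actual estimator and optimizer may differ.

```python
import math
import random
from collections import Counter

def empirical_pmf(counts):
    """Relative frequency of each observed count value."""
    n = len(counts)
    return {k: c / n for k, c in Counter(counts).items()}

def hellinger_sq(p, q):
    """Squared Hellinger distance, sum of (sqrt(p_k) - sqrt(q_k))^2 over the joint support."""
    support = set(p) | set(q)
    return sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
               for k in support)

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def smhd_estimate(data, grid, n_sim=2000, seed=0):
    """Pick the grid value whose simulated pmf is closest to the data pmf."""
    p_hat = empirical_pmf(data)
    best = None
    for lam in grid:
        rng = random.Random(seed)  # fixed seed keeps the objective deterministic in lam
        sim = [draw_poisson(lam, rng) for _ in range(n_sim)]
        d = hellinger_sq(p_hat, empirical_pmf(sim))
        if best is None or d < best[1]:
            best = (lam, d)
    return best

# Illustrative count data and candidate grid (both made up for this sketch)
data = [0, 1, 1, 2, 2, 2, 3, 4]
grid = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
lam_hat, dist = smhd_estimate(data, grid)
print(lam_hat, dist)
```

Only the ability to simulate from the model is used here; the model pmf is never evaluated, which is the point the abstract makes about generality.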
Funding: Foundation item: the Natural Science Foundation of Beijing (No. 1062001); Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (No. 05006011200702). Acknowledgements: The authors cordially thank the Associate Editor and Reviewers for their constructive comments, which led to improvement of the manuscript. They are also very grateful to Prof. Adelaide Figueiredo for his help.
Abstract: In goodness-of-fit tests, Pearson's chi-squared test is one of the most widely used tools of formal statistical analysis. However, Pearson's chi-squared test depends on the partition of the sample space. Different constructions of the partition of the sample space may lead to different conclusions. Based on an equiprobable partition of the sample space, a modified chi-squared test is proposed, together with a method for constructing it. As an application, the proposed test is used to test whether vectorial data come from a uniform distribution defined on the hypersphere. Simulation studies show that the modified chi-squared test is robust against different alternatives.
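As a minimal sketch of a Pearson statistic built on an equiprobable partition, the example below uses a one-dimensional null distribution with a known quantile function. This is a simplification chosen for illustration: the paper works on the hypersphere, and the function name `equiprobable_chi2` and the data are assumptions.

```python
def equiprobable_chi2(data, inv_cdf, k):
    """Pearson statistic using k cells that each have probability 1/k under H0.

    `inv_cdf` is the quantile function of the hypothesized distribution.
    """
    n = len(data)
    # Cell boundaries are the H0 quantiles at 1/k, 2/k, ..., (k-1)/k
    edges = [inv_cdf(j / k) for j in range(1, k)]
    counts = [0] * k
    for x in data:
        cell = sum(1 for e in edges if x >= e)  # index of the cell containing x
        counts[cell] += 1
    expected = n / k  # identical for every cell by construction
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts

# H0: Uniform(0, 1), whose quantile function is the identity
data = [0.05, 0.1, 0.15, 0.2, 0.3, 0.6, 0.8, 0.95]
stat, counts = equiprobable_chi2(data, lambda u: u, k=4)
print(stat, counts)  # 3.0 [4, 1, 1, 2]
```

With k = 4 cells and no estimated parameters, the statistic is compared with a chi-square distribution on 3 degrees of freedom; the 5% critical value is about 7.815, so this toy sample would not reject uniformity.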
Funding: This work was supported by the Hengyang Normal University Hunan Province Key Laboratory for Digital Technology and Application of Settlement Cultural Heritage Open Fund Project "Based on CNN-based 3D Image Reconstruction Research" (JL16K05) and the Ministry of Science and Technology of the People's Republic of China (2018AAA0102301).
Abstract: The generative adversarial network (GAN) has shown great development prospects in image generation and semi-supervised learning and has evolved into Triple-GAN. However, there are still two problems to be solved in Triple-GAN: because its distribution structure is based on the KL divergence, gradients vanish easily and training becomes unstable; and because Triple-GAN labels the samples manually, the manual labeling workload is too large and the labels are uneven. This article proposes an improved Triple-GAN model (Improved Triple-GAN), which uses random forests to classify real samples and automate the labeling of leaf nodes, and uses the idea of Least Squares Generative Adversarial Networks (LSGAN) to construct the loss function and avoid vanishing gradients. Experiments were performed on the Improved Triple-GAN model and the Triple-GAN model using the MNIST, CIFAR-10 and CIFAR-100 datasets, respectively. The experiments show that the error rate of generated samples is greatly reduced. At the same time, the classification performance on the datasets and the sharpness of the samples are greatly improved, as are the stability of model training and the automation of labeling.
Funding: The National Key Research and Development Program of China under Grant No. 2018YFB1003204, the Anhui Provincial Key Technologies Research and Development Program under Grant Nos. 1804b06020378 and 1704el002221, and the National 111 Project of China under Grant No. B14025.
Abstract: Sampling is a fundamental method for generating data subsets. As many data analysis methods are developed based on probability distributions, maintaining distributions when sampling can help to ensure good data analysis performance. However, sampling a minimum subset while maintaining probability distributions is still a problem. In this paper, we decompose a joint probability distribution into a product of conditional probabilities based on Bayesian networks and use the chi-square test to formulate a sampling problem that requires the sampled subset to pass the distribution test, so that the distribution is preserved. Furthermore, a heuristic sampling algorithm is proposed to generate the required subset by designing two scoring functions: one based on the chi-square test and the other based on likelihood functions. Experiments on four types of datasets with a size of 60000 show that when the significance level, α, is set to 0.05, the algorithm can exclude 99.9%, 99.0%, 93.1% and 96.7% of the samples based on their Bayesian networks (ASIA, ALARM, HEPAR2, and ANDES), respectively. When subsets of the same size are sampled, the subset generated by our algorithm passes all the distribution tests and the average distribution difference is approximately 0.03; by contrast, the subsets generated by random sampling pass only 83.8% of the tests, and the average distribution difference is approximately 0.24.
Abstract: Heavy rainfall is one of the most frequent and widespread severe weather hazards affecting Sylhet city. In Sylhet, nine natural channels, locally called "chara", are mainly responsible for draining heavy rainfall to the Surma River. This natural drainage no longer functions adequately because of unplanned land development and human interference, so water logging is common in Sylhet city. Many parts of the city experience severe inundation due to heavy rainfall. Time series analysis of rainfall data (1974-2015) is important for understanding and predicting rainfall variation. Mann-Kendall analysis showed no trend in the monthly and yearly rainfall data. The rainfall intensity of Sylhet is higher than that of other districts of Bangladesh. In this study, an IDF (Intensity Duration Frequency) curve, which is commonly used in engineering planning and design, has been developed. Using the IDF curve, ArcGIS software, a DEM map, and normal discharge, the maximum discharge at different points of Mongoli chara and Bolramer chara has been calculated (for a 25-year return period). Discharge through Mongoli chara and Bolramer chara has been represented in ArcMap using ordinary kriging. To keep the catchments of those chara free from inundation and water logging, this calculated discharge needs to be managed.
Abstract: Based on measured data, probability models of horizontal wind load were obtained from wind velocity statistics and from the power spectral density function of fluctuating wind velocity, through stochastic sampling and the spectrum analysis method. By comparing the two models, probability models of horizontal wind load based on probability models of fluctuating wind velocity were obtained by revising the mean and variance of the fluctuating wind velocity. Results show that the variance takes a lower value when the power spectral density function of fluctuating wind velocity is used to obtain the probability model of horizontal wind load. The quadratic term of fluctuating wind velocity contributes little to the total wind load and almost nothing to the model of horizontal wind load. For practical engineering, it is convenient to obtain the models of horizontal wind load by using probability models of fluctuating wind velocity.
Abstract: In this paper, we generalize the proof that the Cochran statistic asymptotically follows a chi-square distribution to the case of a two-way ANOVA structure. While the construction of homogeneity test statistics usually resorts to determining the covariance matrix and its Moore-Penrose inverse, our approach avoids this step. We also show that the Cochran statistic in two-way ANOVA is equivalent to the conventional homogeneity test statistics. In particular, we show that it satisfies the invariance property. Finally, we conduct an empirical verification from a meta-analysis that confirms our theoretical results.
Funding: Supported by the National Natural Science Foundation of China (61071083).
Abstract: This paper focuses on reducing the complexity of the K-best sphere decoding (SD) algorithm for the detection of uncoded multiple input multiple output (MIMO) systems. The proposed algorithm utilizes the threshold-pruning method to cut nodes with partial Euclidean distances (PEDs) larger than the threshold. Both the known noise value and the unknown noise value are considered to generate the threshold, which is the sum of the two values. The known noise value is the smallest PED of signals in the detected layers. The unknown noise value is generated from the noise power, the quality of service (QoS) and the signal-to-noise ratio (SNR) bound. Simulation results show that, by considering both noise values, the proposed algorithm reduces complexity efficiently with little loss in performance.
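The pruning rule described in this abstract can be sketched for a single tree layer as follows. The function name `prune_layer` and the parameter `noise_margin` are hypothetical: `noise_margin` stands in for the "unknown noise value", which the paper derives from the noise power, QoS and SNR bound in a way the abstract does not spell out, so here it is simply passed in.

```python
def prune_layer(candidates, k, noise_margin):
    """Keep at most k candidates whose PED does not exceed the threshold.

    candidates: list of (ped, path) pairs for the current layer.
    The threshold is the smallest PED in the layer (the "known noise value")
    plus noise_margin (an assumed stand-in for the "unknown noise value").
    """
    ranked = sorted(candidates, key=lambda c: c[0])
    threshold = ranked[0][0] + noise_margin
    # Ordinary K-best selection, followed by threshold pruning
    return [c for c in ranked[:k] if c[0] <= threshold]

# Illustrative layer with four candidate paths and made-up PED values
layer = [(0.9, "a"), (0.2, "b"), (0.5, "c"), (3.0, "d")]
survivors = prune_layer(layer, k=3, noise_margin=0.5)
print(survivors)  # [(0.2, 'b'), (0.5, 'c')]
```

Plain K-best with K = 3 would carry three nodes into the next layer; the threshold removes the third because its PED of 0.9 exceeds 0.2 + 0.5, which is where the complexity saving comes from.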