Abstract: In this review article, we revisit the derivation of the cumulative distribution function (CDF) of the test statistic of the one-sample Kolmogorov-Smirnov test. Even though several such proofs already exist, they often leave out details that are essential for a proper understanding of the individual steps. Our goal is to fill in these gaps and make the presentation accessible to advanced undergraduates. We also propose a simple formula that approximates the exact distribution with sufficient accuracy for any practical sample size.
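As a rough illustration of the quantities this abstract refers to, the sketch below computes the one-sample statistic D_n = sup_x |F_n(x) - F(x)| and the classical limiting Kolmogorov CDF K(z) = 1 - 2 Σ_{k≥1} (-1)^(k-1) exp(-2k²z²). It is not the approximation formula proposed in the article, which the abstract does not state; the function names `ks_statistic` and `kolmogorov_cdf` and the `terms` parameter are assumptions made for this example.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F(x)| (illustrative helper)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def kolmogorov_cdf(z, terms=100):
    """Classical limiting CDF of sqrt(n) * D_n, truncated to `terms` series terms.

    The alternating series converges slowly for very small z, so this
    truncation is only meant for moderate z (roughly z > 0.2).
    """
    if z <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
            for k in range(1, terms + 1))
    # Clamp to [0, 1] to guard against truncation error.
    return min(1.0, max(0.0, 1.0 - 2.0 * s))


if __name__ == "__main__":
    data = [0.1, 0.4, 0.7]
    d = ks_statistic(data, lambda x: x)  # hypothesised Uniform(0, 1) CDF
    p_asymptotic = 1.0 - kolmogorov_cdf(math.sqrt(len(data)) * d)
    print(d, p_asymptotic)
```

The asymptotic p-value 1 - K(√n · D_n) is only a large-sample approximation; for small n the exact distribution discussed in the article is the appropriate reference.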
Abstract: In this article, we study a Kolmogorov-Smirnov type goodness-of-fit test for the inhomogeneous Poisson process with an unknown multidimensional translation parameter. The basic hypothesis and the alternative are both composite and concern the intensity measure of the inhomogeneous Poisson process; the intensity function is assumed to be regular. For this shift-parameter model, we propose a test that is consistent and asymptotically partially distribution-free. We show that, under the null hypothesis, the limit distribution of the statistic does not depend on the unknown parameter.
Abstract: In this article we improve a goodness-of-fit test of the Kolmogorov-Smirnov type for identically distributed, but not stationary, strongly dependent data. The test is based on the asymptotic behavior of the empirical process, which is much more complex than in the classical case. Applications to simulated data and a discussion of the results are provided. To the best of our knowledge, this is the first result providing a general goodness-of-fit test for non-weakly dependent data.
Funding: National Basic Research Program of China, No.2010CB428406; The Key Knowledge Innovation Project of the CAS, No.KZCX2-YW-126; Key Project of National Natural Science Foundation of China, No.40730632
Abstract: The seasonal variability and spatial distribution of precipitation are the main causes of flood and drought events. The spatial distribution and temporal trends of precipitation in river basins have received increasing attention. In China, precipitation is measured both by weather stations (WS) of the China Meteorological Administration and by hydrological rain gauges (RG) of national and local hydrology bureaus. The WS data usually have long records but few stations, while the RG data usually have short records but many stations. The consistency and correlation of these two data sets have not been well understood. In this paper, precipitation data from 30 weather stations for 1958-2007 and 248 rain gauges for 1995-2004 in the Haihe River basin are examined and compared using linear regression, 5-year moving averages, Mann-Kendall trend analysis, the Kolmogorov-Smirnov test, the Z test and the F test. The results show that annual precipitation from both the WS and RG records is normally distributed, with minor differences in mean and variance. It is statistically feasible to extend the RG precipitation records using the WS data. Using the extended precipitation data, the detailed spatial distribution of annual and seasonal precipitation amounts and their temporal trends are calculated and mapped. The resulting distribution maps show that, for the whole basin, precipitation over 1958-2007 decreased in every season except spring. The declining trend is significant in summer and became stronger after the 1980s. Annual and seasonal precipitation amounts and trends differ among regions and seasons. Precipitation decreases from south to north and from the coastal zone to inland areas.
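For readers unfamiliar with the Mann-Kendall trend analysis named in this abstract, a minimal sketch follows. It uses the textbook form of the test (S statistic, variance without tie correction, continuity-corrected Z, two-sided normal p-value); the abstract does not say which variant the authors applied, and the function name `mann_kendall` and the sample series are hypothetical.

```python
import math


def mann_kendall(series):
    """Textbook Mann-Kendall trend test without tie correction (illustrative).

    Returns (S, Z, two-sided p-value under the normal approximation).
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    # Variance of S under the null hypothesis, assuming no tied values.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


if __name__ == "__main__":
    # Hypothetical annual precipitation totals (mm), for illustration only.
    annual = [560, 548, 552, 530, 541, 519, 525, 507, 512, 498]
    print(mann_kendall(annual))
```

A negative Z with a small p-value indicates a decreasing trend; series with many tied values would need the tie-corrected variance, which this sketch omits.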
Abstract: Proposed by the Swedish engineer and mathematician Ernst Hjalmar Waloddi Weibull (1887-1979), the Weibull distribution is a probability distribution widely used to model lifetime data. Because of its flexibility, several researchers have proposed modifications of the Weibull distribution to better fit non-monotonic shapes. This paper studies the performance of two specific modifications of the Weibull distribution: the exponentiated Weibull distribution and the additive Weibull distribution.
Abstract: Corrosion test data were measured using non-destructive electrochemical techniques and analysed to study the inhibition effectiveness of different concentrations of Na2Cr2O7 on the corrosion of concrete steel-rebar in NaCl and in H2SO4 media. For this purpose, the specifications of ASTM G16-95 R04 were combined with the normal and the Gumbel probability density functions as model analytical methods, to address conflicting reports of inhibitor effectiveness that had generated concerns. Results show that reinforced concrete samples admixed with 4 g (0.012 7 mol), 8 g (0.025 4 mol) and 6 g (0.019 1 mol) Na2Cr2O7 exhibited, in that order, high inhibition effectiveness in the NaCl medium, with respective efficiencies, η, of (90.46±1.30)%, (88.41±2.24)% and (84.87±4.74)%. These results show good agreement among replicates and between the statistical methods for the samples. In the H2SO4 medium, the optimal inhibition effectiveness was exhibited by the 8 g (0.025 4 mol) Na2Cr2O7 concentration, with η=(78.44±1.10)%. These findings bear implications for addressing conflicting test data in the study of effective inhibitors for mitigating steel-rebar corrosion in aggressive environments.
Funding: Supported by the National Natural Science Foundation of China (No.69672029 and No.69772021)
Abstract: This paper presents a minimum error thresholding (MET) algorithm under the hypothesis that the gray-level histogram of a SAR image fits a mixture model of shifted Rayleigh distributions. The algorithm is applied to real SAR images and compared with the traditional Otsu algorithm and with other MET algorithms based on various histogram models. The Rayleigh distribution hypothesis is confirmed by Kolmogorov-Smirnov testing, and the comparison results show that the proposed algorithm performs well in thresholding SAR images.
Abstract: Tourism impacts on society are complex and mixed. However, they are vital to diverse societies, clusters, and individuals, depending upon their morals, attitudes, and the resources available for tourism development. Increasing tourism also brings many problems. Hence, tourist experience is fundamental for destination image and development. This research examines tourist perceptions and attitudes toward tourism impacts in Chitkul, Kalpa, and Nako in Kinnaur. Random sampling was used to measure tourist responses on a range of indicators related to tourism development. Likert scale responses were analyzed using factor analysis, ANOVA, the Mann-Whitney U-test, the Kolmogorov test, and descriptive statistics. The results confirmed that tourists do not perceive any type of pollution or societal barriers. They observed that the natural magnetism and the socio-cultural milieu of the destination are what attract tourists. However, tourists are not satisfied with 'networking services', 'organization efforts', 'supplementary conveniences', and 'carriage concerns' at the selected destinations in Kinnaur. Moreover, Chitkul emerged as the top tourist destination in Kinnaur, and it is expected to become a hub of tourist activity shortly, given the congestion and exploitation of nearby tourist destinations at Kulu-Manali-Rohtang in the Beas Valley. Hence, the assessment of tourist perceptions can be used as an indicator of tourism destination competitiveness and can assist in developing appropriate tourism policies and infrastructure upgrades.
Abstract: The World Wide Web is essential to the general public nowadays. From a data analysis viewpoint, it provides rich opportunities to gather observational data on a large scale. This paper focuses on modeling the behavior of visitors to an academic website. Although the conventional probability models, which have been fitted to commercial web sites in other literature, capture the power-law behavior in our data, they fail to capture other important features such as the long tail. We propose a new model based on the identities of the users. Qualitative and quantitative tests comparing the fit of the models to our data show that the new model outperforms the two conventional probability models.
Abstract: We analyze ten of the longest (127 to 230 year-long) time series of European daily temperatures available from five different Köppen-Geiger climate classes. We split these according to the level of solar cycle activity (H for “higher than median” and L for “lower than median”). This reveals coherent patterns in the temperature differences: when TH-TL are stacked according to their calendar date, the daily averages from January 1 to December 31 disclose characteristic features in addition to the dominant annual seasonal wave, namely variations of up to 2°C lasting for about 1.5 to 3 months. The five observatories at intermediate latitudes in a band from Oxford in the West to Prague in the East (same climate class) have very similar signatures. These similarities are most unlikely to be due to pure chance (confirmed by confidence levels in excess of 99% with the Kolmogorov-Smirnov and Kuiper nonparametric tests). The TH-TL patterns carry a regional signature, modulated by a more local response function. On the other hand, northern European observatories (St Petersburg and Arkhangelsk), those south of the Alps (Milan and Bologna), and the easternmost one in Astrakhan, corresponding to different climate classes, have different signatures. Similarly, a preliminary study of long air pressure recordings confirms what emerges from the analysis of temperatures. These new observations lead us to conclude that the climate in different regions responds differently to variations in solar activity. Moreover, the distributions of the lower, middle, and higher quartiles of the temperature and pressure indices in solar cycles with high versus low activity are significantly different, providing further robust statistical confirmation of this conclusion (confidence level higher to much higher than 99% using the Kuiper test).
Abstract: Chronic disease is an important factor that affects the health of elderly people. We analyzed the 2006 and 2010 data from the Chinese Urban and Rural Elderly Population Surveys, which are nationally representative surveys of elderly people aged 60 years and above. We found a typical power-law distribution for the rates of different numbers of chronic diseases among elderly Chinese people. A Kolmogorov-Smirnov test indicated that the result was robust, and the power exponents were approximately -2.5. In addition, a paired t-test demonstrated that the rates of different numbers of chronic diseases did not differ significantly between urban and rural areas, between survey years, or between genders.
Funding: This research is supported by the National Natural Science Foundation of China (No. 19971093) and the Knowledge Innovation Program of the Chinese Academy of Sciences (No. KZCX2-SW-118).
Abstract: A nonparametric test for normality of linear autoregressive time series is proposed in this paper. The test is based on the best one-step forecast in mean square with time reversal. Some asymptotic theory is developed for the test, and it is shown that the test is easy to use and has good power. The empirical percentage points needed to conduct the test in practice are provided, and three examples using real data are included.
Funding: a Grant from the Natural Sciences and Engineering Research Council of Canada. This research is supported in part by the Nation
Abstract: This paper tabulates the empirical upper percentage points of the null distribution of a Kolmogorov-Smirnov type test for checking linearity in autoregressive models, and demonstrates the good power of the test.
Funding: The COM-negative binomial distribution proposed in this work was conceptualized as early as December 2014, when the authors saw the online version of [15]. The authors want to thank Prof. R. Köhler for mailing them the valuable encyclopedia of discrete univariate distributions [39]. This work was partly supported by the National Natural Science Foundation of China (Grant No. 11201165).
Abstract: We focus on the three-parameter COM-type negative binomial distribution, which belongs to the COM-type (a, b, 0) class of distributions and to the family of equilibrium distributions of an arbitrary birth-death process. We show abundant distributional properties such as overdispersion and underdispersion, log-concavity, log-convexity (infinite divisibility), pseudo compound Poisson, stochastic ordering, and asymptotic approximation. Some characterizations, including the sum of equicorrelated geometrically distributed random variables, the conditional distribution, the limit distribution of the COM-negative hypergeometric distribution, and Stein's identity, are given as theoretical properties. The COM-negative binomial distribution is applied to overdispersed and ultrahigh zero-inflated data sets. With the aid of ratio regression, we employ the maximum likelihood method to estimate the parameters, and the goodness of fit is evaluated by the discrete Kolmogorov-Smirnov test.
Abstract: A new automatic evaluation method for subway service quality based on metro smart card data is proposed. It is suitable for three different levels (station pair, railway line and subway network) and overcomes the lagging and subjective evaluation of the previous 'questionnaire survey plus evaluation method' system. First, passengers' travel time distributions for different operating periods in station OD pairs are introduced for service evaluation purposes and classified into different groups in order to infer each station's operating characteristics in the different periods. Second, the classification is verified by K-means cluster analysis and K-S tests. Third, a service quality weight indicator is proposed to identify the service quality of the entire metro network from the dual perspectives of passengers and companies. Finally, the feasibility and rationality of the proposed method are verified using Shenzhen metro smart card data as an example. The new automated evaluation method is suitable for both online and offline application.