The objectives of this paper are to demonstrate the algorithms employed by three statistical software programs (R, Real Statistics using Excel, and SPSS) for calculating the exact two-tailed probability of the Wald-Wolfowitz one-sample runs test for randomness, to present a novel approach for computing this probability, and to compare the four procedures by generating samples of 10 and 11 data points, varying the parameters n₀ (number of zeros) and n₁ (number of ones) as well as the number of runs. Fifty-nine samples were created to replicate the behavior of the distribution of the number of runs with 10 and 11 data points. The exact two-tailed probabilities of the four procedures were compared using Friedman's test. Given the significant difference in central tendency, post-hoc comparisons were conducted using Conover's test with the Benjamini-Yekutieli correction. It is concluded that the procedures of Real Statistics using Excel and R show some inadequacies in calculating the exact two-tailed probability, whereas the new proposal and the SPSS procedure are more suitable. The proposed robust algorithm has a more transparent rationale than the SPSS one, albeit being somewhat more conservative. We recommend its implementation for this test and its application to others, such as the binomial and sign tests.
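The exact null distribution of the number of runs R, given n₀ zeros and n₁ ones, has a closed form, and every two-tailed probability is built from it. The sketch below computes that distribution with exact fractions. It then shows two common two-tailed conventions: doubling the smaller tail, and summing all outcomes that are no more probable than the observed one. The function names are hypothetical. Neither convention is claimed to be the paper's new proposal or the exact procedure used by R, Real Statistics, or SPSS.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n0, n1):
    """Exact null distribution P(R = r) of the number of runs R for a random
    arrangement of n0 zeros and n1 ones (n0, n1 >= 1)."""
    total = comb(n0 + n1, n0)
    pmf = {}
    for r in range(2, n0 + n1 + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: k runs of zeros and k runs of ones, either symbol first
            ways = 2 * comb(n0 - 1, k - 1) * comb(n1 - 1, k - 1)
        else:
            # r = 2k + 1: k + 1 runs of one symbol and k runs of the other
            ways = (comb(n0 - 1, k) * comb(n1 - 1, k - 1)
                    + comb(n0 - 1, k - 1) * comb(n1 - 1, k))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def p_two_tailed_doubling(pmf, r_obs):
    """Twice the smaller one-sided tail probability, capped at 1."""
    lower = sum(p for r, p in pmf.items() if r <= r_obs)
    upper = sum(p for r, p in pmf.items() if r >= r_obs)
    return min(Fraction(1), 2 * min(lower, upper))


def p_two_tailed_point(pmf, r_obs):
    """Sum of the probabilities of all outcomes no more likely than the observed one."""
    p_obs = pmf.get(r_obs, Fraction(0))
    return sum((p for p in pmf.values() if p <= p_obs), Fraction(0))


pmf = runs_pmf(5, 5)
print(p_two_tailed_doubling(pmf, 2), p_two_tailed_point(pmf, 2))  # 1/63 1/63
```

For n₀ = n₁ = 5 and R = 2, both conventions give 4/252. For other configurations they can disagree, which is one reason software packages differ.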
A simple stochastic mechanism that produces exact and approximate power-law distributions is presented. The model considers radially symmetric Gaussian, exponential and power-law functions in n = 1, 2, 3 dimensions. Randomly sampling these functions with a radially uniform sampling scheme produces heavy-tailed distributions. For two-dimensional Gaussians and one-dimensional exponential functions, exact power laws with exponent –1 are obtained. In other cases, densities with approximate power-law behaviour close to the origin arise; these densities are analyzed using Padé approximants to show the approximate power-law behaviour. If the sampled function itself follows a power law with exponent –α, random sampling leads to densities that also follow an exact power law, with exponent –n/α – 1. The presented mechanism shows that power laws can arise in generic situations, distinct from previously considered specialized systems such as multi-particle systems close to phase transitions, dynamical systems at bifurcation points, or systems displaying self-organized criticality. The mechanism may therefore serve as an alternative hypothesis in system identification problems.
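The two exact cases can be checked numerically. This is a minimal sketch that assumes "radially uniform" means points drawn uniformly from an n-dimensional ball, so that r = R·U^(1/n); under that reading a sampled power law r^(−α) gives the stated exponent −n/α − 1. The sketch samples f(r) = exp(−r²) in two dimensions and f(r) = exp(−r) in one dimension, then fits the log-log slope of the histogram on logarithmic bins. The helper names, radii and bin count are illustrative choices.

```python
import math
import random


def sample_radial(f, n_dim, radius, n_samples, seed=0):
    """Evaluate a radially symmetric f(r) at points uniform in an n_dim ball.

    For a uniform point in the ball, r = radius * U**(1/n_dim) with U ~ U(0, 1)."""
    rng = random.Random(seed)
    return [f(radius * rng.random() ** (1.0 / n_dim)) for _ in range(n_samples)]


def loglog_slope(values, n_bins=20):
    """Least-squares slope of log(density) against log(value) on log-spaced bins."""
    log_lo = math.log(min(values))
    log_hi = math.log(max(values))
    step = (log_hi - log_lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = min(int((math.log(v) - log_lo) / step), n_bins - 1)
        counts[k] += 1
    xs, ys = [], []
    for k, c in enumerate(counts):
        if c == 0:
            continue
        left = math.exp(log_lo + k * step)
        right = math.exp(log_lo + (k + 1) * step)
        xs.append(log_lo + (k + 0.5) * step)  # log of the geometric bin centre
        ys.append(math.log(c / (len(values) * (right - left))))  # log density
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


gauss_2d = sample_radial(lambda r: math.exp(-r * r), n_dim=2, radius=3.0, n_samples=100_000)
exp_1d = sample_radial(lambda r: math.exp(-r), n_dim=1, radius=8.0, n_samples=100_000)
print(round(loglog_slope(gauss_2d), 2), round(loglog_slope(exp_1d), 2))
```

Both fitted slopes should come out close to −1. For a three-dimensional Gaussian the same fit drifts away from a single straight line, consistent with the "approximate" cases described above.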
The purpose of this paper is to obtain the expression for the variance of the sample mean difference under Student's distributive model. In 2007, after some decades, the study of the mean difference variance was resumed by Campobasso [1]. Using the general results of Nair [2] and Lomnicki [3], he obtained the variance of the sample mean difference for different distributive models (Laplace's, triangular, power, logit, Pareto's and Gumbel's models). In addition, he extended what was already known for other distributive models (normal, rectangular and exponential).
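The sample mean difference is simple to compute, and its variance under Student's t can be approximated by simulation as a sanity check on any closed-form expression. This sketch is a Monte Carlo illustration only, not the analytical result derived in the paper. The helper names and simulation settings are assumptions. It uses the sorted-data identity Σ_{i<j} (x_(j) − x_(i)) = Σ_j (2j − n − 1) x_(j) and draws t variates as Z / sqrt(χ²_ν / ν).

```python
import random
import statistics


def mean_difference(xs):
    """Sample (Gini) mean difference: average |x_i - x_j| over all pairs i != j."""
    n = len(xs)
    ordered = sorted(xs)
    # Sum over i < j of (x_(j) - x_(i)) equals sum_j (2j - n - 1) x_(j), j = 1..n
    pair_sum = sum((2 * j - n - 1) * x for j, x in enumerate(ordered, start=1))
    return 2.0 * pair_sum / (n * (n - 1))


def student_t(df, rng):
    """One Student's t variate with df degrees of freedom: Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(shape df/2, scale 2)
    return z / (chi2 / df) ** 0.5


def mc_variance_of_mean_difference(n, df, reps=2000, seed=0):
    """Monte Carlo estimate of Var(sample mean difference) for samples of size n."""
    rng = random.Random(seed)
    draws = [mean_difference([student_t(df, rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.variance(draws)


print(mc_variance_of_mean_difference(10, 5), mc_variance_of_mean_difference(40, 5))
```

Comparing such simulated values with a derived formula across n and ν is a quick way to catch algebraic slips.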
In this paper, we study the strong uniform consistency of kernel estimates with random window width of a density function and its derivatives, under the condition that {X_n} is a sequence of identically distributed Φ-mixing random variables.
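A "random window width" is a bandwidth computed from the data themselves. The sketch below assumes a Gaussian kernel and a normal-reference bandwidth h_n = 1.06 · s_n · n^(−1/5). Both are illustrative choices, not the conditions studied in the paper, and the example draws i.i.d. data rather than a Φ-mixing sequence. The derivative estimate uses K′(u) = −u·K(u).

```python
import math
import random
import statistics


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def random_window_width(sample, c=1.06):
    """Data-dependent ("random") window width h_n = c * s_n * n^(-1/5)."""
    return c * statistics.stdev(sample) * len(sample) ** -0.2


def kde(x, sample, h):
    """Kernel estimate of the density at x."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def kde_derivative(x, sample, h):
    """Kernel estimate of the first derivative of the density at x, using K'(u) = -u K(u)."""
    total = 0.0
    for xi in sample:
        u = (x - xi) / h
        total += -u * gaussian_kernel(u)
    return total / (len(sample) * h * h)


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
h = random_window_width(sample)  # depends on the sample, hence "random"
print(h, kde(0.0, sample, h), kde_derivative(1.0, sample, h))
```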
We present an iterative algorithm for approximating an unknown function sequentially using random samples of the function values and gradients. It extends the recently developed sequential approximation (SA) method, which approximates a target function using samples of function values only. The paper extends the SA method to the Sobolev space, so that gradient information can be used naturally. The algorithm is easy to implement, as it requires only vector operations and does not involve any matrices. We present a tight error bound for the algorithm and derive an optimal sampling probability measure that results in the fastest error convergence. Numerical examples verify the theoretical error analysis and the effectiveness of the proposed SA algorithm.
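At the core of a sequential approximation is a one-sample update. It projects the current coefficient vector onto the hyperplane defined by a single new observation, using only vector operations. The sketch below assumes a monomial basis on [−1, 1] and a randomized Kaczmarz-style projection, with one update for the function value and one for the derivative at each random point. The basis, the uniform sampling measure and the function names are assumptions, not the paper's optimal measure or exact update.

```python
import random


def basis(x, degree):
    # Monomial basis 1, x, ..., x^degree (an assumption for illustration)
    return [x ** j for j in range(degree + 1)]


def basis_derivative(x, degree):
    # Derivatives of the monomials: 0, 1, 2x, ..., degree * x^(degree - 1)
    return [j * x ** (j - 1) if j > 0 else 0.0 for j in range(degree + 1)]


def project(coeffs, row, target):
    """One Kaczmarz-type update: move coeffs so that <row, coeffs> == target.
    Uses only vector operations, no matrices."""
    norm2 = sum(r * r for r in row)
    if norm2 == 0.0:
        return coeffs
    residual = target - sum(c * r for c, r in zip(coeffs, row))
    return [c + residual * r / norm2 for c, r in zip(coeffs, row)]


def sequential_approximation(f, df, degree, n_iter, seed=0):
    rng = random.Random(seed)
    coeffs = [0.0] * (degree + 1)
    for _ in range(n_iter):
        x = rng.uniform(-1.0, 1.0)
        coeffs = project(coeffs, basis(x, degree), f(x))             # function-value sample
        coeffs = project(coeffs, basis_derivative(x, degree), df(x))  # gradient sample
    return coeffs


coeffs = sequential_approximation(lambda x: 1 + 2 * x - x * x, lambda x: 2 - 2 * x, 2, 2000)
print(coeffs)
```

Because the target lies in the span of the basis, the coefficients converge to [1, 2, −1].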
A composite random variable is a product (or sum of products) of statistically distributed quantities. Such a variable can represent the solution to a multi-factor quantitative problem submitted to a large, diverse, independent, anonymous group of non-expert respondents (the “crowd”). The objective of this research is to examine the statistical distribution of solutions from a large crowd to a quantitative problem involving image analysis and object counting. Theoretical analysis by the author, covering a range of conditions and types of factor variables, predicts that composite random variables are distributed log-normally to an excellent approximation. If the factors in a problem are themselves distributed log-normally, then their product is rigorously log-normal. A crowdsourcing experiment devised by the author and implemented with the assistance of a BBC (British Broadcasting Corporation) television show yielded a sample of approximately 2000 responses consistent with a log-normal distribution. The sample mean was within ~12% of the true count. However, a Monte Carlo simulation (MCS) of the experiment, employing either normal or log-normal random variables as factors to model the processes by which a crowd of 1 million might arrive at their estimates, resulted in a visually perfect log-normal distribution with a mean response within ~5% of the true count. The results suggest that a well-modeled MCS, by simulating a sample of responses from a large, rational, and incentivized crowd, can provide a more accurate solution to a quantitative problem than might be attainable by direct sampling of a smaller crowd, or of an uninformed crowd of any size that guesses randomly.
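The rigorous case in the abstract, where log-normal factors give a log-normal product, is easy to check by simulation. The log of the product is a sum of independent normals with mean Σμ and variance Σσ². This is a minimal sketch; the three factors and their parameters are hypothetical and are not the BBC experiment's data.

```python
import math
import random
import statistics


def simulate_crowd(factors, n_respondents, seed=0):
    """Each respondent's answer is a product of independent log-normal factors;
    factors is a list of (mu, sigma) pairs on the log scale."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_respondents):
        product = 1.0
        for mu, sigma in factors:
            product *= rng.lognormvariate(mu, sigma)
        answers.append(product)
    return answers


# Hypothetical three-factor count whose factor medians are 20, 15 and 4 (true count 1200)
factors = [(math.log(20), 0.3), (math.log(15), 0.2), (math.log(4), 0.4)]
answers = simulate_crowd(factors, 50_000)
logs = [math.log(a) for a in answers]
# Compare with log(1200) ≈ 7.09 and 0.3² + 0.2² + 0.4² = 0.29
print(statistics.fmean(logs), statistics.variance(logs))
```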
On the basis of the principles of simple random sampling, a statistical model of the rate of disfigurement (RD) is put forward and described in detail. Following the definition of simple random sampling for attribute data in GIS, the mean and variance of the RD are derived as the characteristic values of the statistical model, to show that the RD is a feasible accuracy measure for attribute data in GIS. Based on the mean and variance of the RD, a quality assessment method for the attribute data of vector maps during data collection is discussed. An RD spread graph is also drawn to check whether the quality of the attribute data is under control. The RD model can assess the quality of attribute data comprehensively, unlike other measurement coefficients that address only classification accuracy.
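This sketch assumes one natural reading of the RD: the share of sampled attribute records that are in error. Under simple random sampling without replacement, its mean is the population error rate P and its variance is (N − n)/(N − 1) · P(1 − P)/n. The sketch computes these values and 3-sigma limits of the kind a spread or control graph might use. The function names and the 3-sigma rule are illustrative assumptions, not the paper's exact model.

```python
import math


def rate_of_disfigurement(record_has_error):
    """Hypothetical RD: share of sampled attribute records found to be in error."""
    return sum(1 for e in record_has_error if e) / len(record_has_error)


def rd_mean_variance(P, N, n):
    """Mean and variance of the sample RD under SRS without replacement
    (hypergeometric model), where P is the population error rate."""
    variance = (N - n) / (N - 1) * P * (1 - P) / n
    return P, variance


def control_limits(P, N, n, k=3.0):
    """k-sigma limits for the sample RD, clipped to [0, 1]."""
    mean, variance = rd_mean_variance(P, N, n)
    sd = math.sqrt(variance)
    return max(0.0, mean - k * sd), min(1.0, mean + k * sd)


print(control_limits(0.02, 10_000, 400))
```

A batch whose observed RD falls above the upper limit would suggest that the attribute data collection is not under control.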
The aim of this study is to investigate the impact of the landslide and non-landslide sampling strategy on the performance of landslide susceptibility assessment (LSA). The study area is the Feiyun catchment in Wenzhou City, Southeast China. Two types of landslide samples, combined with seven non-landslide sampling strategies, resulted in a total of 14 scenarios. The landslide susceptibility map (LSM) for each scenario was generated using the random forest model. The receiver operating characteristic (ROC) curve and statistical indicators were calculated and used to assess the impact of the dataset sampling strategy. Higher accuracies were achieved when the landslide core was used as positive samples, combined with non-landslide samples drawn from the very low zone or the buffer zone. The results reveal the influence of landslide and non-landslide sampling strategies on the accuracy of LSA and provide a reference for researchers aiming to obtain a more reasonable LSM.
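The ROC evaluation mentioned above is often summarized by the area under the curve. The AUC equals the probability that a randomly chosen landslide cell scores higher than a randomly chosen non-landslide cell. This is a minimal sketch of that Mann-Whitney formulation, with ties counted as one half. The function name is hypothetical, and the quadratic loop is meant for clarity rather than for full-size susceptibility maps.

```python
def auc(landslide_scores, non_landslide_scores):
    """ROC AUC as P(score of a landslide cell > score of a non-landslide cell),
    counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in landslide_scores:
        for q in non_landslide_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(landslide_scores) * len(non_landslide_scores))


print(auc([0.9, 0.8, 0.4], [0.1, 0.2, 0.5]))
```

Scoring the same validation cells under each sampling scenario makes the scenarios directly comparable.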
The aim of this paper is to compare sample quality across two probability samples and a third sample that combines probabilistic cluster sampling with random-route and quota sampling within the selected clusters to define the ultimate survey units. All three use the face-to-face interview as the survey procedure. The hypothesis tested is that a combination of random-route sampling and quota sampling (with substitution) can achieve the same degree of representativeness as household sampling (without substitution) based on the municipal register of inhabitants. We found marked differences in the age and gender distributions of the probability sampling, where the deviations exceed 6%. A different picture emerges for the employment variables: the quota sampling overestimates the economic activity rate (2.5%) and the unemployment rate (8%) and underestimates the employment rate (3.46%).
In general, the accuracy of the mean estimator can be improved by stratified random sampling. In this paper, we propose an alternative to empirical methods: under some conditions, the accuracy can be improved further through the bootstrap resampling method. The determination of sample size by the bootstrap method is also discussed, and a simulation is carried out to verify the accuracy of the proposed method. The simulation results show that the sample size based on bootstrapping is smaller than that based on the central limit theorem.
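This is a minimal sketch of the bootstrap idea for a stratified mean. It resamples with replacement within each stratum, recomputes ȳ_st = Σ W_h ȳ_h, and takes the standard deviation of the replicates as the standard error. The sample-size helper assumes the standard error scales as 1/√n. That scaling, the helper names and the example data are assumptions, not the paper's procedure.

```python
import math
import random
import statistics


def stratified_mean(strata, weights):
    """ybar_st = sum_h W_h * ybar_h."""
    return sum(w * statistics.fmean(s) for s, w in zip(strata, weights))


def bootstrap_se(strata, weights, n_boot=2000, seed=0):
    """Bootstrap standard error: resample with replacement within each stratum."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resampled = [rng.choices(s, k=len(s)) for s in strata]
        replicates.append(stratified_mean(resampled, weights))
    return statistics.stdev(replicates)


def required_sample_size(current_n, current_se, target_se):
    """Hypothetical helper: assumes SE is proportional to 1/sqrt(n)."""
    return math.ceil(current_n * (current_se / target_se) ** 2)


strata = [[1, 2, 3, 5], [10, 12, 15]]
weights = [0.6, 0.4]
print(stratified_mean(strata, weights), bootstrap_se(strata, weights))
```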
One of the key assumptions in respondent-driven sampling (RDS) analysis, called the “random selection assumption,” is that respondents randomly recruit their peers from their personal networks. The objective of this study was to verify this assumption in empirical data from egocentric networks. Methods: We conducted an egocentric network study among young drug users in China, in which RDS was used to recruit this hard-to-reach population. If the random recruitment assumption holds, the RDS-estimated population proportions should be similar to the actual population proportions. Following this logic, we first calculated the population proportions of five visible variables (gender, age, education, marital status, and drug use mode) among all drug-use alters from which the RDS sample was drawn. We then estimated the RDS-adjusted population proportions and their 95% confidence intervals in the RDS sample. If the random recruitment assumption holds, the 95% confidence intervals estimated in the RDS sample should include the population proportions calculated among all drug-use alters. Results: The evaluation of the RDS sample indicated that it reached convergence of RDS compositions and included a broad cross-section of the hidden population. The random selection assumption held for three group traits but not for the other two. Specifically, egos randomly recruited subjects with respect to age group, marital status, and drug use mode from their network alters, but not with respect to gender and education level. Conclusions: This study demonstrates the occurrence of non-random recruitment, indicating that recruitment in this RDS study was not completely random. Future studies are needed to assess the extent to which population proportion estimates can be biased when the assumption is violated for some group traits in RDS samples.
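The test logic described above asks whether the RDS-adjusted 95% interval contains the proportion observed among all alters. The sketch below uses the Volz-Heckathorn (RDS-II) estimator, which weights each respondent by 1/degree. The interval is a naive i.i.d. percentile bootstrap that ignores recruitment chains. Both choices are assumptions for illustration, because the abstract does not say which RDS estimator or interval method was used. The function names are hypothetical.

```python
import random


def vh_estimate(in_group, degrees):
    """Volz-Heckathorn (RDS-II) proportion: each respondent weighted by 1 / degree."""
    num = sum(1.0 / d for g, d in zip(in_group, degrees) if g)
    den = sum(1.0 / d for d in degrees)
    return num / den


def naive_bootstrap_ci(in_group, degrees, n_boot=2000, level=0.95, seed=0):
    """Percentile interval from i.i.d. resampling of respondents (ignores chains)."""
    rng = random.Random(seed)
    n = len(degrees)
    stats = []
    for _ in range(n_boot):
        pick = rng.choices(range(n), k=n)
        stats.append(vh_estimate([in_group[i] for i in pick], [degrees[i] for i in pick]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


def consistent_with_random_recruitment(alter_proportion, ci):
    """True if the proportion among all alters lies inside the RDS interval."""
    lo, hi = ci
    return lo <= alter_proportion <= hi
```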
In this paper, we propose a software component under Windows that generates pseudo-random numbers using RDS (Refined Descriptive Sampling), as required by simulation. RDS has been reported in the literature to be the best sampling method. To validate the proposed component, we apply it to approximating integrals. The simulation results obtained with the "RDSRnd" generator were compared with those obtained using the "Rnd" generator included in the Pascal programming language under Windows. The proposed software component gives the best results.
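Descriptive sampling replaces random draws of U(0, 1) with the deterministic quantiles (i − 0.5)/n placed in random order. The sketch below shows plain descriptive sampling applied to integral approximation. It is not the refined variant (RDS) or the RDSRnd component, whose block structure is not described in the abstract, and the function names are hypothetical. For f(u) = u², the descriptive estimate is exactly 1/3 − 1/(12n²).

```python
import random


def descriptive_uniform(n, rng):
    """Descriptive sampling of U(0, 1): the quantiles (i - 0.5)/n in random order."""
    values = [(i - 0.5) / n for i in range(1, n + 1)]
    rng.shuffle(values)  # random order matters when values feed a sequential simulation
    return values


def estimate_integral(f, points):
    """Estimate the integral of f over [0, 1] by the average of f at the points."""
    return sum(f(u) for u in points) / len(points)


points = descriptive_uniform(100, random.Random(0))
print(estimate_integral(lambda u: u * u, points))
```

For a one-dimensional integral the shuffle does not change the estimate. It matters when the same stream drives several inputs of a simulation.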
In this study, we propose a modified ratio-type estimator of the population variance of the study variable y under simple random sampling without replacement, using the coefficient of kurtosis and the median of an auxiliary variable x. The properties of the estimator are derived up to the first order of Taylor series expansion. Efficiency conditions under which the proposed estimator performs better than existing estimators are derived theoretically. Empirical studies on real populations compare the performance of the developed estimator with that of existing estimators. Under the specified conditions, the proposed estimator has the smallest mean squared error and the highest percentage relative efficiency. The developed estimator is therefore suitable for situations in which the variable of interest is positively correlated with the auxiliary variable.
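The abstract does not give the estimator's formula. As an illustration only, the sketch assumes a ratio-type form common in this literature, t = s_y² (β₂(x) S_x² + M_x) / (β₂(x) s_x² + M_x). Here β₂(x) is the population kurtosis and M_x the population median of x; the paper's estimator may differ. The sketch also shows the percentage relative efficiency PRE = 100 · MSE(reference) / MSE(proposed). All names are hypothetical.

```python
import statistics


def kurtosis(xs):
    """Population coefficient of kurtosis beta_2 = m4 / m2^2 (not excess kurtosis)."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


def ratio_variance_estimator(s2_y, s2_x, S2_x, beta2_x, median_x):
    """Assumed ratio-type estimator of S_y^2 using the kurtosis and median of x."""
    return s2_y * (beta2_x * S2_x + median_x) / (beta2_x * s2_x + median_x)


def percentage_relative_efficiency(mse_reference, mse_proposed):
    return 100.0 * mse_reference / mse_proposed
```

When the sample variance of x matches its population value, the estimator reduces to s_y². The adjustment only acts when the sample variance of x drifts from its population value.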
In this paper, auxiliary information is used to construct an estimator of the finite population total using nonparametric regression under stratified random sampling. A model-based approach is adopted, using local polynomial regression to predict the non-sampled values of the survey variable y. The performance of the proposed estimator is compared with that of some design-based and model-based regression estimators. The simulation experiments show that the resulting estimator has good properties. In general, the nonparametric regression estimators yield good confidence intervals, and the proposed estimator gives relatively smaller values of RE than the other estimators.
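This is a minimal sketch of the model-based total. Within each stratum, the observed y values are summed and local linear (degree-one local polynomial) predictions are added for the non-sampled units. The Gaussian kernel, the fixed bandwidth h and the data layout are assumptions for illustration, not the paper's exact estimator.

```python
import math


def local_linear(x0, xs, ys, h):
    """Local linear regression estimate at x0 with a Gaussian kernel of bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / h) ** 2)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    # Intercept of the weighted least-squares line centred at x0
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def stratified_total(strata, h):
    """strata: list of (sampled_x, sampled_y, non_sampled_x) tuples.
    Total = observed y + predicted y for the non-sampled units, summed over strata."""
    total = 0.0
    for xs, ys, x_rest in strata:
        total += sum(ys) + sum(local_linear(x0, xs, ys, h) for x0 in x_rest)
    return total
```

Local linear fitting reproduces straight lines exactly, including at the edges of the data. Plain kernel averaging does not, which is one reason local polynomials are popular for this task.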
In this paper, we analyze a methodology for applying stratified random sampling with optimum allocation to a research subject that concerns the rural population and shows large differences among the three strata into which this population can be classified. Using the mean altitude of settlements as the criterion, the rural population of Evros Prefecture (Greece) was classified into three strata (mountainous, semi-mountainous and flat). The aim was to estimate the mean consumption of forest fuelwood for heating and cooking in the households of these strata. The analysis of this methodology includes: (1) determining the total sample size for the entire rural population and allocating it to the strata; (2) investigating the effectiveness of the stratification with one-way analysis of variance (ANOVA); (3) conducting the survey through face-to-face interviews in selected households; and (4) checking the questionnaire forms and analyzing the data with the Statistical Package for the Social Sciences (SPSS for Windows). All data for the analysis of this methodology and its practical application were taken from the pilot survey carried out in each stratum. A review of the literature found no related paper.
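With equal per-unit costs, optimum allocation reduces to Neyman allocation, n_h = n · N_h S_h / Σ N_k S_k. The total sample size for a margin of error e can then be taken from n = (Σ W_h S_h)² / (V + Σ W_h S_h² / N), with V = (e/z)². The sketch below assumes equal costs and uses hypothetical function names. In the study, the stratum standard deviations S_h would come from the pilot survey.

```python
import math


def neyman_allocation(n, N_h, S_h):
    """Allocate n across strata in proportion to N_h * S_h (equal costs assumed)."""
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]


def total_sample_size(N_h, S_h, margin, z=1.96):
    """Sample size for estimating the mean within +/- margin under Neyman allocation,
    including the finite population correction."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    V = (margin / z) ** 2
    a = sum(w * s for w, s in zip(W, S_h)) ** 2
    b = sum(w * s * s for w, s in zip(W, S_h)) / N
    return math.ceil(a / (V + b))


n = total_sample_size([100, 200], [1.0, 2.0], margin=0.2, z=2.0)
print(n, neyman_allocation(n, [100, 200], [1.0, 2.0]))
```

In practice the fractional n_h values are rounded, and each stratum is usually given at least two units.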
In this paper, we consider the problem of nonparametric estimation of the finite population quantile function using the multiplicative bias correction technique. A robust estimator of the finite population quantile function based on multiplicative bias correction is derived with the aid of a superpopulation model. Most studies have concentrated on kernel smoothers for estimating regression functions, and this technique has also been applied to various nonparametric estimators of the finite population quantile. A major problem with nonparametric kernel-based regression over a finite interval, as in the estimation of finite population quantities, is bias at boundary points. By correcting the boundary problems of previous model-based estimators, the multiplicative bias corrected estimator gives better results in estimating the finite population quantile function. The asymptotic behavior of the proposed estimator is also presented: the estimator is asymptotically unbiased and statistically consistent when certain conditions are satisfied. The simulation results show that the suggested estimator performs well in terms of relative bias, mean squared error, and relative root mean error. The multiplicative bias corrected estimator is therefore recommended for survey sampling estimation of the finite population quantile function.
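This is a minimal sketch of multiplicative bias correction with a Nadaraya-Watson pilot. The pilot fit m̂ is multiplied by a smoothed version of the ratios y_i / m̂(x_i), which pulls the fit back toward the data at the boundary. The finite population quantile is then read from the combined observed and predicted values. The Gaussian kernel, the bandwidth and all function names are assumptions, and this is not claimed to be the paper's estimator. It requires positive responses.

```python
import math


def nw(x0, xs, ys, h):
    """Nadaraya-Watson kernel regression estimate at x0 (Gaussian kernel)."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def mbc(x0, xs, ys, h):
    """Multiplicative bias correction: pilot fit at x0 times the smoothed
    ratios y_i / pilot(x_i). Assumes all y_i > 0."""
    pilot = [nw(x, xs, ys, h) for x in xs]
    ratios = [y / p for y, p in zip(ys, pilot)]
    return nw(x0, xs, ys, h) * nw(x0, xs, ratios, h)


def population_quantile(p, observed_y, predicted_y):
    """Smallest value v whose empirical CDF over the whole population is >= p."""
    values = sorted(list(observed_y) + list(predicted_y))
    k = max(math.ceil(p * len(values)), 1)
    return values[k - 1]


xs = [1 + i / 20 for i in range(21)]
ys = [2 * x for x in xs]
print(nw(2.0, xs, ys, 0.1), mbc(2.0, xs, ys, 0.1))  # boundary estimates of y(2) = 4
```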
The research was carried out on the Karelian Isthmus of the Leningrad Region using Sentinel-2B images and data from a network of ground sample plots. The ground sample plots are laid out mostly on a regular pattern across the study territory and were surveyed according to the ICP-Forests methodology with some additions. Their total area is a small fraction of the entire study area. One objective of the study was to determine whether the k-NN (nearest neighbor) method can assess the state of forests throughout the study territory by jointly processing data from ground sample plots and Sentinel-2B imagery. The ground sample plot data were divided into two equal parts, one for applying the k-NN method and the other for checking its results. The systematic error in determining the mean damage class of the tree stands on the sample plots by the k-NN method was zero, and the random error was one point. These results make it possible to determine the state of the forest over the entire study area. The second objective was to examine whether the short-wave vegetation index (SWVI) can be used to assess the state of forests. A close, statistically reliable relationship was established between the average condition score of the stands and the SWVI value, so this relationship can be used to determine the state of forests throughout the study territory. The joint use and statistical processing of remotely sensed data and ground-based test areas by the two methods studied make it possible to assess the state of forests over the large area covered by the image. The results can be used to monitor the state of forests over large areas and to design appropriate forest protection measures.
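This is a minimal sketch of the validation scheme described above. k-NN predicts a plot's damage score as the mean score of its k nearest training plots in spectral feature space. The held-out half then gives the systematic error (mean of predicted minus observed) and the random error (standard deviation of those differences). The Euclidean distance, these error definitions and the function names are assumptions.

```python
import math
import statistics


def knn_predict(query, train_features, train_values, k):
    """Mean value of the k training plots closest to `query` (Euclidean distance)."""
    ranked = sorted((math.dist(query, f), v) for f, v in zip(train_features, train_values))
    return statistics.fmean(v for _, v in ranked[:k])


def validation_errors(predicted, observed):
    """Systematic error = mean difference; random error = std. dev. of differences."""
    diffs = [p - o for p, o in zip(predicted, observed)]
    return statistics.fmean(diffs), statistics.stdev(diffs)
```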
Srivastava and Jhajj [16] proposed a class of estimators of the population variance using multiple auxiliary variables in simple random sampling, utilizing the means and variances of the auxiliary variables. In this paper, we adapt this class and, motivated by Searle [13], suggest a more generalized class of estimators of the population variance in simple random sampling. The expression for the mean squared error of the proposed class is derived in general form. Besides obtaining the minimum MSE of the proposed and adapted classes, we show that the adapted class is a special case of the proposed class. These theoretical findings are supported by an empirical study on original data.
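As we understand the Srivastava-Jhajj setting, estimators take the form t = s_y² h(u, v), with u = x̄/X̄, v = s_x²/S_x² and h(1, 1) = 1. The sketch uses Monte Carlo simulation under simple random sampling without replacement. It compares the plain s_y² with the simple member h(u, v) = 1/v. The population, sample size and function name are hypothetical, and this is not the generalized class proposed in the paper.

```python
import random
import statistics


def simulate_mse(pop_x, pop_y, n, reps, seed=0):
    """Monte Carlo MSE, under SRS without replacement, of the plain estimator s_y^2
    and of the ratio-type member s_y^2 * S_x^2 / s_x^2 (h(u, v) = 1 / v)."""
    rng = random.Random(seed)
    S2_x = statistics.variance(pop_x)  # population variances with divisor N - 1
    S2_y = statistics.variance(pop_y)
    sq_err_plain = sq_err_ratio = 0.0
    for _ in range(reps):
        idx = rng.sample(range(len(pop_x)), n)
        s2_x = statistics.variance([pop_x[i] for i in idx])
        s2_y = statistics.variance([pop_y[i] for i in idx])
        sq_err_plain += (s2_y - S2_y) ** 2
        sq_err_ratio += (s2_y * S2_x / s2_x - S2_y) ** 2
    return sq_err_plain / reps, sq_err_ratio / reps


# Hypothetical population in which y is strongly and positively related to x
gen = random.Random(1)
pop_x = [gen.gammavariate(2.0, 3.0) for _ in range(1000)]
pop_y = [2.0 * x + gen.gauss(0.0, 1.0) for x in pop_x]
mse_plain, mse_ratio = simulate_mse(pop_x, pop_y, n=30, reps=500)
print(mse_plain, mse_ratio, 100 * mse_plain / mse_ratio)  # last value is a PRE
```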
Objective: To reveal the distribution characteristics and demographic factors of traditional Chinese medicine (TCM) constitution among elderly individuals in China. Methods: Elderly individuals from seven regions in China were selected using a multistage cluster random sampling method. The basic information questionnaire and the Constitution in Chinese Medicine Questionnaire (Elderly Edition) were used. Descriptive statistical analysis, chi-squared tests, and binary logistic regression analysis were performed. Results: The single balanced constitution (BC) accounted for 23.9%. Among the major TCM constitution types, BC (43.2%) accounted for the largest proportion, and unbalanced constitutions ranged from 0.9% to 15.7%. East China region (odds ratio [OR] = 2.097; 95% confidence interval [CI], 1.912 to 2.301), married status (OR = 1.341; 95% CI, 1.235 to 1.457), and managers (OR = 1.254; 95% CI, 1.044 to 1.505) were significantly associated with BC. Age > 70 years was associated with qi-deficiency constitution and blood stasis constitution (BSC). Female sex was significantly associated with yang-deficiency constitution (OR = 1.646; 95% CI, 1.52 to 1.782). Southwest region was significantly associated with phlegm-dampness constitution (OR = 1.809; 95% CI, 1.569 to 2.086). North China region was significantly associated with inherited special constitution (OR = 2.521; 95% CI, 1.569 to 4.05). South China region (OR = 2.741; 95% CI, 1.997 to 3.763), Central China region (OR = 8.889; 95% CI, 6.676 to 11.835), senior middle school education (OR = 2.442; 95% CI, 1.932 to 3.088), and managers (OR = 1.804; 95% CI, 1.21 to 2.69) were significantly associated with BSC. Conclusions: This study defined the distribution characteristics and demographic factors of TCM constitution in the elderly population. Unbalanced constitutions are correlated with diseases. Adjusting and improving them through scientific management of these demographic factors can help promote healthy aging.
Machine learning (ML) algorithms are frequently used in landslide susceptibility modeling. Different data handling strategies may produce different landslide susceptibility models, even with the same ML algorithm. This research compares combinations of inventory data handling, cross-validation (CV), and hyperparameter tuning strategies for generating landslide susceptibility maps. The results are expected to provide a general strategy for landslide susceptibility modeling using ML techniques. The authors employed eight landslide inventory data handling scenarios to convert a landslide polygon into landslide points: a point on the toe (minimum height), on the scarp (maximum height), at the center of the landslide, randomly inside the polygon (1, 3, 5, or 10 points), and 15 m grid sampling. For each data handling strategy, three random forest configurations were applied: CV with non-spatial hyperparameter tuning; spatial CV with spatial hyperparameter tuning; and spatial CV with forward feature selection and no hyperparameter tuning. The combinations gave 24 random forest ML workflows. They were applied to a complete inventory of 743 landslides triggered by Tropical Cyclone Cempaka (2017) in Pacitan Regency, Indonesia, using 11 landslide controlling factors. The results show that grid sampling with spatial CV and spatial hyperparameter tuning is favorable because it can minimize overfitting, produce a relatively high-performance predictive model, and reduce susceptibility artifacts in the landslide area. Careful inventory data handling, CV, and hyperparameter tuning strategies should be considered in landslide susceptibility modeling to increase the applicability of landslide susceptibility maps in practice.
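Two of the inventory-handling scenarios listed above, random points inside a landslide polygon and regular grid sampling, can be sketched with a ray-casting point-in-polygon test. Sampling at cell centres, the bounding-box rejection loop and the function names are assumptions for illustration. A production workflow would normally use a GIS library.

```python
import random


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices in order."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(poly, k, rng):
    """k points uniform inside the polygon, by rejection from its bounding box."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < k:
        x, y = rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)
        if point_in_polygon(x, y, poly):
            points.append((x, y))
    return points


def grid_points_in_polygon(poly, spacing=15.0):
    """Centres of a regular grid (default 15 m) that fall inside the polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    x0, y0 = min(xs), min(ys)
    nx = int((max(xs) - x0) / spacing) + 1
    ny = int((max(ys) - y0) / spacing) + 1
    points = []
    for i in range(nx):
        for j in range(ny):
            x = x0 + (i + 0.5) * spacing
            y = y0 + (j + 0.5) * spacing
            if point_in_polygon(x, y, poly):
                points.append((x, y))
    return points
```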
Funding (strong uniform consistency of kernel density estimates under Φ-mixing): supported by the Natural Science Foundation of the Henan Provincial Commission of Education.
文摘We present an iterative algorithm for approximating an unknown function sequentially using random samples of the function values and gradients. This is an extension of the recently developed sequential approximation (SA) method, which approximates a target function using samples of function values only. The current paper extends the development of the SA methods to the Sobolev space and allows the use of gradient information naturally. The algorithm is easy to implement, as it requires only vector operations and does not involve any matrices. We present tight error bound of the algorithm, and derive an optimal sampling probability measure that results in fastest error convergence. Numerical examples are provided to verify the theoretical error analysis and the effectiveness of the proposed SA algorithm.
Abstract: A composite random variable is a product (or sum of products) of statistically distributed quantities. Such a variable can represent the solution to a multi-factor quantitative problem submitted to a large, diverse, independent, anonymous group of non-expert respondents (the "crowd"). The objective of this research is to examine the statistical distribution of solutions from a large crowd to a quantitative problem involving image analysis and object counting. The author's theoretical analysis covers a range of conditions and types of factor variables. It predicts that composite random variables are distributed log-normally to an excellent approximation. If the factors in a problem are themselves distributed log-normally, then their product is rigorously log-normal. The author devised a crowdsourcing experiment, implemented with the assistance of a BBC (British Broadcasting Corporation) television show. It yielded a sample of approximately 2000 responses consistent with a log-normal distribution. The sample mean was within ~12% of the true count. A Monte Carlo simulation (MCS) of the experiment was also run, with either normal or log-normal random variables as factors, to model how a crowd of 1 million might arrive at their estimates. The MCS produced a visually perfect log-normal distribution, with a mean response within ~5% of the true count. These results suggest that a well-modeled MCS can give a more accurate solution to a quantitative problem than direct sampling, by simulating the responses of a large, rational, and incentivized crowd. Direct sampling of a smaller crowd, or of an uninformed crowd of any size that guesses randomly, may not reach the same accuracy.
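A minimal Monte Carlo sketch of the mechanism described above follows. Each simulated respondent estimates every factor with independent log-normal error and reports the product of the estimates. The two factors, their true values and the spreads of individual guesses are hypothetical, not the parameters of the BBC experiment. Because the log of the product is a sum of normal variables, the simulated answers are exactly log-normal.

```python
import math
import random
import statistics


def simulate_crowd(true_factors, log_sigmas, n_respondents, seed=0):
    """Each respondent estimates every factor with log-normal error centred
    (in log space) on the true value, then reports the product of the estimates."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_respondents):
        estimate = 1.0
        for value, sigma in zip(true_factors, log_sigmas):
            estimate *= rng.lognormvariate(math.log(value), sigma)
        answers.append(estimate)
    return answers


true_factors = [40.0, 25.0]  # hypothetical, e.g. rows of objects and objects per row
log_sigmas = [0.3, 0.2]      # hypothetical spread of individual guesses (log scale)
answers = simulate_crowd(true_factors, log_sigmas, n_respondents=20_000)
logs = [math.log(a) for a in answers]
true_count = math.prod(true_factors)
print(true_count, statistics.median(answers), statistics.fmean(answers))
```

For a log-normal variable, the arithmetic mean equals the median times exp(σ²/2), where σ² is the total log-variance (here 0.3² + 0.2² = 0.13). The crowd's mean therefore lands about 7% above the true count even though the median sits on it. This is one reason the choice of summary statistic matters when reading crowd estimates.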
Funding: Project supported by the National Natural Science Foundation of China (No. 40171078), a Fund from Hong Kong Polytechnic University (No. 1.34.9709) and the Research Grants Council of Hong Kong SAR (No. 3ZB40).
Abstract: On the basis of the principles of simple random sampling, a statistical model of the rate of disfigurement (RD) is proposed and described in detail. The mean and variance of the RD are derived from the definition of simple random sampling for attribute data in GIS, and serve as the characteristic values of the model. They demonstrate that the RD can feasibly be used to measure the accuracy of GIS attribute data. Building on this mean and variance, the paper discusses a quality assessment method for the attribute data of vector maps during data collection. An RD spread graph is also drawn to check whether the quality of the attribute data is under control. The RD model judges the quality of attribute data comprehensively, unlike other measurement coefficients that address only classification accuracy.
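As a rough illustration of how a mean and variance can turn into a control check, the sketch assumes the RD is the proportion of defective attribute records in a simple random sample drawn without replacement. It uses the standard variance of a sample proportion with the finite population correction, and places k-sigma limits around a reference RD. The paper's exact model, its spread graph and the reference value used here may differ. The batch numbers are hypothetical.

```python
import math


def rd_mean_variance(population_size, sample_size, true_rd):
    """Mean and variance of the sample RD (defective records / records checked)
    under simple random sampling without replacement, given the true RD."""
    N, n, P = population_size, sample_size, true_rd
    variance = (N - n) / (N - 1) * P * (1 - P) / n  # includes the finite population correction
    return P, variance


def control_limits(population_size, sample_size, reference_rd, k=3.0):
    """k-sigma limits for the sample RD around a reference RD, clipped to [0, 1]."""
    mean, variance = rd_mean_variance(population_size, sample_size, reference_rd)
    sd = math.sqrt(variance)
    return max(0.0, mean - k * sd), min(1.0, mean + k * sd)


# Hypothetical batch: 100 of 1000 attribute records checked, 7 found defective.
lower, upper = control_limits(1000, 100, reference_rd=0.05)
batch_rd = 7 / 100
print(f"RD = {batch_rd:.3f}, limits = ({lower:.4f}, {upper:.4f}), "
      f"in control: {lower <= batch_rd <= upper}")
```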
Abstract: The aim of this study is to investigate how the sampling strategy for landslide and non-landslide samples affects the performance of landslide susceptibility assessment (LSA). The study area is the Feiyun catchment in Wenzhou City, Southeast China. Two types of landslide samples, combined with seven non-landslide sampling strategies, resulted in a total of 14 scenarios. The corresponding landslide susceptibility map (LSM) for each scenario was generated using the random forest model. The receiver operating characteristic (ROC) curve and statistical indicators were calculated and used to assess the impact of the dataset sampling strategy. The results showed that higher accuracies were achieved when the landslide core was used as positive samples, combined with non-landslide sampling from the very low zone or buffer zone. The results reveal the influence of landslide and non-landslide sampling strategies on the accuracy of LSA and provide a reference for subsequent researchers aiming to obtain a more reasonable LSM.
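The comparison across scenarios rests on the area under the ROC curve. A minimal sketch of how that area can be computed from model scores follows. It uses the Mann-Whitney formulation: the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell, with ties counting one half. The scores and labels are hypothetical. The random forest itself is not reproduced.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 = landslide sample, 0 = non-landslide sample.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both landslide (1) and non-landslide (0) samples")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility scores for two landslide and two non-landslide cells.
print(roc_auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

The double loop is quadratic and fine for a sketch. For full rasters, a rank-based version (sort once, sum the ranks of the positives) gives the same value in O(n log n).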
Abstract: The aim of this paper is to compare sample quality across three samples: two probability samples, and one that combines probabilistic cluster sampling with random-route and quota sampling within the selected clusters to define the ultimate survey units. All three use face-to-face interviews as the survey procedure. The hypothesis to be tested is that a combination of random-route and quota sampling (with substitution) can achieve the same degree of representativeness as household sampling (without substitution) based on the municipal register of inhabitants. We found marked differences in the age and gender distributions of the probability samples, with deviations exceeding 6%. A different picture emerges for the employment variables. There, the quota sampling overestimates the economic activity rate (2.5%) and the unemployment rate (8%) and underestimates the employment rate (3.46%).
Funding: The Science Research Start-up Foundation for Young Teachers of Southwest Jiaotong University (No. 2007Q091)
Abstract: In general, the accuracy of the mean estimator can be improved by stratified random sampling. In this paper, we take an approach different from empirical methods and show that, under some conditions, the bootstrap resampling method can improve this accuracy further. The determination of sample size by the bootstrap method is also discussed, and a simulation is carried out to verify the accuracy of the proposed method. The simulation results show that the sample size based on bootstrapping is smaller than that based on the central limit theorem.
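To show the two ingredients being compared, the sketch computes the stratified mean with a bootstrap standard error and with the textbook CLT-based standard error. The bootstrap resamples with replacement independently within each stratum and keeps each n_h fixed. The strata, weights and number of replicates are hypothetical. The paper's sample-size procedure is not reproduced.

```python
import random
import statistics


def stratified_mean(strata, weights):
    """Stratified estimator ybar_st = sum_h W_h * ybar_h."""
    return sum(w * statistics.fmean(s) for s, w in zip(strata, weights))


def bootstrap_se(strata, weights, replicates=2000, seed=0):
    """Bootstrap standard error of ybar_st, resampling with replacement
    independently within each stratum (stratum sizes n_h kept fixed)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        resampled = [rng.choices(s, k=len(s)) for s in strata]
        estimates.append(stratified_mean(resampled, weights))
    return statistics.stdev(estimates)


def clt_se(strata, weights):
    """Textbook standard error sqrt(sum_h W_h^2 s_h^2 / n_h), ignoring the fpc."""
    return sum(w * w * statistics.variance(s) / len(s)
               for s, w in zip(strata, weights)) ** 0.5


strata = [[2.0, 4.0, 6.0, 8.0], [10.0, 14.0, 18.0]]  # hypothetical stratum samples
weights = [0.4, 0.6]                                 # hypothetical W_h = N_h / N
print(stratified_mean(strata, weights), bootstrap_se(strata, weights), clt_se(strata, weights))
```

With small strata this naive bootstrap tends to understate the standard error, by roughly a factor of sqrt((n_h - 1) / n_h) within each stratum. Rescaled bootstrap variants exist for that reason.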
Abstract: One of the key assumptions in respondent-driven sampling (RDS) analysis, the "random selection assumption," is that respondents randomly recruit their peers from their personal networks. The objective of this study was to verify this assumption using empirical data on egocentric networks. Methods: We conducted an egocentric network study among young drug users in China, in which RDS was used to recruit this hard-to-reach population. If the random recruitment assumption holds, the RDS-estimated population proportions should be similar to the actual population proportions. Following this logic, we first calculated the population proportions of five visible variables (gender, age, education, marital status, and drug use mode) among all drug-using alters from which the RDS sample was drawn. We then estimated the RDS-adjusted population proportions and their 95% confidence intervals in the RDS sample. In theory, if the random recruitment assumption holds, these 95% confidence intervals should include the population proportions calculated among all drug-using alters. Results: The evaluation of the RDS sample indicated that it reached convergence of the RDS compositions and included a broad cross-section of the hidden population. The random selection assumption held for three group traits but not for the other two. Specifically, egos recruited subjects randomly from their network alters with respect to age group, marital status and drug use mode, but not with respect to gender and education level. Conclusions: This study demonstrates the occurrence of non-random recruitment, indicating that recruitment in this RDS study was not completely random. Future studies are needed to assess how far population proportion estimates can be biased when some group traits in RDS samples violate the assumption.
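The check described in this abstract has two parts: an RDS-adjusted proportion with a confidence interval, and a test of whether that interval contains the proportion computed from all alters. A sketch of both follows. The abstract does not name its RDS estimator. As an assumption, the sketch uses the RDS-II (Volz-Heckathorn) estimator, which weights each respondent by the inverse of their reported network degree. It also uses a naive i.i.d. bootstrap for the interval, a simplification of the chain-based bootstraps used in practice. All data are hypothetical.

```python
import random


def vh_proportion(degrees, in_group):
    """RDS-II (Volz-Heckathorn) estimate of a population proportion:
    each respondent is weighted by 1 / (reported network degree)."""
    total = sum(1.0 / d for d in degrees)
    group = sum(1.0 / d for d, g in zip(degrees, in_group) if g)
    return group / total


def naive_bootstrap_ci(degrees, in_group, replicates=2000, level=0.95, seed=0):
    """Percentile CI from resampling respondents i.i.d. (a simplification;
    RDS-specific bootstraps usually resample along recruitment chains)."""
    rng = random.Random(seed)
    n = len(degrees)
    estimates = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(vh_proportion([degrees[i] for i in idx],
                                       [in_group[i] for i in idx]))
    estimates.sort()
    alpha = (1 - level) / 2
    return estimates[int(alpha * replicates)], estimates[int((1 - alpha) * replicates) - 1]


def assumption_consistent(population_proportion, ci):
    """Does the RDS confidence interval contain the proportion among all alters?"""
    lo, hi = ci
    return lo <= population_proportion <= hi


degrees = [3, 5, 2, 8, 4, 6, 2, 7, 3, 5]  # hypothetical reported degrees
group = [True, False, True, False, True, True, False, False, True, False]
ci = naive_bootstrap_ci(degrees, group)
print(vh_proportion(degrees, group), ci, assumption_consistent(0.45, ci))
```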
Abstract: In this paper, we propose a software component for Windows that generates pseudo-random numbers using RDS (Refined Descriptive Sampling), as required by simulation. The literature has reported RDS to be the best sampling method. To validate the proposed component, we apply it to approximating integrals. The simulation results obtained with the "RDSRnd" generator were compared with those obtained using the "Rnd" generator included in the Pascal programming language under Windows. The proposed software component gives the best results.
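For context, the sketch below shows plain descriptive sampling (DS), the scheme that RDS refines. The n input values of U(0, 1) are fixed in advance at the midpoints (i - 0.5)/n, and only their order is randomized. The refinement step of RDS is not reproduced, and the class name is hypothetical, not the "RDSRnd" component. The integral of x² over [0, 1] serves as the test case, with ordinary pseudo-random Monte Carlo for comparison.

```python
import random


class DescriptiveUniform:
    """Descriptive sampling of U(0, 1): the n input values are fixed in advance
    at the midpoints (i - 0.5) / n, and only their order is randomized."""

    def __init__(self, n, seed=None):
        rng = random.Random(seed)
        self._values = [(i - 0.5) / n for i in range(1, n + 1)]
        rng.shuffle(self._values)
        self._pos = 0

    def draw(self):
        """Return the next value; the shuffled set is reused cyclically."""
        value = self._values[self._pos]
        self._pos = (self._pos + 1) % len(self._values)
        return value


def estimate_integral(f, draws):
    """Sample-mean estimate of the integral of f over [0, 1]."""
    return sum(f(u) for u in draws) / len(draws)


n = 1000
ds = DescriptiveUniform(n, seed=42)
ds_estimate = estimate_integral(lambda u: u * u, [ds.draw() for _ in range(n)])

rng = random.Random(42)  # plain Monte Carlo for comparison
mc_estimate = estimate_integral(lambda u: u * u, [rng.random() for _ in range(n)])
print(ds_estimate, mc_estimate)  # the exact value of the integral is 1/3
```

With a single input variable the DS estimate reduces to the midpoint rule, so its error here is exactly 1/(12n²). The random ordering only matters when several inputs are combined within one simulation run.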
Abstract: In this study we propose a modified ratio-type estimator of the population variance of the study variable y under simple random sampling without replacement. The estimator makes use of the coefficient of kurtosis and the median of an auxiliary variable x. Its properties are derived up to the first order of the Taylor series expansion. The conditions under which the proposed estimator is more efficient than existing estimators are derived theoretically. Empirical studies on real populations compare the developed estimator with the existing ones. Under some specified conditions the proposed estimator performs better, i.e. it has the smallest mean squared error and the highest percentage relative efficiency. The developed estimator is therefore suitable for situations in which the variable of interest is positively correlated with the auxiliary variable.
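The abstract does not give the estimator's formula. As an assumption, the sketch shows one common ratio-type family for the variance in which known constants of the auxiliary variable enter linearly: t = s_y² (a S_x² + b) / (a s_x² + b). Here a could be the coefficient of kurtosis of x and b its median. Setting a = 1 and b = 0 recovers the classical ratio estimator s_y² S_x² / s_x². The data are hypothetical, and this is not necessarily the paper's proposed estimator.

```python
import statistics


def ratio_variance_estimator(y, x, pop_var_x, a, b):
    """Ratio-type estimator of S_y^2:
        t = s_y^2 * (a * S_x^2 + b) / (a * s_x^2 + b)
    where S_x^2 is the known population variance of x and (a, b) are known
    constants (for example a = kurtosis of x, b = median of x)."""
    s2_y = statistics.variance(y)  # sample variance with divisor n - 1
    s2_x = statistics.variance(x)
    return s2_y * (a * pop_var_x + b) / (a * s2_x + b)


y = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample of the study variable
x = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample of the auxiliary variable
print(ratio_variance_estimator(y, x, pop_var_x=10.0, a=2.5, b=5.0))
```

The adjustment factor exceeds 1 when the sample under-represents the known variance of x. It therefore helps most when y and x are positively correlated, which matches the condition stated in the abstract.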
Abstract: In this paper, auxiliary information is used to construct an estimator of the finite population total using nonparametric regression under stratified random sampling. To achieve this, a model-based approach is adopted, using local polynomial regression to predict the non-sampled values of the survey variable y. The performance of the proposed estimator is compared with that of several design-based and model-based regression estimators. The simulation experiments show that the resulting estimator has good properties. In general, the nonparametric regression estimators give good confidence intervals, and the proposed estimator yields relatively smaller values of RE than the other estimators.
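A minimal sketch of the model-based construction follows. In each stratum, the total is the sum of the observed y values plus local polynomial predictions for the non-sampled units. The sketch assumes a local linear fit (degree 1) with a Gaussian kernel and one common bandwidth h, all choices made for illustration. The strata are hypothetical.

```python
import math


def local_linear(xs, ys, x0, h):
    """Local linear (degree-1 local polynomial) prediction at x0 with a Gaussian kernel.

    Solves the 2x2 weighted least-squares normal equations in closed form and
    returns the intercept, i.e. the fitted value at x0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def model_based_total(strata, h):
    """Population total = sum over strata of
    (observed y of sampled units) + (predicted y of non-sampled units)."""
    total = 0.0
    for sample_x, sample_y, nonsample_x in strata:
        total += sum(sample_y)
        total += sum(local_linear(sample_x, sample_y, x0, h) for x0 in nonsample_x)
    return total


strata = [  # (sampled x, sampled y, non-sampled x), hypothetical data with y = 2x + 1
    ([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], [1.5, 2.5]),
    ([10.0, 12.0, 14.0], [21.0, 25.0, 29.0], [11.0, 13.0]),
]
print(model_based_total(strata, h=2.0))
```

A local linear fit reproduces any straight line exactly. That makes the example easy to check (the total is 159), and it is also why local linear fits have smaller boundary bias than local constant ones.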
Abstract: This paper analyses the methodology for applying stratified random sampling with optimum allocation to a research subject concerning the rural population. This population shows large differences among the three strata into which it can be classified. Using the mean altitude of settlements as the criterion, the rural population of Evros Prefecture (Greece) was classified into three strata: mountainous, semi-mountainous and flat. The aim was to estimate the mean consumption of forest fuelwood for heating and cooking needs in the households of these three strata. The analysis of this methodology includes: (1) the determination of the total sample size for the entire rural population and its allocation to the strata; (2) the investigation of the effectiveness of stratification with one-way analysis of variance (One-Way ANOVA); (3) the conduct of the sampling survey through face-to-face interviews in selected households; and (4) the checking of the questionnaire forms and the analysis of the data using the Statistical Package for the Social Sciences, SPSS for Windows. All data for the analysis of this methodology and its practical application were obtained from the pilot sampling carried out in each stratum. No related paper was found in the literature review.
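Step (1) above, determining the total sample size and allocating it to the strata, can be sketched with the standard formulas for optimum allocation under equal unit costs (Neyman allocation). The total size for a target half-width d is n = (Σ W_h S_h)² / (V + Σ W_h S_h² / N), with V = (d / z)². Each stratum then receives n_h = n N_h S_h / Σ N_h S_h. Equal interview costs per stratum are an assumption. Stratum sizes, pilot standard deviations and the margin are hypothetical.

```python
import math


def required_sample_size(N_h, S_h, margin, z=1.96):
    """Total n for estimating the population mean with half-width `margin`
    under Neyman allocation: n = (sum W_h S_h)^2 / (V + sum W_h S_h^2 / N)."""
    N = sum(N_h)
    W = [n / N for n in N_h]
    V = (margin / z) ** 2
    numerator = sum(w * s for w, s in zip(W, S_h)) ** 2
    denominator = V + sum(w * s * s for w, s in zip(W, S_h)) / N
    return math.ceil(numerator / denominator)


def neyman_allocation(n, N_h, S_h):
    """Allocate n to strata in proportion to N_h * S_h (equal unit costs assumed)."""
    products = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
    total = sum(products)
    return [n * p / total for p in products]


N_h = [100, 200, 300]  # hypothetical households per stratum
S_h = [1.0, 2.0, 3.0]  # hypothetical pilot standard deviations of fuelwood use
n = required_sample_size(N_h, S_h, margin=0.5)
print(n, neyman_allocation(n, N_h, S_h))
```

The allocations are real-valued and are rounded in practice, which can shift the total by a unit. If any n_h exceeds N_h, that stratum is fully enumerated and the remainder is reallocated.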
Abstract: In this paper, we consider the problem of nonparametric estimation of the finite population quantile function using a multiplicative bias correction technique. A robust estimator of the finite population quantile function based on multiplicative bias correction is derived with the aid of a superpopulation model. Most studies have concentrated on kernel smoothers for estimating regression functions. The same technique has also been applied to the various nonparametric methods for estimating the finite population quantile that are reviewed here. A major problem with nonparametric kernel-based regression over a finite interval, as in the estimation of finite population quantities, is bias at the boundary points. The multiplicative bias corrected estimator corrects the boundary problems of previous model-based estimators and produces better estimates of the finite population quantile function. Furthermore, the asymptotic behavior of the proposed estimators is presented. The estimator is observed to be asymptotically unbiased and statistically consistent when certain conditions are satisfied. The simulation results show that the suggested estimator performs quite well in terms of relative bias, mean squared error, and relative root mean error. As a result, the multiplicative bias corrected estimator is strongly recommended for survey sampling estimation of the finite population quantile function.
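To illustrate the correction itself, the sketch applies the usual multiplicative bias-correction scheme to a kernel regression. A pilot Nadaraya-Watson fit m̂(x) is multiplied by a kernel smooth of the ratios y_i / m̂(x_i). We assume the paper's estimator is built on a correction of this kind. The further steps that turn the corrected fit into a finite population distribution function and invert it to obtain quantiles are not shown. The Gaussian kernel, the bandwidth and the data are hypothetical, and the response must be positive.

```python
import math


def nadaraya_watson(xs, ys, x0, h):
    """Nadaraya-Watson kernel regression at x0 with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def mbc_regression(xs, ys, x0, h):
    """Multiplicative bias correction: pilot fit m(x0) times a kernel smooth of
    the ratios y_i / m(x_i). Requires a strictly positive response."""
    pilot_at_sample = [nadaraya_watson(xs, ys, xi, h) for xi in xs]
    ratios = [y / m for y, m in zip(ys, pilot_at_sample)]
    return nadaraya_watson(xs, ys, x0, h) * nadaraya_watson(xs, ratios, x0, h)


xs = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical auxiliary values
ys = [1.0, 1.6, 2.7, 4.5, 7.4]  # hypothetical positive survey values
print(nadaraya_watson(xs, ys, 2.0, h=0.5), mbc_regression(xs, ys, 2.0, h=0.5))
```

At the right boundary (x0 = 2.0), the plain Nadaraya-Watson fit is pulled toward the smaller interior values. The ratio smooth rescales it using the pilot fit's own residual pattern.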
Abstract: The research was carried out in the Karelian Isthmus of the Leningrad Region using Sentinel-2B images and data from a network of ground sample plots. The ground sample plots are located in the study territory mainly at regular intervals. They were laid out and surveyed according to the ICP-Forests methodology, with some additions. The total area of the sample plots is a small fraction of the entire study area. One objective of the study was to determine whether the k-NN (nearest neighbor) method can assess the state of forests throughout the study territory. The method combines ground sample plot data and Sentinel-2B imagery in joint statistical processing. The ground sample plot data were divided into two equal parts, one for applying the k-NN method and the other for checking the results. The systematic error in determining the mean damage class of the tree stands on the sample plots by the k-NN method was zero, and the random error was one point. These results make it possible to determine the state of the forest over the entire study area. The second objective was to examine whether the short-wave vegetation index (SWVI) can be used to assess the state of forests. A close, statistically reliable relationship was established between the average condition score of the plantations and the SWVI value. This relationship can be used to determine the state of forests throughout the study territory. With either of the two methods studied, joint statistical processing of remotely sensed data and ground-based test areas makes it possible to assess the state of forests over the whole large area covered by the image. The results can be used to monitor the state of forests over large areas and to design appropriate forest protection measures.
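A sketch of the k-NN step and of the split-sample check follows. Each validation plot's damage class is predicted as the mean class of its k nearest training plots in spectral feature space. The errors are summarized as a systematic component (mean of predicted minus observed) and a random component (standard deviation of those differences). Treating the random error as a standard deviation is our assumption about the paper's definition. The features stand in for Sentinel-2B band values and are hypothetical.

```python
import math
import statistics


def knn_predict(train_features, train_values, query, k):
    """Mean damage class of the k training plots nearest to `query`
    in spectral feature space (Euclidean distance)."""
    ranked = sorted(zip(train_features, train_values),
                    key=lambda fv: math.dist(fv[0], query))
    return statistics.fmean([v for _, v in ranked[:k]])


def validation_errors(train_features, train_values, test_features, test_values, k):
    """Systematic error = mean(predicted - observed); random error = SD of the differences."""
    diffs = [knn_predict(train_features, train_values, q, k) - obs
             for q, obs in zip(test_features, test_values)]
    return statistics.fmean(diffs), statistics.stdev(diffs)


# Hypothetical plots: (band A, band B) reflectance-like features and damage classes.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
train_y = [1, 1, 1, 5]
test_X = [(0.2, 0.2), (9.5, 9.5)]
test_y = [1, 5]
print(validation_errors(train_X, train_y, test_X, test_y, k=1))
```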
Abstract: Srivastava and Jhajj [16] proposed a class of estimators of the population variance using multiple auxiliary variables in simple random sampling, utilizing the means and variances of the auxiliary variables. In this paper, we adapt this class and, motivated by Searle [13], suggest a more general class of estimators of the population variance in simple random sampling. Expressions for the mean square error of the proposed class are derived in general form. Besides obtaining the minimized MSE of the proposed and adapted classes, we show that the adapted class is a special case of the proposed class. These theoretical findings are supported by an empirical study of original data.
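To make the idea of a "class" of estimators concrete, the sketch builds one member of a Srivastava-Jhajj-type class. It is simplified to a single auxiliary variable, and the class is assumed to have the form t = s_y² H(u, v), where u = x̄ / X̄ and v = s_x² / S_x². Any function H with H(1, 1) = 1 (plus smoothness conditions) defines a member. The choice H(u, v) = u^α v^β is purely illustrative. α and β would normally be chosen to minimize the MSE. The Searle-type generalization proposed in the paper is not reproduced, and the data are hypothetical.

```python
import statistics


def sj_variance_estimator(y, x, pop_mean_x, pop_var_x, alpha, beta):
    """One member of a Srivastava-Jhajj-type class for S_y^2:
        t = s_y^2 * H(u, v),  u = xbar / Xbar,  v = s_x^2 / S_x^2,
    with the illustrative choice H(u, v) = u**alpha * v**beta, so H(1, 1) = 1."""
    u = statistics.fmean(x) / pop_mean_x
    v = statistics.variance(x) / pop_var_x
    return statistics.variance(y) * u ** alpha * v ** beta


y = [1.0, 2.0, 3.0, 4.0]  # hypothetical study-variable sample
x = [2.0, 4.0, 6.0, 8.0]  # hypothetical auxiliary sample
print(sj_variance_estimator(y, x, pop_mean_x=5.5, pop_var_x=8.0, alpha=-0.5, beta=-1.0))
```

When the sample mean and variance of x match their known population values, every member of the class returns s_y² unchanged. Members differ only in how they react to discrepancies in the auxiliary variable.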
Funding: supported by the National Key R&D Program of China (2020YFC2003102).
Abstract: Objective: To reveal the distribution characteristics and demographic factors of traditional Chinese medicine (TCM) constitution among elderly individuals in China. Methods: Elderly individuals from seven regions of China were selected as samples using a multistage cluster random sampling method. The basic information questionnaire and the Constitution in Chinese Medicine Questionnaire (Elderly Edition) were used. Descriptive statistical analysis, chi-squared tests, and binary logistic regression analysis were performed. Results: The single balanced constitution (BC) accounted for 23.9%. Among the major TCM constitution types, BC (43.2%) accounted for the largest proportion, and unbalanced constitutions ranged from 0.9% to 15.7%. East China region (odds ratio [OR] = 2.097; 95% confidence interval [CI], 1.912 to 2.301), married status (OR = 1.341; 95% CI, 1.235 to 1.457), and managers (OR = 1.254; 95% CI, 1.044 to 1.505) were significantly associated with BC. Age > 70 years was associated with qi-deficiency constitution and blood stasis constitution (BSC). Female sex was significantly associated with yang-deficiency constitution (OR = 1.646; 95% CI, 1.52 to 1.782). Southwest region was significantly associated with phlegm-dampness constitution (OR = 1.809; 95% CI, 1.569 to 2.086). North China region was significantly associated with inherited special constitution (OR = 2.521; 95% CI, 1.569 to 4.05). South China region (OR = 2.741; 95% CI, 1.997 to 3.763), Central China region (OR = 8.889; 95% CI, 6.676 to 11.835), senior middle school education (OR = 2.442; 95% CI, 1.932 to 3.088), and managers (OR = 1.804; 95% CI, 1.21 to 2.69) were significantly associated with BSC. Conclusions: This study defined the distribution characteristics and demographic factors of TCM constitution in the elderly population. Unbalanced constitutions are correlated with diseases. Adjusting and improving them through scientific management of these demographic factors can help promote healthy aging.
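Binary logistic regression reports each odds ratio as OR = exp(β). Its 95% Wald interval is exp(β ± 1.96·SE), which is symmetric around ln(OR) on the log scale. The helper below recovers β and SE from a reported OR and interval and measures that symmetry, which is a quick sanity check on published values. We assume Wald intervals; the abstract does not state which interval type was used. Small asymmetries are expected from rounding of the printed values.

```python
import math


def log_or_and_se(odds_ratio, lower, upper, z=1.96):
    """Recover beta = ln(OR) and its standard error from a reported 95% Wald CI,
    assuming the interval was computed as exp(beta +/- z * SE)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def log_asymmetry(odds_ratio, lower, upper):
    """Upper arm minus lower arm of the CI on the log scale (about 0 for a Wald CI)."""
    beta = math.log(odds_ratio)
    return (math.log(upper) - beta) - (beta - math.log(lower))


# Female sex and yang-deficiency constitution, as reported in the abstract.
print(log_or_and_se(1.646, 1.52, 1.782), log_asymmetry(1.646, 1.52, 1.782))
```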
Abstract: Machine learning (ML) algorithms are frequently used in landslide susceptibility modeling. Different data handling strategies may produce variations in landslide susceptibility models, even when the same ML algorithm is used. This research compares combinations of inventory data handling, cross-validation (CV), and hyperparameter tuning strategies for generating landslide susceptibility maps. The results are expected to provide a general strategy for landslide susceptibility modeling using ML techniques. The authors employed eight landslide inventory data handling scenarios to convert a landslide polygon into landslide points. The point is located at the toe (minimum height), at the scarp (maximum height), at the center of the landslide, randomly inside the polygon (1, 3, 5 or 10 points), or on a 15 m sampling grid. For each data handling strategy, three random forest approaches were applied: CV with non-spatial hyperparameter tuning; spatial CV with spatial hyperparameter tuning; and spatial CV with forward feature selection and no hyperparameter tuning. These combinations yielded 24 random forest ML workflows. They were applied to a complete inventory of 743 landslides triggered by Tropical Cyclone Cempaka (2017) in Pacitan Regency, Indonesia, using 11 landslide controlling factors. The results show that grid sampling with spatial CV and spatial hyperparameter tuning is favorable. This strategy can minimize overfitting, generate a relatively high-performance predictive model, and reduce the appearance of susceptibility artifacts in the landslide area. Careful inventory data handling, CV, and hyperparameter tuning strategies should be considered in landslide susceptibility modeling to increase the practical applicability of landslide susceptibility maps.
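The favored inventory strategy converts each landslide polygon into points on a regular grid. A minimal sketch of that conversion follows. The sketch places candidate points at the centers of 15 m grid cells over the polygon's bounding box and keeps those that fall inside the polygon, using a ray-casting test. Placing points at cell centers and the example polygons are assumptions. A real workflow would operate on projected GIS geometries with a library such as a geometry engine rather than raw coordinate lists.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a horizontal ray from (x, y) to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_sample(polygon, spacing):
    """Landslide points at the centers of grid cells (side `spacing`) that fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    y = min(ys) + spacing / 2
    while y < max(ys):
        x = min(xs) + spacing / 2
        while x < max(xs):
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
            x += spacing
        y += spacing
    return points


# Hypothetical landslide polygon in projected metres.
landslide = [(0.0, 0.0), (60.0, 5.0), (70.0, 40.0), (20.0, 55.0)]
print(len(grid_sample(landslide, 15.0)), grid_sample(landslide, 15.0)[:3])
```

Unlike the single toe, scarp or centroid point, grid sampling lets larger landslides contribute proportionally more training points. Polygons smaller than one grid cell may yield none and need a fallback.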