This article compares the size of selected subsets using nonparametric subset selection rules with two different scoring rules for the observations. The scoring rules are based on the expected values of order statistics of the uniform distribution (yielding rank values) and of the normal distribution (yielding normal score values). The comparison is made using state motor vehicle traffic fatality rates, published in a 2016 article, with fifty-one states (including DC as a state) over a nineteen-year period (1994 through 2012). The earlier study considered four block design selection rules: two for choosing a subset to contain the "best" population (i.e., the state with the lowest mean fatality rate) and two for the "worst" population (i.e., the highest mean rate), with the probability of correct selection chosen to be 0.90. Two selection rules based on normal scores yielded selected subsets substantially smaller than those from the corresponding rules based on ranks (7 vs. 16 and 3 vs. 12). For the two other selection rules, the subsets chosen were very close in size (within one). A comparison is also made using state homicide rates, published in a 2022 article, with fifty states and covering eight years. The results are qualitatively the same as those obtained with the motor vehicle traffic fatality rates.
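As a minimal sketch of the two scoring rules being compared, the following R code ranks observations within each block (year) and converts the ranks to approximate normal scores using Blom's approximation qnorm((r - 3/8)/(n + 1/4)) to the expected normal order statistics; the data matrix x (states as rows, years as columns) is simulated here purely for illustration and is not the article's data.

    set.seed(1)
    n_states <- 51   # 50 states plus DC
    n_years  <- 19   # 1994 through 2012
    x <- matrix(rexp(n_states * n_years), nrow = n_states)  # placeholder rates

    # Rank scores: rank each column (block) separately.
    rank_scores <- apply(x, 2, rank)

    # Normal scores: pass within-block ranks through Blom's formula.
    normal_scores <- apply(x, 2, function(col) {
      r <- rank(col)
      qnorm((r - 3/8) / (length(col) + 1/4))
    })

    # Sum scores across blocks; small totals point to candidate "best"
    # (low-rate) states, large totals to candidate "worst" states.
    rank_totals   <- rowSums(rank_scores)
    normal_totals <- rowSums(normal_scores)
    head(order(rank_totals))    # states with the smallest rank sums
    head(order(normal_totals))  # states with the smallest normal-score sums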
This article addresses the issue of computing the constant required to implement a specific nonparametric subset selection procedure based on ranks of data arising in a statistical randomized block experimental design. A model of three populations and two blocks is used to compute the probability distribution of the relevant statistic, the maximum of the population rank sums minus the rank sum of the "best" population. Calculations are done for populations following a normal distribution and for populations following a bi-uniform distribution. The least favorable configuration in these cases is shown to arise when all three populations follow identical distributions. The bi-uniform distribution, however, leads to an asymptotic counterexample to the conjecture that the least favorable configuration, i.e., the configuration minimizing the probability of a correct selection, always occurs when all populations are identically distributed. These results are consistent with other large-scale simulation studies. All relevant computational R code is provided in the appendices.
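The article computes the distribution of this statistic exactly; the sketch below instead approximates it by simulation under one identically-distributed (least favorable) configuration, using standard normal data as an illustrative case. The smallest d with P(statistic <= d) at or above the prescribed P* is the constant the procedure requires.

    set.seed(2)
    k <- 3; b <- 2; n_sim <- 1e5
    stat <- replicate(n_sim, {
      x <- matrix(rnorm(k * b), nrow = k)   # identical distributions
      R <- rowSums(apply(x, 2, rank))       # within-block rank sums
      max(R) - R[1]                         # population 1 plays the "best"
    })
    # Empirical CDF of the statistic; read off the smallest d with
    # cumulative probability >= P* (e.g., P* = 0.90).
    cumsum(table(stat)) / n_sim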
The topic of this article is one-sided hypothesis testing for disparity, i.e., testing whether the mean of one group is larger than that of another when there is uncertainty about the group from which each datum is drawn. For each datum, the uncertainty is captured by a given discrete probability distribution over the groups. Such situations arise, for example, in the use of Bayesian imputation methods to assess race and ethnicity disparities with certain insurance, health, and financial data. A widely used method for this assessment is Bayesian Improved Surname Geocoding (BISG), which assigns a discrete probability over six race/ethnicity groups to an individual given the individual's surname and address location. Using a Bayesian framework and Markov chain Monte Carlo sampling from the joint posterior distribution of the group means, the probability of a disparity hypothesis is estimated. Four methods are developed and compared on an illustrative data set. Three of these methods are implemented in R and one in WinBUGS. The methods are programmed for any number of groups between two and six inclusive. All the code is provided in the appendices.
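A minimal sketch (not the article's exact algorithm) of estimating P(mu1 > mu2 | data) when each observation carries only a probability of belonging to group 1 or group 2, as with BISG-style imputed memberships: memberships are imputed from those probabilities on each draw and the group means are sampled from their conjugate posteriors. A flat prior, known unit variance, and simulated data are assumed purely to keep the sketch self-contained.

    set.seed(3)
    n  <- 200
    y  <- rnorm(n, mean = 0.3)   # illustrative outcomes
    p1 <- runif(n)               # P(datum i belongs to group 1)

    n_draws <- 5000
    draws <- replicate(n_draws, {
      g <- rbinom(n, 1, p1) == 1         # impute memberships from p1
      if (!any(g) || all(g)) return(NA)  # skip degenerate assignments
      # Conjugate posterior of each group mean: Normal(ybar, 1/n_g)
      mu1 <- rnorm(1, mean(y[g]),  1 / sqrt(sum(g)))
      mu2 <- rnorm(1, mean(y[!g]), 1 / sqrt(sum(!g)))
      mu1 > mu2
    })
    mean(draws, na.rm = TRUE)            # estimated P(mu1 > mu2)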
A Bayesian inference method is presented in this paper for the modeling of operational risk. Bank internal and external data are divided into defined loss cells and then fitted to probability distributions. The distribution parameters and their uncertainties are estimated from posterior distributions derived using Bayesian inference. Loss frequency is fitted with Poisson distributions, whose parameters are likewise given posterior distributions developed using Bayesian inference. Bank operational losses typically include some low-frequency but high-magnitude losses. These heavy-tailed, low-frequency loss data are divided into several buckets, with the bucket frequencies defined by experts. A probability distribution, as defined by the internal and external data, is used for these data, and a Poisson distribution is used for the bucket frequencies; however, instead of distributions for these Poisson parameters, point estimates are used. Monte Carlo simulation is then carried out to calculate the capital charge for the internal as well as the heavy-tailed, high-profile, low-frequency losses. The output of the Monte Carlo simulation defines the capital requirement that must be allocated to cover potential operational risk losses for the next year.
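A minimal sketch of the final Monte Carlo step: compound a Poisson frequency with a fitted severity distribution and read the capital charge off a high quantile of annual aggregate loss. The lognormal severity, all parameter values, and the 99.9% confidence level below are illustrative stand-ins; the paper draws frequency and severity parameters from their Bayesian posteriors rather than fixing them.

    set.seed(4)
    n_years <- 1e5
    lambda  <- 20                  # annual loss frequency (assumed)
    mu_sev  <- 10; sd_sev <- 2     # lognormal severity parameters (assumed)

    annual_loss <- replicate(n_years, {
      n_loss <- rpois(1, lambda)             # number of losses in the year
      sum(rlnorm(n_loss, mu_sev, sd_sev))    # aggregate severity
    })
    # Capital requirement as a high quantile of aggregate annual loss.
    quantile(annual_loss, 0.999)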
Nonparametric and parametric subset selection procedures are used in the analysis of state homicide rates (SHRs), for the year 2005 and the years 2014-2020, to identify subsets of states that contain the "best" (lowest SHR) and "worst" (highest SHR) rates with a prescribed probability. A new Bayesian model is developed and applied to the SHR data, and the results are contrasted with those obtained with the subset selection procedures. All analyses are carried out within the context of a two-way block design.
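A minimal sketch of how a rank-based subset selection rule of this kind is applied once its constant d has been determined: within the two-way block design, sum the within-year ranks for each state and retain every state whose rank sum lies within d of the smallest, giving the selected "best" subset. The data matrix and the value of d are placeholders, not the article's.

    set.seed(5)
    shr <- matrix(rexp(50 * 8), nrow = 50)   # 50 states x 8 years (simulated)
    rownames(shr) <- paste0("state", 1:50)
    R <- rowSums(apply(shr, 2, rank))        # rank sums across blocks
    d <- 40                                   # placeholder selection constant
    best_subset <- rownames(shr)[R <= min(R) + d]
    best_subset                               # states retained as "best"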