Abstract: This article addresses the issue of computing the constant required to implement a specific nonparametric subset selection procedure based on ranks of data arising in a statistical randomized block experimental design. A model of three populations and two blocks is used to compute the probability distribution of the relevant statistic, the maximum of the population rank sums minus the rank sum of the “best” population. Calculations are done for populations following a normal distribution, and for populations following a bi-uniform distribution. The least favorable configuration in these cases is shown to arise when all three populations follow identical distributions. The bi-uniform distribution leads to an asymptotic counterexample to the conjecture that the least favorable configuration, i.e., that configuration minimizing the probability of a correct selection, occurs when all populations are identically distributed. These results are consistent with other large-scale simulation studies. All relevant computational R code is provided in appendices.
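As a rough illustration of the statistic described above, the following Python sketch enumerates its exact distribution for three populations and two blocks in the case where all populations are identically distributed, so that every within-block rank vector is equally likely and blocks are independent. The function name `null_distribution`, its parameters, and the choice of population 0 as the “best” population are assumptions made for this sketch; the article itself provides R code, which is not reproduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def null_distribution(k=3, b=2, best=0):
    """Exact distribution of T = max_j R_j - R_best.

    Assumes identically distributed populations: within each of the b
    blocks, all k! rank vectors are equally likely, and blocks are
    independent. `best` is the (hypothetical) index of the best population.
    """
    rank_vectors = list(permutations(range(1, k + 1)))
    counts = Counter()
    # Enumerate every combination of one rank vector per block.
    for blocks in product(rank_vectors, repeat=b):
        rank_sums = [sum(block[j] for block in blocks) for j in range(k)]
        counts[max(rank_sums) - rank_sums[best]] += 1
    total = len(rank_vectors) ** b
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


dist = null_distribution()
for t, prob in dist.items():
    print(t, prob)
```

Under these assumptions the 36 equally likely rank configurations give P(T = 0) = 1/2, P(T = 1) = 1/9, P(T = 2) = 1/6, P(T = 3) = 1/6 and P(T = 4) = 1/18. The normal and bi-uniform configurations studied in the article would require non-uniform probabilities for the rank vectors, which this sketch does not attempt.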
Abstract: This article compares the size of selected subsets using nonparametric subset selection rules with two different scoring rules for the observations. The scoring rules are based on the expected values of order statistics of the uniform distribution (yielding rank values) and of the normal distribution (yielding normal score values). The comparison is made using state motor vehicle traffic fatality rates, published in a 2016 article, with fifty-one states (including DC as a state) and over a nineteen-year period (1994 through 2012). The earlier study considered four block design selection rules: two for choosing a subset to contain the “best” population (i.e., state with lowest mean fatality rate) and two for the “worst” population (i.e., highest mean rate), with a probability of correct selection chosen to be 0.90. Two selection rules based on normal scores resulted in selected subset sizes substantially smaller than corresponding rules based on ranks (7 vs. 16 and 3 vs. 12). For two other selection rules, the subsets chosen were very close in size (within one). A comparison is also made using state homicide rates, published in a 2022 article, with fifty states and covering eight years. The results are qualitatively the same as those obtained with the motor vehicle traffic fatality rates.
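The two scoring rules can be sketched in Python as follows. For a sample of size n from Uniform(0, 1), the expected value of the i-th order statistic is i/(n + 1), which is proportional to the rank i. Exact expected normal order statistics require numerical integration, so this sketch substitutes Blom's approximation, Φ⁻¹((i − 3/8)/(n + 1/4)); that substitution, and the function names, are assumptions of the sketch and not necessarily what the article used.

```python
from statistics import NormalDist


def rank_scores(n):
    """Rank scores 1..n, i.e. (n + 1) times E[U_(i)] = i / (n + 1)."""
    return list(range(1, n + 1))


def approx_normal_scores(n):
    """Approximate expected normal order statistics (Blom's formula).

    This is an approximation, not the exact expected values.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# Scores for one block with five populations.
print(rank_scores(5))
print([round(s, 3) for s in approx_normal_scores(5)])
```

Unlike ranks, the normal scores are unequally spaced: they spread out in the tails and cluster near zero, so an extreme observation within a block contributes more to a score sum than it does to a rank sum.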
Funding: Supported by the Science Foundation of Education Ministry of Shaanxi Province (15JK1672), the Industrial Research Project of Shaanxi Province (2016GY-089), and the Innovation Fund of Xi’an University of Posts and Telecommunications (103-602080012).
Abstract: The software module clustering problem is an important and challenging problem in software reverse engineering; its main goal is to obtain a good modular structure of the software system. Through software module clustering, a large, complex software system can be divided into subsystems that are easy to understand and maintain. To address the problems of slow convergence, poor clustering results, and algorithmic complexity, a software module clustering algorithm using probability selection is proposed. First, the software system is converted into a complex network graph; then merging, adjustment, and optimization operations are applied to obtain the software module clustering scheme. To evaluate the effectiveness of the algorithm, a set of experiments was performed on 5 real-world module clustering problems. The comparison of the experimental results shows the simplicity of the algorithm as well as its low time complexity and fast convergence. The algorithm provides a simple and effective engineering method for the software module clustering problem.
Abstract: Nonparametric and parametric subset selection procedures are used in the analysis of state homicide rates (SHRs), for the year 2005 and years 2014-2020, to identify subsets of states that contain the “best” (lowest SHR) and “worst” (highest SHR) rates with a prescribed probability. A new Bayesian model is developed and applied to the SHR data and the results are contrasted with those obtained with the subset selection procedures. All analyses are applied within the context of a two-way block design.
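A rank-based subset selection rule in a block design can be sketched in Python as below. Observations are ranked within each block (here, a year), rank sums are formed per population (here, a state), and every population whose rank sum is within a constant d of the smallest rank sum is selected as a candidate for “best” (lowest rate). The data, the value of d, and the function names are hypothetical; in practice d is chosen so that the probability of correct selection meets the prescribed level, and the article's exact rules may differ from this form.

```python
def rank_sums(data):
    """Sum of within-block ranks per population.

    data[b][j] is the observation for population j in block b.
    Ties are assumed not to occur in this sketch.
    """
    k = len(data[0])
    sums = [0] * k
    for block in data:
        order = sorted(range(k), key=lambda j: block[j])
        for rank, j in enumerate(order, start=1):
            sums[j] += rank
    return sums


def select_lowest(data, d):
    """Select populations whose rank sum is within d of the minimum."""
    sums = rank_sums(data)
    threshold = min(sums) + d
    return [j for j, s in enumerate(sums) if s <= threshold]


# Hypothetical rates: 3 blocks (years) by 4 populations (states).
rates = [
    [4.0, 6.5, 9.1, 5.2],
    [3.8, 7.0, 8.5, 5.0],
    [4.4, 6.1, 9.9, 4.1],
]
print(rank_sums(rates))         # [4, 9, 12, 5]
print(select_lowest(rates, 1))  # [0, 3]
```

A subset for the “worst” population can be obtained symmetrically by keeping the populations whose rank sum is within d of the maximum.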
Abstract: Path planning is important in mobile robot (MR) research, and path planning methods have been used in many different applications. An automated guided vehicle (AGV), which is a kind of MR, is used in a flexible manufacturing system (FMS), and path planning for the AGV is essential to improving the efficiency of the FMS. In this paper, a new method is proposed for an FMS with a known obstacle space. The FMS is described by the Augmented Pos Matrix of a Machine (APMM) and the Relative Pos Matrix of Machines (RPMM), which is smaller. The optimum path can be obtained according to the probability of the path and the maximal probability path. Simulation results show that the suggested path planning algorithm performs well in terms of simplicity, time savings, and reliability.
Funding: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021–2020–0–01602) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Abstract: Coronavirus disease 2019 (COVID-19) has been termed a “Pandemic Disease” that has infected many people and caused many deaths on a nearly unprecedented level. As more people are infected each day, it continues to pose a serious threat to humanity worldwide. As a result, healthcare systems around the world are facing a shortage of medical space such as wards and sickbeds. In most cases, healthy people experience tolerable symptoms if they are infected. However, in other cases, patients may suffer severe symptoms and require treatment in an intensive care unit. Thus, hospitals should select patients who have a high risk of death and treat them first. To solve this problem, a number of models have been developed for mortality prediction. However, they lack interpretability and generalization. To prepare a model that addresses these issues, we proposed a COVID-19 mortality prediction model that could provide new insights. We identified blood factors that could affect the prediction of COVID-19 mortality. In particular, we focused on dependency reduction using partial correlation and mutual information. Next, we used the Class-Attribute Interdependency Maximization (CAIM) algorithm to bin continuous values. Then, we used Jensen-Shannon Divergence (JSD) and Bayesian posterior probability to create less redundant and more accurate rules. We provided a ruleset with its own posterior probability as a result. The extracted rules are in the form of “if antecedent then results, posterior probability (θ)”. If the sample matches the extracted rules, then the result is positive. The average AUC score was 96.77% for the validation dataset and the F1-score was 92.8% for the test data. Compared to the results of previous studies, the model shows good performance in terms of classification performance, generalization, and interpretability.
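The Jensen-Shannon divergence mentioned above can be sketched in Python for two discrete distributions. It is the average Kullback-Leibler divergence of each distribution from their midpoint; with base-2 logarithms it ranges from 0 (identical distributions) to 1 (disjoint support). How the paper applies JSD to prune redundant rules is not stated in the abstract, so the example distributions and function names here are purely illustrative.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p_i == 0 contribute nothing; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between discrete distributions p and q."""
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)


# Hypothetical outcome distributions (survived, died) for samples
# matched by two candidate rules.
rule_a = [0.9, 0.1]
rule_b = [0.2, 0.8]
print(js_divergence(rule_a, rule_b))
```

Because the midpoint is positive wherever either input is positive, the divergence stays finite even when the two distributions do not share support, which is one reason it is often preferred to the plain Kullback-Leibler divergence for comparing rules.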
Funding: The National Social Science Foundation of China (No. 17XGL013).
Abstract: Based on the mileage-saving characteristics of the air alliance environment, the hub location problem of the air cargo network was studied. First, the air alliance selection probability model was introduced to determine the probability of alliance self-operation or outsourcing in different segments. Then, according to the location center rule, with the goal of minimizing the total cost, the hub location model was built. The improved immune chaos genetic algorithm was used to solve this model. The results show that the improved algorithm has stronger convergence and better effect than the immune genetic algorithm. When the number of hubs increases, the fixed cost increases, but the transportation cost decreases. The greater the discount factor, the fixed cost, and the self-operating cost sharing coefficient, the higher the total network cost. An airline that joins the air alliance can greatly reduce its operating cost. Therefore, airlines should consider joining the alliance.
Funding: Supported by the Program for New Century Excellent Talents in University, the National Natural Science Foundation of China (60573068, 60773113), the Natural Science Foundation of Chongqing of China (2005BA2003, 2008BA2017), the Starting Research Foundation of Ministry of Education for Chinese Overseas Returnees ([2007]1108), and the Scientific Research Foundation of Chongqing University of Posts and Telecommunications (A2006-05).
Abstract: Data discretization contributes much to the induction of classification rules or trees by machine learning methods. Rough set theory is a valid tool for discretizing continuous information systems. Herein, a new method is proposed to improve typical rough set based heuristic algorithms for data discretization, by utilizing decision information to reduce the number of candidate cuts, and by measuring cut significance more reasonably with a new concept of cut selection probability. Simulations demonstrate that, compared with other typical discretization algorithms based on rough set theory, the proposed method is more capable and valid for discretizing continuous information systems. It can effectively improve the predictive accuracies of information systems while still conceptually keeping their consistency.
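One common way to use decision information to shrink the set of candidate cuts can be sketched in Python as follows. All midpoints between adjacent distinct attribute values are candidate cuts, but a cut is kept only if the objects on its two neighbouring values do not all share a single decision class, since a cut between values with the same decision cannot help discern anything. This filtering rule is an assumption made for illustration; the abstract does not spell out the paper's exact reduction rule or its cut selection probability, and the function name is hypothetical.

```python
def candidate_cuts(values, decisions):
    """Return (all_cuts, reduced_cuts) for one continuous attribute.

    values[i] is the attribute value of object i and decisions[i] its
    decision class. Reduced cuts keep only midpoints whose neighbouring
    values carry more than one decision class between them.
    """
    # Collect the set of decision classes observed at each distinct value.
    classes_at = {}
    for value, decision in zip(values, decisions):
        classes_at.setdefault(value, set()).add(decision)

    ordered = sorted(classes_at)
    all_cuts, reduced_cuts = [], []
    for low, high in zip(ordered, ordered[1:]):
        cut = (low + high) / 2
        all_cuts.append(cut)
        if len(classes_at[low] | classes_at[high]) > 1:
            reduced_cuts.append(cut)
    return all_cuts, reduced_cuts


# Hypothetical attribute column and decision column.
all_cuts, reduced = candidate_cuts([1, 2, 3, 4], ["a", "a", "b", "b"])
print(all_cuts)  # [1.5, 2.5, 3.5]
print(reduced)   # [2.5]
```

In this toy column, only the cut at 2.5 separates objects with different decisions, so the heuristic search would need to score one candidate instead of three.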