The longitudinal dispersion of the projectile in shooting tests of two-dimensional trajectory correction fuzes with fixed canards is so large that it sometimes exceeds the correction capability of the correction fuze actuator. The impact point easily deviates from the target, so the correction result cannot be readily evaluated. However, shooting tests are too costly to conduct in large numbers for data collection. To address this issue, this study proposes an aiming method for shooting tests based on a small sample size. The proposed method uses the Bootstrap method to expand the test data. It iteratively corrects the positions of the simulated theoretical impact points through an improved compatibility test method. It also dynamically adjusts the weight of the prior distribution of the simulation results based on the Kullback-Leibler divergence, which to some extent prevents the real data from being "submerged" by the simulation data. Together, these steps achieve a fused Bayesian estimate of the dispersion center. The experimental results show that, when the simulation accuracy is sufficiently high, the proposed method outperforms three comparison methods: it yields a smaller mean-square deviation in estimating the dispersion center and higher shooting accuracy. This better reflects the effect of the control algorithm and helps test personnel iterate on their proposed structures and algorithms. In addition, this study provides a knowledge base for further comprehensive studies.
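The sketch below makes two ingredients of this method concrete: Bootstrap resampling of a few measured impact points, and a KL-divergence weight that discounts a simulation prior when it disagrees with the real data. It is a minimal illustration under stated assumptions, not the authors' algorithm. The 1-D Gaussian model, the weighting rule `w = exp(-KL)`, and all function names are hypothetical.

```python
import math
import random
import statistics

def bootstrap_means(data, n_boot=1000, seed=0):
    """Resample `data` with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]

def kl_normal(mu0, sd0, mu1, sd1):
    """KL(N(mu0, sd0^2) || N(mu1, sd1^2)) for 1-D Gaussians."""
    return math.log(sd1 / sd0) + (sd0**2 + (mu0 - mu1) ** 2) / (2 * sd1**2) - 0.5

def fused_center(real, sim_mu, sim_sd, n_boot=1000, seed=0):
    """Precision-weighted fusion of a bootstrap estimate of the center with a
    simulation prior N(sim_mu, sim_sd^2) whose precision is scaled by the
    hypothetical weight w = exp(-KL) in (0, 1]."""
    boots = bootstrap_means(real, n_boot, seed)
    mu_r, sd_r = statistics.fmean(boots), statistics.stdev(boots)
    w = math.exp(-kl_normal(mu_r, sd_r, sim_mu, sim_sd))
    prec_r = 1.0 / sd_r**2
    prec_s = w / sim_sd**2  # discounted prior precision
    return (prec_r * mu_r + prec_s * sim_mu) / (prec_r + prec_s), w
```

When the simulation prior sits far from the data, `w` collapses toward zero and the fused estimate falls back to the Bootstrap estimate. This mirrors, in a simplified form, the goal of keeping real data from being "submerged".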
Sample size determination typically relies on a power analysis based on a frequentist conditional approach. The latter can be seen as a particular case of the two-priors approach, which allows one to build four distinct power functions to select the optimal sample size. We revisit this approach for the case of testing a single binomial proportion. We consider exact methods and introduce a conservative criterion to account for the typical non-monotonic behavior of power functions with discrete data. The main purpose of this paper is to present a Shiny App that provides a user-friendly, interactive tool for applying these criteria. The app also provides specific tools to elicit the analysis and design prior distributions, which are the core of the two-priors approach.
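The following sketch shows why exact binomial power is non-monotonic in n and how a conservative choice can guard against it. It covers only the frequentist conditional case, which corresponds to point-mass priors at the null value `p0` and the alternative `p1`. The rule used here, "power stays at or above the target for every larger n up to `n_max`", is one plausible reading of a conservative criterion and is not necessarily the app's exact definition.

```python
from math import comb

def tail_probs(n, p):
    """tail[k] = P(X >= k) for X ~ Binomial(n, p), for k = 0..n+1."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    tail = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tail[k] = tail[k + 1] + pmf[k]
    return tail

def exact_power(n, p0, p1, alpha=0.05):
    """Power of the exact one-sided test of H0: p <= p0 at the point alternative p1."""
    t0 = tail_probs(n, p0)
    k_crit = next(k for k in range(n + 2) if t0[k] <= alpha)  # smallest rejection count
    return tail_probs(n, p1)[k_crit]

def conservative_n(p0, p1, alpha=0.05, target=0.8, n_max=200):
    """Smallest n such that power >= target at n and at every larger n up to n_max."""
    best = None
    for n in range(n_max, 0, -1):
        if exact_power(n, p0, p1, alpha) >= target:
            best = n
        else:
            break
    return best
```

The naive choice, the first n whose power reaches the target, can be followed by larger n with lower power because of the discrete critical values. The conservative rule skips past such dips.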
Accurate prediction of the internal corrosion rates of oil and gas pipelines could be an effective way to prevent pipeline leaks. In this study, a framework was developed for predicting corrosion rates from a small sample of laboratory metal corrosion data, offering a new perspective on the problem of pipeline corrosion when real samples are insufficient. The approach employs the bagging algorithm to construct a strong learner by integrating several KNN learners. A total of 99 data points were collected and split into training and test sets at a 9:1 ratio. The training set was used to obtain the best hyperparameters by 10-fold cross-validation and grid search, and the test set was used to evaluate the performance of the model. The results showed that the Mean Absolute Error (MAE) of this framework is 28.06% of that of the traditional model and that the framework outperforms other ensemble methods. Therefore, the proposed framework is suitable for metal corrosion prediction under small-sample conditions.
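Bagging with KNN learners can be sketched in plain Python as shown below. Each KNN regressor is fitted on a bootstrap resample of the training set, and the ensemble prediction is the average of the individual predictions. This is a minimal illustration: the data format, `k`, and the number of learners are hypothetical, and the grid search with 10-fold cross-validation described above is omitted. In scikit-learn, the same structure is a `BaggingRegressor` wrapping a `KNeighborsRegressor`, tuned with `GridSearchCV(cv=10)`.

```python
import random
import statistics

def knn_predict(train, x, k):
    """Average target of the k nearest training rows (squared Euclidean distance).
    `train` is a list of (feature_tuple, target) pairs."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return statistics.fmean(y for _, y in nearest)

def bagged_knn_predict(train, x, k=3, n_estimators=25, seed=0):
    """Bagging: each KNN learner sees a bootstrap resample; predictions are averaged."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        boot = rng.choices(train, k=len(train))
        preds.append(knn_predict(boot, x, k))
    return statistics.fmean(preds)

def mae(y_true, y_pred):
    """Mean absolute error, the metric used to compare models in the abstract."""
    return statistics.fmean(abs(a - b) for a, b in zip(y_true, y_pred))
```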
This study presents the design of a modified attribute control chart based on a double sampling (DS) np chart combined with generalized multiple dependent state (GMDS) sampling. The chart monitors the mean life of a product using a time-truncated life test under the Weibull distribution. The developed control chart supports the examination of mean lifespan variation for a particular product during manufacturing. Three control limit levels are used: the warning control limit, the inner control limit, and the outer control limit. Together, they enhance the capability for variation detection. A genetic algorithm can be used for optimization during the in-control process to establish the optimal parameters for the proposed control chart. The control chart performance is assessed using the average run length. The influence of the model parameters on the control chart solution is assessed via sensitivity analysis based on an orthogonal experimental design with multiple linear regression. A comparative study based on the out-of-control average run length showed that the developed control chart is more sensitive in detecting process shifts than existing control charts, while using smaller samples on average. Finally, to demonstrate the utility of the developed control chart, this paper presents its application using simulated data with parameters drawn from a real data set.
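The building block beneath such a chart can be sketched as follows. Under a Weibull time-truncated test, the probability that an item fails before the truncation time follows from the mean life. That probability then sets np-chart limits and an average run length (ARL). This is a simplified, single-sampling np chart only, not the DS/GMDS design: the truncation constant `a` (with t0 = a * mu0), the known shape parameter, and the limit width `L` are illustrative assumptions.

```python
from math import comb, exp, gamma, sqrt

def failure_prob(a, shape, ratio=1.0):
    """P(item fails before t0 = a * mu0) when the true mean life is ratio * mu0.
    Uses the Weibull scale lambda = mu / Gamma(1 + 1/shape)."""
    return 1.0 - exp(-((a * gamma(1.0 + 1.0 / shape)) / ratio) ** shape)

def np_limits(n, p0, L=3.0):
    """Classical np-chart limits n*p0 +/- L*sqrt(n*p0*(1-p0)), LCL floored at 0."""
    centre = n * p0
    half = L * sqrt(n * p0 * (1.0 - p0))
    return max(0.0, centre - half), centre + half

def arl(n, p, lcl, ucl):
    """Average run length = 1 / P(signal) for one sample of size n."""
    pmf = [comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(n + 1)]
    p_signal = sum(q for d, q in enumerate(pmf) if d < lcl or d > ucl)
    return float("inf") if p_signal == 0 else 1.0 / p_signal
```

A drop in mean life (`ratio < 1`) raises the failure probability, so the out-of-control ARL is shorter than the in-control ARL. The DS and GMDS schemes in the abstract aim to shorten it further with fewer items on average.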
Background: When continuous scale measurements are available, agreement between two measuring devices is assessed both graphically and analytically. In clinical investigations, Bland and Altman proposed plotting subject-wise differences between raters against subject-wise averages. To assess agreement rigorously, Bartko recommended combining the graphical approach with the statistical analytic procedure suggested by Bradley and Blackwood. The advantage of this approach is that it enables significance testing and sample size estimation. We noted that direct use of the regression results is misleading, and we provide a correction. Methods: Graphical and linear models are used to assess agreement for continuous scale measurements. We demonstrate that linear regression results from statistical software should not be used directly, and we provide the correct analytic procedures. Software reports incorrect degrees of freedom for the F-statistic, and we overcome this problem by introducing the correct analytic form of the F-statistic. Methods for sample size estimation using R functions are also given. Results: We believe that the tutorial and the R code are useful tools for testing and estimating agreement between two rating protocols for continuous scale measurements. Interested readers may apply the code to their own data when agreement between two raters is of interest.
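The sketch below shows the Bradley-Blackwood F-statistic in its commonly cited form. The paired differences D are regressed on the pair means, and the joint hypothesis intercept = slope = 0 is tested on (2, n - 2) degrees of freedom. Default regression output instead reports the slope-only F on (1, n - 2). This is a pure-Python illustration, not the authors' R code.

```python
def bradley_blackwood(x1, x2):
    """Joint test of equal means and equal variances for paired readings.
    Regress D = x1 - x2 on M = (x1 + x2) / 2, then
    F = ((sum D^2 - SSE) / 2) / (SSE / (n - 2)) on (2, n - 2) degrees of freedom."""
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]
    m = [(a + b) / 2 for a, b in zip(x1, x2)]
    m_bar, d_bar = sum(m) / n, sum(d) / n
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    sxy = sum((mi - m_bar) * (di - d_bar) for mi, di in zip(m, d))
    slope = sxy / sxx
    intercept = d_bar - slope * m_bar
    sse = sum((di - intercept - slope * mi) ** 2 for mi, di in zip(m, d))
    f = ((sum(di**2 for di in d) - sse) / 2) / (sse / (n - 2))
    return f, (2, n - 2)
```

If SciPy is available, a p-value can be obtained with `scipy.stats.f.sf(f, 2, n - 2)`. A constant bias between devices leaves the residual sum of squares unchanged but inflates the sum of D squared, so F grows.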
Reliability assessment of the braking system of a high-speed train under small sample sizes and zero-failure data is very important for safe operation. Traditional reliability assessment methods perform well only with large sample sizes and complete failure data. With small sample sizes and zero-failure data, they produce large deviations. To address this problem, a new Bayesian method is proposed. Based on the characteristics of the solenoid valve in the braking system of a high-speed train, the modified Weibull distribution is selected to describe the failure rate over the entire lifetime. Assuming a binomial distribution for the failure probability at the censoring time, a concave method is employed to obtain the relationships between the accumulated failure probabilities. A numerical simulation compares the proposed method with maximum likelihood estimation. It shows that the proposed Bayesian model gives a more accurate expected value when the sample size is less than 12. Finally, the robustness of the model is demonstrated by obtaining reliability indicators for a numerical case involving the solenoid valve of the braking system: the reliability and failure rate change little across different hyperparameters. The method avoids being misled by subjective information and improves the accuracy of reliability assessment under small sample sizes and zero-failure data.
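The sketch below shows, at its simplest, how Bayesian updating extracts information from zero-failure data. It assumes a conjugate Beta prior on the failure probability at a single censoring time and a binomial likelihood. This is a generic textbook construction, not the paper's modified-Weibull model or its concave method. The prior parameters and confidence level are hypothetical.

```python
def zero_failure_posterior(n_units, a=1.0, b=1.0):
    """Beta(a, b) prior on failure probability p at the censoring time.
    Observing 0 failures among n_units binomial trials gives Beta(a, b + n_units)."""
    a_post, b_post = a, b + n_units
    return a_post, b_post, a_post / (a_post + b_post)

def upper_bound_uniform_prior(n_units, conf=0.9):
    """With a uniform Beta(1, 1) prior the posterior is Beta(1, n + 1), whose CDF is
    1 - (1 - p)^(n + 1). Inverting it gives a one-sided upper credible bound on p."""
    return 1.0 - (1.0 - conf) ** (1.0 / (n_units + 1))
```

Even with no observed failures, the posterior gives a finite estimate and an upper bound on the failure probability, where maximum likelihood would return exactly zero. The bound tightens as more units survive.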
Knowledge of the spatial distribution of soil copper (Cu), and of how to optimize its sampling size, could lay solid foundations for environmental quality surveys of agricultural soils at the county scale. In this investigation, the cokriging method was used to interpolate Cu concentration in cropland soil in Shuangliu County, Sichuan Province, China. From the original 623 physicochemically measured soil samples, subsets of 560, 498, and 432 samples were randomly selected for the target variable. Soil organic matter (SOM) from all original samples was used as the auxiliary variable. Cokriging interpolation results under the different sample sizes were evaluated for their applicability in estimating the spatial distribution of soil Cu at the county scale. Compared with ordinary Kriging at the corresponding sample sizes, Cokriging reduced the root mean square error (RMSE) by 0.9% to 7.77% and increased the correlation coefficient between predicted and measured values by 1.76% to 9.76%. Even with the sample size reduced by 10%, Cokriging was still more accurate than ordinary Kriging using all 623 samples, and the two interpolation maps were highly consistent. Therefore, Cokriging proved to be a more accurate and economical method that could provide more information for studies of the spatial distribution of soil pollutants at the county scale.
The development of a core collection could enhance the use of germplasm collections in crop improvement programs and simplify their management. Selecting an appropriate sampling strategy is an important prerequisite for constructing a core collection of appropriate size. Such a collection should adequately represent the genetic spectrum and maximally capture the genetic diversity of the available crop collections. The present study constructed nested core collections to determine the sample size needed to represent the genetic diversity of a rice landrace collection. The analysis was based on 15 quantitative traits and 34 qualitative traits of 2,262 rice accessions. The results showed that nested core collections of 50-225 accessions, corresponding to sampling rates of 2.2%-9.9%, were sufficient to maintain the maximum genetic diversity of the initial collection. Of these, 150 accessions (6.6%) could capture the maximal genetic diversity of the initial collection. Three data types were compared for their efficiency in constructing core collections: qualitative traits (QT1), quantitative traits (QT2), and integrated qualitative and quantitative traits (QTT). Core collections were built with the weighted pair-group average method combined with stepwise clustering and preferred sampling on adjusted Euclidean distances. Each combination scheme produced eight rice core collections (225, 200, 175, 150, 125, 100, 75, and 50 accessions). Judged by the genetic diversity of the resulting core collections, the QTT data were the best for constructing a core collection. A core collection constructed only from QT1 information could not represent the initial collection effectively. Qualitative and quantitative traits should therefore be used together to construct a productive core collection.
To investigate the effect of sample size on the dynamic torsional behaviour of the 2A12 aluminium alloy, torsional split Hopkinson bar tests were conducted on this alloy with different sample dimensions. The tested yield strength increases as the gauge length and thickness decrease. However, the sample inner/outer diameter has little effect on the dynamic torsional behaviour. The stress states in samples of different sizes are analysed using the finite element method. Because of the stress concentration zone (SCZ), the shorter sample has a higher yield stress. Furthermore, the stress is distributed more uniformly in the thinner sample, which leads to a higher tested yield stress. Based on the experimental and simulation analysis, suggestions for choosing the sample size are also given.
This study used the Ecopath model of Jiaozhou Bay as an example to evaluate the effect of the stomach sample size of three fish species on the model's projections. The derived ecosystem indices were classified into three categories:
(1) direct indices, such as the trophic level of a species, which are influenced directly by stomach sample size;
(2) indirect indices, such as the ecotrophic efficiency (EE) of invertebrates, which are influenced through multiple prey-predator relationships;
(3) systemic indices, such as total system throughput (TST), which describe the status of the whole ecosystem.
The influence of different stomach sample sizes on these indices was evaluated. The results suggest that the systemic indices of the ecosystem model were robust to stomach sample size. In contrast, species-specific indices had low accuracy and precision when stomach samples were insufficient. The indices became more uncertain when stomach sample sizes varied for more species. This study improves understanding of how the quality of diet composition data influences ecosystem model outputs. The results can also guide the design of stomach content analyses for developing ecosystem models.
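Trophic level, the "direct index" above, depends on diet composition in a way that can be sketched as follows. The sketch uses the standard Ecopath-style definition: producers and detritus get TL = 1, and each consumer gets TL_i = 1 + sum_j DC_ij * TL_j, solved here by fixed-point iteration. The group names and diet fractions are hypothetical. EE and TST require the full mass-balance model and are not shown.

```python
def trophic_levels(diet, groups, n_iter=200):
    """diet[i][j] = DC_ij, the fraction of prey j in predator i's diet (rows sum to 1).
    Groups without a diet entry (producers, detritus) keep TL = 1.
    Iterates TL_i = 1 + sum_j DC_ij * TL_j until the values settle."""
    tl = {g: 1.0 for g in groups}
    for _ in range(n_iter):
        tl = {g: 1.0 + sum(dc * tl[prey] for prey, dc in diet.get(g, {}).items())
              for g in groups}
    return tl
```

Because each predator's TL is a weighted average over its prey, sparse stomach samples that misestimate DC_ij shift that species' TL directly. System-wide totals can still remain stable.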
Sample size re-estimation is essential in oncology studies. However, the use of blinded sample size reassessment for survival data has rarely been reported. Based on the density function of the exponential distribution, an expectation-maximization (EM) algorithm for the hazard ratio was derived, and several simulation studies were used to verify its application. The method showed considerable variation in the hazard ratio estimates and overestimated relatively small hazard ratios. Our studies showed three things: the stability of the EM estimates correlated directly with the sample size; the convergence of the EM algorithm was affected by the initial values; and a balanced design produced the best estimates. No reliable inference for blinded sample size re-estimation could be made in our studies. Nevertheless, the results provide useful information that may deter practitioners in this field from repeating the same endeavor.
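The sketch below illustrates a blinded EM estimate of the hazard ratio. It treats pooled event times as a 1:1 mixture of two exponential distributions and assumes no censoring and a known, balanced allocation. The paper's exact derivation may differ. The initial rates are arbitrary, and changing them can change the result, which echoes the sensitivity to initial values reported above.

```python
import math
import random

def simulate_pooled(n_per_arm, lam1, lam2, seed=1):
    """Blinded data: exponential event times from two arms with labels discarded."""
    rng = random.Random(seed)
    times = [rng.expovariate(lam1) for _ in range(n_per_arm)]
    times += [rng.expovariate(lam2) for _ in range(n_per_arm)]
    rng.shuffle(times)
    return times

def blinded_em_hr(times, lam1=1.0, lam2=2.0, n_iter=200):
    """EM for a 0.5/0.5 mixture of Exp(lam1) and Exp(lam2).
    Returns (hazard ratio lam2 / lam1, log-likelihood trace)."""
    trace = []
    for _ in range(n_iter):
        # E-step: probability that each pooled time came from arm 1.
        w, ll = [], 0.0
        for t in times:
            f1 = 0.5 * lam1 * math.exp(-lam1 * t)
            f2 = 0.5 * lam2 * math.exp(-lam2 * t)
            w.append(f1 / (f1 + f2))
            ll += math.log(f1 + f2)
        trace.append(ll)
        # M-step: weighted exponential MLEs (expected events / expected exposure).
        lam1 = sum(w) / sum(wi * t for wi, t in zip(w, times))
        lam2 = sum(1 - wi for wi in w) / sum((1 - wi) * t for wi, t in zip(w, times))
    return lam2 / lam1, trace
```

The log-likelihood trace never decreases, as EM guarantees. That property does not make the hazard ratio well identified: two exponential components are hard to separate without arm labels, which is consistent with the variability the study reports.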
Subjective evaluations are increasingly used in cosmetic product assessment. They are applied in quality control, product development, and efficacy studies for claim support. Several studies have been published on determining the adequate number of panelists, but recommendations and guidelines on this topic are rare in the cosmetic sector. The aim of the present pilot study was to recommend a suitable study plan and define an adequate consumer panel size for cosmetic consumer assessment. A questionnaire-based evaluation of three different cosmetic products was organized as a consumer test using a seven-point scale. As a last step, a specific statistical calculation was performed to define the minimum sample size. The minimum sample size depends not only on the obvious statistical parameters, standard deviation and confidence interval, but also on the age and gender of the panelists and on the product assessment item. At a 95% confidence level, a minimum of 60 panelists appears to be sufficient for a home-use test (HUT) with the given seven-point scale. At a 99% confidence level, a minimum of 101 panelists is sufficient.
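The usual starting point for a "minimum sample size from standard deviation and confidence interval" calculation is n = (z * s / E)^2, rounded up. The sketch below implements that formula. The standard deviation and margin of error are hypothetical inputs. The study's additional dependence on age, gender, and assessment item is not modeled.

```python
import math
from statistics import NormalDist

def min_panel_size(sd, margin, confidence=0.95):
    """Smallest n such that the CI half-width z * sd / sqrt(n) for a mean score
    does not exceed `margin` (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sd / margin) ** 2)
```

For example, with an assumed standard deviation of 1 scale point and a margin of 0.25 points, the 95% formula gives 62 panelists. Raising the confidence level to 99% increases the requirement, in line with the abstract's 95% versus 99% comparison.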
Precise and accurate knowledge of genetic parameters is a prerequisite for efficient selection strategies in breeding programs. Many heritability estimates for important economic traits of marine mollusks are available in the literature. However, very few studies have evaluated the accuracy of genetic parameters estimated with different family structures. The present study therefore analyzed the effect of parent sample size on the precision of genetic parameter estimates for four growth traits in the clam M. meretrix. Factorial designs were analyzed using restricted maximum likelihood (REML) and Bayesian inference. With REML, the average estimated heritabilities of growth traits were 0.23-0.32 for 9 and 16 full-sib families and 0.19-0.22 for 25 full-sib families. With Bayesian inference, the average estimated heritabilities were 0.11-0.12 for 9 and 16 full-sib families and 0.13-0.16 for 25 full-sib families. Bayesian inference yielded lower heritabilities than REML, but they remained at a medium level. When the number of parents increased from 6 to 10, the estimated heritabilities moved closer to 0.20 with REML and 0.12 with Bayesian inference. Genetic correlations among traits were positive and high and did not differ significantly among design sizes. Estimated breeding values from the 9- and 16-family designs were less accurate than those from the 25-family design. Our results provide a basic genetic evaluation of growth traits and should be useful for designing and operating a practical selective breeding program for the clam M. meretrix.
BACKGROUND Some 25% of randomised controlled trials (RCTs) on interventions for inflammatory bowel disease (IBD) have no power calculation. AIM To systematically review RCTs reporting interventions for the management of IBD, and to produce data on the minimum sample sizes that would achieve appropriate power using the actual clinical data. METHODS We included RCTs retrieved from the Cochrane IBD specialised trial register and CENTRAL that investigated any form of therapy for induction or maintenance of remission of IBD against control, placebo, or no intervention, in patients of any age. The relevant data were extracted, and the studies were grouped according to the intervention used. We recalculated the sample size and the achieved difference, as well as the minimum sample sizes needed in future trials. RESULTS A total of 105 trials were included. There was a large discrepancy between the minimal clinically important difference assumed in the calculations and the difference actually observed (15% group differences observed vs 30% used for calculation), which explains substantial sample size deficits. Minimum sample sizes for future trials, based on 25 years of trial data, were calculated and grouped by intervention. CONCLUSION A third of intervention studies in IBD within the last 25 years are underpowered, with large variations in how sample sizes were calculated. The authors present a sample size estimate resource, built on the published evidence base, for future researchers and key stakeholders within the IBD trial field.
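The following sketch shows why assuming a 30% difference when only about 15% is achieved leaves trials underpowered. It applies the standard normal-approximation formula for comparing two proportions (two-sided alpha, pooled variance under the null). The 20% control response rate is a hypothetical assumption, and the formula is a textbook one, not necessarily the one the authors used.

```python
import math
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Per-arm sample size to detect p1 vs p2 with a two-sided test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(num**2 / (p1 - p2) ** 2)
```

With a 20% control rate, planning for a 30-point difference (20% vs 50%) gives 39 patients per arm. The 15-point difference more typically observed (20% vs 35%) needs 138 per arm, more than three times as many.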
To clarify the most appropriate sample size for obtaining phenotypic data for a single line, we investigated the main-effect QTLs (M-QTLs) for the quantitative trait plant height (ph). The study used a recombinant inbred line (RIL) population of rice derived from the cross between Xieqingzao B and Zhonghui 9308, with five individual plants per line measured in 2006 and 2009. QTLs were detected using 26 ph phenotypic datasets from combinations of 2, 3, 4, and 5 plants within a single line, plus five ph phenotypic datasets from the five individual plants. Fifteen M-QTLs were detected by 1 to 31 datasets. Of these, qph7a was detected repeatedly by all 31 ph datasets in 2006 and explained 11.67% to 23.93% of the phenotypic variation. qph3 was detected repeatedly by all 31 datasets and explained 5.21% to 7.93% and 11.51% to 24.46% of the phenotypic variance in 2006 and 2009, respectively. The results indicate that M-QTLs for a quantitative trait could be detected repeatedly under different environments. This held both for phenotypic values from 5 individual plants and for the 26 combined phenotypic datasets within a single line of an RIL population. The sample size for a single line of the RIL population did not affect the efficiency of identifying stably expressed M-QTLs.
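The count of 26 combined datasets follows from the number of plant subsets of size 2 to 5 out of 5: C(5,2) + C(5,3) + C(5,4) + C(5,5) = 10 + 10 + 5 + 1 = 26. Adding the 5 single-plant datasets gives 31. The sketch below builds one dataset per plant subset, with line means computed from that subset of plants in every line. The data layout and line identifiers are hypothetical.

```python
from itertools import combinations
from statistics import fmean

def combination_datasets(heights):
    """heights maps line id -> per-plant values (same plant count in every line).
    Returns {plant_subset: {line: mean over that subset}} for all subsets of size >= 2."""
    n_plants = len(next(iter(heights.values())))
    datasets = {}
    for r in range(2, n_plants + 1):
        for subset in combinations(range(n_plants), r):
            datasets[subset] = {line: fmean(vals[i] for i in subset)
                                for line, vals in heights.items()}
    return datasets
```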
After completing 102 replicate constant-amplitude crack initiation and growth tests on Ly12-CZ aluminum alloy plate, this paper conducts a statistical investigation of the fatigue crack initiation and growth process. Post-mortem fractographic examination by scanning electron microscopy (SEM) yields qualitative observations of the spatial correlation among fatigue striations. These reveal the statistical nature of the material's intrinsic inhomogeneity during crack growth. From the test data, an engineering division between crack initiation and growth is defined as the upper limit of a small crack. The distributions of crack initiation life N_i and growth life N, and the statistical characteristics of the crack growth rate da/dN, are also investigated. It is hoped that this work will provide a solid experimental basis for the study of probabilistic fatigue, probabilistic fracture mechanics, fatigue reliability, and their engineering applications.
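One common way to obtain da/dN from crack-length readings is the secant method, which is among the methods described in ASTM E647. It is sketched below. The abstract does not say which method was used, and the units (mm and cycles) are assumptions.

```python
def secant_growth_rate(a, n_cycles):
    """Secant method: da/dN ~ (a[i+1] - a[i]) / (N[i+1] - N[i]) between consecutive
    readings, reported at the mean crack length of each interval.
    Returns a list of (mean crack length, growth rate) pairs."""
    return [((a0 + a1) / 2, (a1 - a0) / (n1 - n0))
            for a0, a1, n0, n1 in zip(a, a[1:], n_cycles, n_cycles[1:])]
```

Applied to each of the replicate specimens, this gives a da/dN sample at every crack length, from which the scatter in growth rate can be characterized statistically.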
Sample size can be a key design feature that not only affects the probability of a trial's success but also determines the duration and feasibility of a trial. Consider an investigational drug expected to be effective against an orphan disease with unmet medical needs. If detecting a minimal clinically relevant treatment effect requires a large sample size, accrual may take many years, so a minimum sample size may be set to maintain nominal power. In limited situations such as this, flexibility in the initial and final sample sizes may be needed. It is therefore useful to consider adaptive sample size designs that use sample size re-estimation or group sequential designs. In this paper, we propose a new adaptive performance measure to evaluate the utility of an adaptive sample size design in trial simulations. Previously proposed sample size re-estimation methods do not account for estimation error in interim results. We therefore propose Bayesian sample size re-estimation criteria that incorporate prior information on the treatment effect, and we assess their operating characteristics in a simulation study. We also present a review example of sample size re-estimation, based mainly on a published paper and a review report from the Pharmaceuticals and Medical Devices Agency (PMDA).
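The abstract does not give the proposed Bayesian criteria, so the sketch below is only a hypothetical illustration of the general idea. The interim effect estimate is combined with a normal prior on the treatment effect, and the resulting posterior mean is plugged into the usual two-arm sample size formula, n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta)^2. The result is clipped to [n_min, n_max]. All parameter names and the plug-in rule are assumptions.

```python
import math
from statistics import NormalDist

def reestimated_n(delta_hat, se_hat, prior_mean, prior_sd, sigma,
                  n_min, n_max, alpha=0.05, power=0.8):
    """Per-arm sample size recomputed at an interim look using the normal-normal
    posterior mean of the treatment effect, clipped to [n_min, n_max]."""
    w_data, w_prior = 1.0 / se_hat**2, 1.0 / prior_sd**2
    delta_post = (w_data * delta_hat + w_prior * prior_mean) / (w_data + w_prior)
    if delta_post <= 0:
        return n_max  # no evidence of benefit: cap at the maximum allowed
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = math.ceil(2 * (z * sigma / delta_post) ** 2)
    return min(max(n, n_min), n_max)
```

Shrinking a noisy interim estimate toward prior information is one way to take interim estimation error into account. Simulating many such re-estimations under assumed true effects would give operating characteristics of the kind the paper evaluates.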
The objective of this study is to analyze the sensitivity of statistical models to sample size. The study, carried out in Ivory Coast, is based on annual maximum daily rainfall data collected from 26 stations. The methodological approach is based on statistical modeling of maximum daily rainfall. Distributions were fitted for several sample sizes and several return periods (2, 5, 10, 20, 50, and 100 years). The 30-year series (1931-1960; 1961-1990; 1991-2020) are best fitted by the Gumbel (26.92%-53.85% of stations) and Inverse Gamma (26.92%-46.15%) distributions. The 60-year series (1931-1990; 1961-2020) are best fitted by the Inverse Gamma (30.77%), Gamma (15.38%-46.15%), and Gumbel (15.38%-42.31%) distributions. The full 1931-2020 record (90 years) shows a clear predominance of the Gumbel model (50%) over the Gamma (34.62%) and Inverse Gamma (15.38%) models. Overall, Gumbel is the most dominant model, particularly in wet periods. Data for periods with normal and dry trends were better fitted by the Gamma and Inverse Gamma distributions.
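The sketch below shows how a Gumbel fit turns a rainfall series into return levels for the periods listed above. For brevity it uses a method-of-moments fit, with scale = s * sqrt(6) / pi and location = mean - 0.5772 * scale. The study's estimation method is not stated, and `scipy.stats.gumbel_r.fit` would give a maximum-likelihood alternative. The Gamma and Inverse Gamma comparison, which needs a goodness-of-fit criterion, is not reproduced.

```python
import math
from statistics import fmean, stdev

EULER_GAMMA = 0.5772156649015329

def gumbel_fit_moments(x):
    """Method-of-moments fit of the Gumbel (maxima) distribution."""
    scale = stdev(x) * math.sqrt(6) / math.pi
    loc = fmean(x) - EULER_GAMMA * scale
    return loc, scale

def return_level(loc, scale, period):
    """Daily rainfall exceeded on average once every `period` years:
    x_T = loc - scale * ln(-ln(1 - 1/T))."""
    return loc - scale * math.log(-math.log(1 - 1 / period))
```

Because the sample mean and standard deviation of a short series (for example, 30 annual maxima) are noisy, the fitted parameters and hence the 50- and 100-year levels shift with the sample window. This is the kind of sensitivity the study examines.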
Road surface condition evaluation involves collecting data on different types of distress over the pavement surface. Surveying the whole length of a road section consumes a lot of resources and is prone to errors caused by surveyor fatigue. It is therefore important to develop a representative sample for manual road condition evaluation. This study aimed to determine an adequate sample size for section-level evaluation and to propose a way forward for network-level condition evaluation of highways in Nepal. The study also quantified the effects of altering the sample unit size in distress surveys for asphalt-surfaced roads, using the PCI (pavement condition index) and SDI (surface distress index) methods separately. The effect of reducing or increasing the sample unit size was investigated through visual examination in a field survey. Eight teams carried out the survey in July 2015 along a section of the Banepa-Bardibas highway. The PCI was then calculated for each sample unit using standard deduct curves and the PCI calculation methodology recommended by the SHRP (Strategic Highway Research Program). The SDI was computed according to DoR (Department of Roads) guidelines. The results show that 13% of sample units are needed for SDI and 21% for PCI computation; however, these results fall outside the significance level. Both percentages are higher than those in the DoR and SHRP guidelines. In addition, no strong relationship is observed between SDI and PCI values.
Random sample partition (RSP) is a recently developed big data representation and management model for big data approximate computation problems. Academic research and practical applications have confirmed that RSP is an efficient solution for big data processing and analysis. However, a challenge in implementing RSP is determining an appropriate sample size for RSP data blocks. A large sample size increases the burden of big data computation, while a small one leaves RSP data blocks with insufficient distribution information. To address this problem, this paper presents a novel density estimation-based method (DEM) to determine the optimal sample size for RSP data blocks. First, a theoretical sample size is calculated from the multivariate Dvoretzky-Kiefer-Wolfowitz (DKW) inequality using the fixed-point iteration (FPI) method. Second, a practical sample size is determined by minimizing the validation error of a kernel density estimator (KDE) constructed on RSP data blocks as the sample size increases. Finally, a series of experiments is conducted to validate the feasibility, rationality, and effectiveness of DEM. The experimental results show that:
(1) the iteration function of the FPI method converges when calculating the theoretical sample size from the multivariate DKW inequality;
(2) the KDE constructed on RSP data blocks with the sample size determined by DEM yields a good approximation of the probability density function (p.d.f.);
(3) DEM provides more accurate sample sizes than existing sample size determination methods from the perspective of p.d.f. estimation.
This demonstrates that DEM is a viable approach to the sample size determination problem for big data RSP implementation.
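The theoretical step can be sketched as a fixed-point iteration. The sketch assumes the multivariate DKW bound takes the form P(sup|F_n - F| > eps) <= d * (n + 1) * exp(-2 * n * eps^2). Setting the bound equal to alpha and solving for n gives n = ln(d * (n + 1) / alpha) / (2 * eps^2), where n appears on both sides. This is an illustrative reading, not the paper's exact inequality. The KDE-based practical step is not shown, and `eps`, `alpha`, and `d` are hypothetical inputs.

```python
import math

def dkw_sample_size(eps, alpha, d, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for n solving d * (n + 1) * exp(-2 n eps^2) = alpha,
    under the assumed multivariate DKW bound. Returns the integer ceiling."""
    n = math.log(2 / alpha) / (2 * eps**2)  # univariate DKW solution as starting point
    for _ in range(max_iter):
        n_next = math.log(d * (n + 1) / alpha) / (2 * eps**2)
        if abs(n_next - n) < tol:
            return math.ceil(n_next)
        n = n_next
    raise RuntimeError("fixed-point iteration did not converge")
```

The iteration map has slope 1 / (2 * eps^2 * (n + 1)), which is well below 1 at the sizes involved. This is why the iteration converges quickly, consistent with finding (1) above.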
Funding (aiming method for shooting tests of trajectory correction fuzes): the experiments were funded by the National Natural Science Foundation of China (Grant No. 61973033) and the Preliminary Research of Equipment program (Grant No. 9090102010305).
文摘The longitudinal dispersion of the projectile in shooting tests of two-dimensional trajectory corrections fused with fixed canards is extremely large that it sometimes exceeds the correction ability of the correction fuse actuator.The impact point easily deviates from the target,and thus the correction result cannot be readily evaluated.However,the cost of shooting tests is considerably high to conduct many tests for data collection.To address this issue,this study proposes an aiming method for shooting tests based on small sample size.The proposed method uses the Bootstrap method to expand the test data;repeatedly iterates and corrects the position of the simulated theoretical impact points through an improved compatibility test method;and dynamically adjusts the weight of the prior distribution of simulation results based on Kullback-Leibler divergence,which to some extent avoids the real data being"submerged"by the simulation data and achieves the fusion Bayesian estimation of the dispersion center.The experimental results show that when the simulation accuracy is sufficiently high,the proposed method yields a smaller mean-square deviation in estimating the dispersion center and higher shooting accuracy than those of the three comparison methods,which is more conducive to reflecting the effect of the control algorithm and facilitating test personnel to iterate their proposed structures and algorithms.;in addition,this study provides a knowledge base for further comprehensive studies in the future.
Abstract: Sample size determination typically relies on a power analysis based on a frequentist conditional approach. The latter can be seen as a special case of the two-priors approach, which allows one to build four distinct power functions to select the optimal sample size. We revisit this approach when the focus is on testing a single binomial proportion. We consider exact methods and introduce a conservative criterion to account for the typical non-monotonic behavior of the power functions when dealing with discrete data. The main purpose of this paper is to present a Shiny app that provides a user-friendly, interactive tool for applying these criteria. The app also provides specific tools to elicit the analysis and design prior distributions, which are the core of the two-priors approach.
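To illustrate why exact power for a binomial proportion calls for a conservative criterion, the sketch below computes the exact power of a one-sided test and contrasts the first n that reaches the target power with the smallest n after which power never drops below the target again (up to a search limit). It covers only the frequentist conditional case (design prior concentrated at p1), not the app's four power functions or its prior elicitation tools; the function names and the limit n_max are assumptions.

```python
from math import comb


def tail_probs(n, p):
    """tail[k] = P(X >= k) for X ~ Binomial(n, p), for k = 0..n+1."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    tail = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tail[k] = tail[k + 1] + pmf[k]
    return tail


def exact_power(n, p0, p1, alpha=0.05):
    """Exact power of the one-sided test H0: p <= p0 vs H1: p > p0, evaluated at p = p1."""
    tail0 = tail_probs(n, p0)
    # Smallest critical value k whose exact size P(X >= k | p0) is <= alpha.
    k = next(k for k in range(n + 2) if tail0[k] <= alpha)
    return tail_probs(n, p1)[k]


def naive_n(p0, p1, target=0.8, alpha=0.05, n_max=150):
    """First n whose power reaches the target (ignores the saw-tooth shape)."""
    return next((n for n in range(1, n_max + 1)
                 if exact_power(n, p0, p1, alpha) >= target), None)


def conservative_n(p0, p1, target=0.8, alpha=0.05, n_max=150):
    """Smallest n such that power stays >= target for every n' in [n, n_max]."""
    best = None
    for n in range(n_max, 0, -1):
        if exact_power(n, p0, p1, alpha) < target:
            break
        best = n
    return best
```

Because the critical value can only move in integer steps, exact power zig-zags as n grows, which is why the two answers can differ.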
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52174062).
Abstract: Accurate prediction of the internal corrosion rates of oil and gas pipelines could be an effective way to prevent pipeline leaks. In this study, a framework for predicting corrosion rates from a small sample of laboratory metal corrosion data was developed, offering a new perspective on pipeline corrosion when real samples are insufficient. The approach employed the bagging algorithm to construct a strong learner by integrating several KNN learners. A total of 99 data points were collected and split into training and test sets at a 9:1 ratio. The training set was used to obtain the best hyperparameters by 10-fold cross-validation and grid search, and the test set was used to evaluate the performance of the model. The results showed that the Mean Absolute Error (MAE) of this framework is 28.06% of that of the traditional model, and that the framework outperforms other ensemble methods. Therefore, the proposed framework is suitable for metal corrosion prediction under small sample conditions.
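A minimal pure-Python sketch of a bagging ensemble of KNN regressors, with the neighbour count chosen by 10-fold cross-validated MAE over a small grid, is given below. The grid values, the number of bootstrap members and the function names are assumptions; the abstract does not list the hyperparameters that were searched.

```python
import random
import statistics


def knn_predict(train_X, train_y, x, k):
    """Mean target of the k nearest training points (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return statistics.fmean(y for _, y in ranked[:k])


def bagged_knn(X, y, k, n_estimators=25, seed=0):
    """Bagging: each KNN member is fitted on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        # Aggregate the ensemble by averaging the members' predictions.
        return statistics.fmean(knn_predict(mx, my, x, k) for mx, my in members)

    return predict


def cv_mae(X, y, k, folds=10, seed=0):
    """Mean absolute error of bagged KNN under `folds`-fold cross-validation."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    errors = []
    for f in range(folds):
        held_out = set(order[f::folds])
        train = [i for i in order if i not in held_out]
        model = bagged_knn([X[i] for i in train], [y[i] for i in train], k, seed=seed)
        errors.extend(abs(model(X[i]) - y[i]) for i in held_out)
    return statistics.fmean(errors)


def grid_search_k(X, y, grid=(1, 3, 5, 7)):
    """Pick the neighbour count with the lowest cross-validated MAE."""
    return min(grid, key=lambda k: cv_mae(X, y, k))
```

In the setting described above, the grid search would run on the training portion of the 9:1 split, and the held-out portion would be scored once with the selected model.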
Funding: Science, Research and Innovation Promotion Funding (TSRI) (Grant No. FRB660012/0168), managed under Rajamangala University of Technology Thanyaburi (FRB66E0646O.4).
Abstract: This study presents the design of a modified attribute control chart based on a double sampling (DS) np chart combined with generalized multiple dependent state (GMDS) sampling to monitor the mean life of a product, based on a time-truncated life test under the Weibull distribution. The proposed control chart supports the examination of variation in the mean lifespan of a particular product during manufacturing. Three control limit levels are used: the warning control limit, the inner control limit, and the outer control limit. Together, they enhance the capability to detect variation. A genetic algorithm can be used for optimization during the in-control process to establish the optimal parameters of the proposed control chart. The control chart performance is assessed using the average run length, while the influence of the model parameters on the control chart solution is assessed via sensitivity analysis based on an orthogonal experimental design with multiple linear regression. A comparative study based on the out-of-control average run length showed that the proposed control chart is more sensitive in detecting process shifts while using smaller samples on average than existing control charts. Finally, to demonstrate its utility, this paper applies the proposed control chart to simulated data with parameters drawn from a real data set.
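The DS and GMDS decision rules and the genetic-algorithm optimisation are beyond a short example, but their common building block can be sketched: the probability that an item fails before the truncation time t0 = a * mu0 under a Weibull lifetime, and the ARL of a plain single-sampling np chart. The parametrisation (truncation constant a, shape beta, mean ratio mu / mu0) is the usual one for time-truncated life tests and is assumed here, as are the function names.

```python
from math import comb, exp, gamma


def failure_prob(a, beta, ratio):
    """P(an item fails before t0 = a * mu0) in a time-truncated life test.

    Lifetimes are Weibull with shape `beta`; the true mean is mu = ratio * mu0,
    so the Weibull scale is mu / Gamma(1 + 1/beta).
    """
    return 1.0 - exp(-((a * gamma(1.0 + 1.0 / beta) / ratio) ** beta))


def arl_np_chart(n, lcl, ucl, p):
    """ARL of a single-sampling np chart that signals if D < lcl or D > ucl,
    where D ~ Binomial(n, p) is the number of failures in the sample."""
    p_in = sum(comb(n, d) * p**d * (1 - p) ** (n - d)
               for d in range(max(lcl, 0), min(ucl, n) + 1))
    p_signal = 1.0 - p_in
    return float("inf") if p_signal <= 0.0 else 1.0 / p_signal
```

The in-control ARL corresponds to ratio = 1; a drop in mean life corresponds to ratio < 1, which raises the failure probability and shortens the ARL.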
Abstract: Background: When continuous scale measurements are available, agreement between two measuring devices is assessed both graphically and analytically. In clinical investigations, Bland and Altman proposed plotting subject-wise differences between raters against subject-wise averages. To assess agreement scientifically, Bartko recommended combining the graphical approach with the statistical analytic procedure suggested by Bradley and Blackwood. The advantage of this approach is that it enables significance testing and sample size estimation. We noted that using the regression results directly is misleading, and we provide a correction in this regard. Methods: Graphical and linear models are used to assess agreement for continuous scale measurements. We demonstrate that linear regression results from software should not be used as reported, and we provide the correct analytic procedures. The degrees of freedom of the F statistic are incorrectly reported, and we overcome this problem by introducing the correct analytic form of the F statistic. Methods for sample size estimation using R functions are also given. Results: We believe that the tutorial and the R code are useful tools for testing and estimating agreement between two rating protocols for continuous scale measurements. Interested readers may apply the code to their own data when agreement between two raters is the subject of interest.
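A minimal Python sketch of the Bradley-Blackwood test, with the joint F statistic on (2, n - 2) degrees of freedom, follows. It is not the authors' R code; it assumes paired measurements x1, x2 from the two raters and at least three subjects with non-zero residual variation. For two numerator degrees of freedom the F tail probability has a closed form, so no statistics library is needed.

```python
def bradley_blackwood(x1, x2):
    """Bradley-Blackwood simultaneous test of equal means and equal variances.

    Regress D = x1 - x2 on M = (x1 + x2) / 2 and test intercept = slope = 0
    jointly. The statistic has (2, n - 2) degrees of freedom; the overall F of a
    default simple-regression printout tests the slope alone, on (1, n - 2).
    """
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]
    m = [(a + b) / 2 for a, b in zip(x1, x2)]
    m_bar, d_bar = sum(m) / n, sum(d) / n
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    sxy = sum((mi - m_bar) * (di - d_bar) for mi, di in zip(m, d))
    b1 = sxy / sxx
    b0 = d_bar - b1 * m_bar
    sse = sum((di - b0 - b1 * mi) ** 2 for mi, di in zip(m, d))  # assumes sse > 0
    df2 = n - 2
    # Restricted model D = 0 has residual sum of squares sum(D^2).
    f = ((sum(di**2 for di in d) - sse) / 2) / (sse / df2)
    # For an F(2, df2) variable, P(F > f) = (1 + 2f/df2) ** (-df2/2) exactly.
    p_value = (1 + 2 * f / df2) ** (-df2 / 2)
    return f, (2, df2), p_value
```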
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51175028), the Great Scholars Training Project (Grant No. CIT&TCD20150312), and the Beijing Recognized Talent Project (Grant No. 2014018).
Abstract: Reliability assessment of the braking system in a high-speed train under small sample size and zero-failure data is very important for safe operation. Traditional reliability assessment methods perform well only with a large sample size and complete failure data, and they produce large deviations with a small sample size and zero-failure data. To address this problem, a new Bayesian method is proposed. Based on the characteristics of the solenoid valve in the braking system of a high-speed train, the modified Weibull distribution is selected to describe the failure rate over the entire lifetime. Assuming a binomial distribution for the failure probability at the censoring time, a concave method is employed to obtain the relationships between the accumulated failure probabilities. A numerical simulation is performed to compare the results of the proposed method with those obtained from maximum likelihood estimation; it shows that the proposed Bayesian model gives a more accurate expected value when the sample size is less than 12. Finally, the robustness of the model is demonstrated by obtaining the reliability indicators for a numerical case involving the solenoid valve of the braking system, which shows that the reliability and failure rate change little across different hyperparameters. The method avoids being misled by subjective information and improves the accuracy of reliability assessment under conditions of small sample size and zero-failure data.
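The zero-failure Bayesian idea can be sketched with simpler ingredients than the paper's: independent Beta posteriors for the cumulative failure probability at each censoring time, followed by a least-squares fit of a standard two-parameter Weibull. The Beta(1, b) prior, the standard (not modified) Weibull and the function names are all assumptions, and the concave method is not reproduced.

```python
import math


def posterior_failure_probs(n_censored, b=1.0):
    """Posterior mean of the cumulative failure probability p_i at each censoring time.

    n_censored[i] units were removed without failure at time t_i (times ascending).
    Every unit censored at t_j >= t_i also survived t_i, so s_i = sum(n_censored[i:]).
    With zero failures in s_i binomial trials and an assumed Beta(1, b) prior,
    the posterior is Beta(1, b + s_i) with mean 1 / (1 + b + s_i).
    """
    return [1.0 / (1.0 + b + sum(n_censored[i:])) for i in range(len(n_censored))]


def fit_weibull(times, probs):
    """Least-squares fit of ln(-ln(1 - p)) = m ln t - m ln eta; returns (m, eta)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - p)) for p in probs]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    eta = math.exp(x_bar - y_bar / m)  # from intercept = y_bar - m * x_bar = -m ln eta
    return m, eta


def reliability(t, m, eta):
    """Weibull reliability R(t) = exp(-(t / eta) ** m)."""
    return math.exp(-((t / eta) ** m))
```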
Funding: Supported by the Youth Foundation of the Sichuan Education Bureau (2006B009), the Key Project of the Sichuan Education Bureau (2006A008), and the Sichuan Youth Science & Technology Foundation, China (06ZQ026-020).
Abstract: Knowledge of the spatial distribution of soil copper (Cu) and of sampling size optimization could lay solid foundations for environmental quality surveys of agricultural soils at the county scale. In this investigation, the cokriging method was used to interpolate Cu concentration in cropland soil in Shuangliu County, Sichuan Province, China. From the original 623 physicochemically measured soil samples, subsets of 560, 498, and 432 samples were randomly selected as the target variable, with the soil organic matter (SOM) of all original samples as the auxiliary variable. Cokriging interpolation results under different sample sizes were evaluated for their applicability in estimating the spatial distribution of soil Cu at the county scale. Compared with ordinary Kriging at the corresponding sample sizes, Cokriging reduced the root mean square error (RMSE) by 0.9 to 7.77% and increased the correlation coefficient between predicted and measured values by 1.76 to 9.76%. Even with the sample size reduced by 10%, the prediction accuracy of Cokriging was still higher than that of ordinary Kriging on all 623 original samples, and their interpolation maps agreed closely. Therefore, Cokriging proved to be a more accurate and economical method that can provide more information for studies on the spatial distribution of soil pollutants at the county scale.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 30700494) and the Principal Fund of South China Agricultural University, China (Grant No. 2003K053).
Abstract: The development of a core collection could enhance the utilization of germplasm collections in crop improvement programs and simplify their management. Selecting an appropriate sampling strategy is an important prerequisite for constructing a core collection of appropriate size that adequately represents the genetic spectrum and maximally captures the genetic diversity of the available crop collections. This study constructed nested core collections to determine the sample size appropriate for representing the genetic diversity of a rice landrace collection, based on 15 quantitative traits and 34 qualitative traits of 2 262 rice accessions. The results showed that nested core collections of 50-225 accessions, corresponding to sampling rates of 2.2%-9.9%, were sufficient to maintain the maximum genetic diversity of the initial collection. Of these, 150 accessions (6.6%) could capture the maximal genetic diversity of the initial collection. Three data types, i.e. qualitative traits (QT1), quantitative traits (QT2), and integrated qualitative and quantitative traits (QTT), were compared for their efficiency in constructing core collections using the weighted pair-group average method combined with stepwise clustering and preferred sampling on adjusted Euclidean distances. Each combination scheme produced eight rice core collections (225, 200, 175, 150, 125, 100, 75, and 50 accessions). Judged by the genetic diversity of the core collections, QTT data were the best for constructing a core collection. A core collection constructed only from QT1 information could not represent the initial collection effectively. Qualitative and quantitative traits should therefore be used together (QTT) to construct a productive core collection.
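The stepwise clustering with preferred sampling described above can be sketched as repeated removal of one member of the closest pair of accessions, together with a Shannon-Weaver index for checking the diversity of a qualitative trait. The removal rule (drop the member nearer the collection mean, keeping the more extreme one), the use of already standardized traits in place of adjusted Euclidean distances, and the function names are assumptions; the weighted pair-group average clustering itself is not implemented.

```python
import math
from collections import Counter


def shannon_index(values):
    """Shannon-Weaver diversity H' = -sum(p_i * ln p_i) of a qualitative trait."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def stepwise_core(data, target):
    """Shrink a collection to `target` accessions by repeated closest-pair removal.

    data: standardized trait vectors, one per accession. At each step the closest
    pair (Euclidean distance) is found and the member nearer the collection mean
    is dropped, so the more extreme accession is kept (an assumed rule).
    Returns the indices of the retained accessions.
    """
    dims = len(data[0])
    mean = [sum(row[j] for row in data) / len(data) for j in range(dims)]
    keep = list(range(len(data)))
    while len(keep) > target:
        i, j = min(
            ((a, b) for pos, a in enumerate(keep) for b in keep[pos + 1:]),
            key=lambda pair: math.dist(data[pair[0]], data[pair[1]]),
        )
        keep.remove(i if math.dist(data[i], mean) < math.dist(data[j], mean) else j)
    return keep
```

Comparing shannon_index on a trait for the full collection and for the retained indices gives a quick check of how much qualitative diversity the core keeps.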
Funding: Financial support was provided by the NSFC (Grant Nos. 11602257, 11472257, 11272300, 11572299) and by the key subject "Computational Solid Mechanics" of the China Academy of Engineering Physics.
Abstract: To investigate the effect of sample size on the dynamic torsional behaviour of the 2A12 aluminium alloy, torsional split Hopkinson bar tests are conducted in this paper on this alloy with different sample dimensions. The tested yield strength increases as the gauge length and thickness decrease, whereas the sample inner/outer diameter has little effect on the dynamic torsional behaviour. The stress states in the alloy with different sample sizes are analysed using the finite element method. Because of the stress concentration zone (SCZ), the shorter sample has a higher yield stress. Furthermore, the stress is distributed more uniformly in the thinner sample, which leads to a higher tested yield stress. Based on the experimental and simulation analysis, some suggestions on choosing the sample size are also given.
Funding: The National Natural Science Foundation of China under contract No. 31772852 and the Fundamental Research Funds for the Central Universities under contract No. 201612004.
Abstract: This study used the Ecopath model of Jiaozhou Bay as an example to evaluate the effect of the stomach sample size of three fish species on the model's projections. The derived ecosystem indices were classified into three categories: (1) direct indices, such as the trophic level of a species, which are influenced directly by stomach sample size; (2) indirect indices, such as the ecotrophic efficiency (EE) of invertebrates, which are influenced through multiple prey-predator relationships; and (3) systemic indices, such as total system throughput (TST), which describe the status of the whole ecosystem. The influence of different stomach sample sizes on these indices was evaluated. The results suggest that the systemic indices of the ecosystem model were robust to stomach sample size, whereas species-specific indices had low accuracy and precision when stomach samples were insufficient. The indices became more uncertain when the stomach sample sizes varied for more species. This study improves the understanding of how the quality of diet composition data influences ecosystem modeling outputs. The results can also guide the design of stomach content analyses for developing ecosystem models.
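The direct dependence of trophic level on stomach data can be made concrete with the relation TL_j = 1 + sum_i DC_ji * TL_i, where DC_ji is the fraction of prey i in the diet of consumer j. The sketch below pools stomach contents into a diet composition and solves the relation by fixed-point iteration; it omits Ecopath's mass-balance equations, and the group names and the weight-pooling rule are illustrative assumptions.

```python
def diet_from_stomachs(stomachs):
    """Pooled diet composition: prey weight / total weight over all stomachs."""
    totals = {}
    for stomach in stomachs:
        for prey, weight in stomach.items():
            totals[prey] = totals.get(prey, 0.0) + weight
    grand = sum(totals.values())
    return {prey: w / grand for prey, w in totals.items()}


def trophic_levels(diets, basal, tol=1e-12, max_iter=1000):
    """Solve TL_j = 1 + sum_i DC_ji * TL_i by fixed-point iteration.

    diets: {consumer: {prey: fraction}}; basal groups (producers, detritus)
    have TL = 1 and need no diet entry.
    """
    tl = {group: 1.0 for group in set(diets) | set(basal)}
    for _ in range(max_iter):
        new = {
            g: 1.0 if g in basal
            else 1.0 + sum(f * tl[prey] for prey, f in diets[g].items())
            for g in tl
        }
        if max(abs(new[g] - tl[g]) for g in tl) < tol:
            return new
        tl = new
    return tl
```

In this toy setting, a fish diet of 60% zooplankton and 40% phytoplankton gives a trophic level of 2.6, while a single stomach containing only zooplankton would give 3.0, the kind of species-level sensitivity the study reports.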
Funding: Supported by the National Natural Science Foundation of China (81273184) and the National Natural Science Foundation of China Grant for Young Scientists (81302512).
Abstract: Sample size re-estimation is essential in oncology studies. However, blinded sample size reassessment for survival data has rarely been reported. Based on the density function of the exponential distribution, an expectation-maximization (EM) algorithm for estimating the hazard ratio was derived, and several simulation studies were used to verify its application. The method showed marked variation in the hazard ratio estimates and overestimated relatively small hazard ratios. Our studies showed that the stability of the EM estimates correlated directly with the sample size, that the convergence of the EM algorithm was affected by the initial values, and that a balanced design produced the best estimates. No reliable inference for blinded sample size re-estimation could be made in our studies, but the results provide useful information to steer practitioners in this field away from repeating the same endeavor.
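A minimal sketch of blinded EM for a pooled sample of exponential survival times, with known 1:1 allocation and right censoring, follows. It is a standard EM for a two-component censored exponential mixture, assumed here to resemble the derivation in the abstract; the function names, starting values and simulation settings are illustrative.

```python
import math
import random


def simulate_blinded(n_per_arm, lam_control, lam_treat, cutoff, seed=0):
    """Pooled survival times (arm labels discarded) with administrative censoring."""
    rng = random.Random(seed)
    raw = [rng.expovariate(lam_control) for _ in range(n_per_arm)]
    raw += [rng.expovariate(lam_treat) for _ in range(n_per_arm)]
    rng.shuffle(raw)
    return [min(t, cutoff) for t in raw], [1 if t < cutoff else 0 for t in raw]


def loglik(times, events, lam1, lam2, pi=0.5):
    """Blinded log-likelihood: each subject is a pi : (1 - pi) mixture of two exponentials."""
    total = 0.0
    for t, d in zip(times, events):
        f1 = lam1**d * math.exp(-lam1 * t)
        f2 = lam2**d * math.exp(-lam2 * t)
        total += math.log(pi * f1 + (1 - pi) * f2)
    return total


def blinded_em(times, events, lam1, lam2, pi=0.5, max_iter=300, tol=1e-10):
    """EM for the two hazards of a blinded exponential mixture with known allocation pi.

    events: 1 = event, 0 = censored. Returns (lam1, lam2); the blinded
    hazard-ratio estimate is lam2 / lam1.
    """
    for _ in range(max_iter):
        # E-step: posterior probability that each subject belongs to arm 1.
        w = []
        for t, d in zip(times, events):
            f1 = pi * lam1**d * math.exp(-lam1 * t)
            f2 = (1 - pi) * lam2**d * math.exp(-lam2 * t)
            w.append(f1 / (f1 + f2))
        # M-step: weighted events divided by weighted exposure time, per arm.
        new1 = sum(wi * d for wi, d in zip(w, events)) / sum(wi * t for wi, t in zip(w, times))
        new2 = (sum((1 - wi) * d for wi, d in zip(w, events))
                / sum((1 - wi) * t for wi, t in zip(w, times)))
        done = abs(new1 - lam1) + abs(new2 - lam2) < tol
        lam1, lam2 = new1, new2
        if done:
            break
    return lam1, lam2
```

Rerunning blinded_em from different starting values is a quick way to see the sensitivity to initial values that the abstract reports.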
Abstract: Subjective evaluations are increasingly applied in cosmetic product assessment. They are used in quality control, product development, and efficacy studies for claim support. Several studies have been published on determining the adequate number of panelists, but recommendations and guidelines on this topic are rare in the cosmetic sector. The aim of this pilot study was to recommend a suitable study plan and to define an adequate consumer panel size for cosmetic consumer assessment. A questionnaire-based product evaluation study with three different cosmetic products was organized as a consumer test using a seven-point scale. As a last step, a specific statistical calculation was performed to define the minimum sample size. It showed that the minimum sample size depends not only on the obvious statistical parameters, standard deviation and confidence interval, but also on the age and gender of the panelists and on the product assessment item. With a 95% CI, a minimum of 60 panelists seems sufficient for a home-use test (HUT) with the given seven-point scale. With a 99% CI, a minimum of 101 panelists was shown to be sufficient.
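The usual relation between panel size, standard deviation and confidence level is n = (z * s / E)^2 for a margin of error E on a mean score. The sketch below applies it; inputs such as a standard deviation of 1.0 scale point and a margin of 0.25 are hypothetical, not figures from the study, and the study's dependence on age, gender and assessment item is not modelled.

```python
import math
from statistics import NormalDist


def min_panel_size(sd, margin, confidence=0.95):
    """Smallest n with z * sd / sqrt(n) <= margin, i.e. n = ceil((z * sd / margin) ** 2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)
```

With those hypothetical inputs the formula gives 62 panelists at 95% and 107 at 99%; moving from 95% to 99% multiplies n by about (2.576 / 1.960)^2, roughly 1.7.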
Funding: The National High Technology Research and Development Program (863 Program) of China under contract No. 2012AA10A410, the Zhejiang Science and Technology Project of Agricultural Breeding under contract No. 2012C12907-4, and the Scientific and Technological Innovation Project financially supported by Qingdao National Laboratory for Marine Science and Technology under contract No. 2015ASKJ02.
Abstract: Precise and accurate knowledge of genetic parameters is a prerequisite for efficient selection strategies in breeding programs. A number of heritability estimates for important economic traits in many marine mollusks are available in the literature; however, very few studies have evaluated the accuracy of genetic parameters estimated with different family structures. Thus, in the present study, the effect of parent sample size on the precision of genetic parameters for four growth traits in the clam M. meretrix, estimated from factorial designs, was analyzed through restricted maximum likelihood (REML) and Bayesian inference. The average estimated heritabilities of growth traits obtained from REML were 0.23-0.32 for 9 and 16 full-sib families and 0.19-0.22 for 25 full-sib families. With Bayesian inference, the average estimated heritabilities were 0.11-0.12 for 9 and 16 full-sib families and 0.13-0.16 for 25 full-sib families. Compared with REML, Bayesian inference gave lower heritabilities, which nonetheless remained at a medium level. When the number of parents increased from 6 to 10, the estimated heritabilities were closer to 0.20 in REML and 0.12 in Bayesian inference. Genetic correlations among traits were positive and high and did not differ significantly between design sizes. Estimated breeding values from the 9 and 16 families were less accurate than those from 25 families. Our results provide a basic genetic evaluation of growth traits and should be useful for the design and operation of a practical selective breeding program in the clam M. meretrix.
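REML and Bayesian animal models need specialised software, but the quantity being estimated can be illustrated with the simpler one-way ANOVA estimator for balanced full-sib families, h^2 = 2 * var_family / (var_family + var_within). This is a swapped-in textbook method, not the paper's REML or Bayesian analysis; it assumes no dominance or common-environment variance, and the function name is illustrative.

```python
import statistics


def fullsib_heritability(families):
    """One-way ANOVA heritability from balanced full-sib families.

    families: list of equal-length lists of phenotypes.
    Returns (h2, var_family, var_within). Assumes var_family = var_A / 2,
    i.e. no dominance or common-environment variance.
    """
    k, n = len(families), len(families[0])
    grand = statistics.fmean(x for fam in families for x in fam)
    means = [statistics.fmean(fam) for fam in families]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = (sum((x - m) ** 2 for fam, m in zip(families, means) for x in fam)
                 / (k * (n - 1)))
    var_family = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates
    h2 = 2 * var_family / (var_family + ms_within)
    return h2, var_family, ms_within
```

With few families, ms_between rests on only k - 1 degrees of freedom, which is one simple way to see why designs with 9 or 16 families give less stable estimates than designs with 25.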
Abstract: BACKGROUND About 25% of randomised controlled trials (RCTs) on interventions for inflammatory bowel disease (IBD) have no power calculation. AIM To systematically review RCTs reporting interventions for the management of IBD and to produce data on the minimum sample sizes that would achieve appropriate power using the actual clinical data. METHODS We included RCTs retrieved from the Cochrane IBD specialised trial register and CENTRAL that investigated any form of therapy for either induction or maintenance of remission of IBD, against control, placebo, or no intervention, in patients of any age. The relevant data were extracted, and the studies were grouped according to the intervention used. We recalculated the sample size and the achieved difference, as well as the minimum sample sizes needed in future trials. RESULTS A total of 105 trials were included. There was a large discrepancy between the observed differences and the estimated minimal clinically important difference used for the calculations (15% group differences observed vs 30% used for calculation), which explains substantial actual sample size deficits. The minimum sample sizes indicated for future trials, based on 25 years of trial data, were calculated and grouped by intervention. CONCLUSION A third of intervention studies in IBD within the last 25 years are underpowered, with large variations in how sample sizes were calculated. The authors present a sample size estimate resource built on the published evidence base for future researchers and key stakeholders in the IBD trial field.
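The effect of the 15% versus 30% discrepancy on sample size can be checked with the standard normal-approximation formula for comparing two proportions. The sketch assumes a hypothetical control remission rate of 30%, two-sided alpha = 0.05 and 80% power; none of these figures come from the review, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, difference, alpha=0.05, power=0.8):
    """Per-group n for comparing two proportions (normal approximation, two-sided test)."""
    p_treat = p_control + difference
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil(z**2 * variance / difference**2)
```

Under those assumptions, an absolute difference of 0.30 needs 40 patients per group while a difference of 0.15 needs 160: halving the assumed difference roughly quadruples the required sample size.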
Funding: Supported by grants from the Chinese Natural Science Foundation (Grant No. 31071398), the National Program on Super Rice Breeding, the Ministry of Agriculture (Grant No. 2010-3), the National High Technology Research and Development Program of China (Grant No. 2006AA10Z1E8), and the Provincial Program of '8812', Zhejiang Province, China (Grant No. 8812-1).
Abstract: To clarify the most appropriate sample size for obtaining phenotypic data for a single line, we investigated the main-effect QTLs (M-QTLs) for the quantitative trait plant height (ph) in a recombinant inbred line (RIL) population of rice (derived from the cross between Xieqingzao B and Zhonghui 9308), using five individual plants in 2006 and 2009. Twenty-six ph phenotypic datasets from the completely random combinations of 2, 3, 4, and 5 plants within a single line, and five ph phenotypic datasets from the five individual plants, were used to detect the QTLs. Fifteen M-QTLs were detected by 1 to 31 datasets. Of these, qph7a was detected repeatedly by all 31 ph datasets in 2006 and explained 11.67% to 23.93% of the phenotypic variation; qph3 was detected repeatedly by all 31 datasets and explained 5.21% to 7.93% and 11.51% to 24.46% of the phenotypic variance in 2006 and 2009, respectively. The results indicate that, under different environments, the M-QTLs for a quantitative trait could be detected repeatedly both from the phenotypic values of 5 individual plants and from 26 sets of completely random combinations of phenotypic data within a single line of an RIL population. The sample size for a single line of the RIL population did not affect the efficiency of identifying stably expressed M-QTLs.
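The count of 26 datasets follows from the combinations of five plants: C(5,2) + C(5,3) + C(5,4) + C(5,5) = 10 + 10 + 5 + 1 = 26, and adding the five single-plant datasets gives 31. The sketch below builds the corresponding line values; it assumes each combined dataset uses the mean of the selected plants, and the function name and example heights are illustrative.

```python
from itertools import combinations
from statistics import fmean


def line_datasets(plants):
    """Line means for every combination of 2..len(plants) plants, plus each single plant."""
    combos = [fmean(c) for r in range(2, len(plants) + 1) for c in combinations(plants, r)]
    singles = list(plants)
    return combos, singles
```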
Funding: The project is supported by the Aeronautic Science Foundation, China.
Abstract: After 102 replicate constant-amplitude crack initiation and growth tests on Ly12-CZ aluminum alloy plate, a statistical investigation of the fatigue crack initiation and growth process is conducted in this paper. Based on post-mortem fractographic examination by scanning electron microscopy (SEM), qualitative observations of the spatial correlation among fatigue striations are made to reveal the statistical nature of the material's intrinsic inhomogeneity during crack growth. From the test data, an engineering division between crack initiation and growth is defined as the upper limit of a small crack. The distributions of the crack initiation life N_i and the growth life N, and the statistical characteristics of the crack growth rate da/dN, are also investigated. It is hoped that this work will provide a solid test basis for the study of probabilistic fatigue, probabilistic fracture mechanics, fatigue reliability, and their engineering applications.
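Crack growth rates are commonly obtained from crack length versus cycle data with the secant method (as described in ASTM E647), and fatigue lives are often summarised on a log scale. The sketch below shows both steps; the lognormal life assumption and the function names are illustrative, not taken from the paper.

```python
import math
import statistics


def secant_growth_rate(a, cycles):
    """Secant-method da/dN between successive (crack length, cycle count) readings.

    Returns a list of (mean crack length, da/dN) pairs.
    """
    return [
        ((a0 + a1) / 2, (a1 - a0) / (n1 - n0))
        for (a0, n0), (a1, n1) in zip(zip(a, cycles), zip(a[1:], cycles[1:]))
    ]


def lognormal_params(lives):
    """Mean and standard deviation of log10(life), assuming a lognormal life distribution."""
    logs = [math.log10(n) for n in lives]
    return statistics.fmean(logs), statistics.stdev(logs)
```

Applied to each of the replicate specimens, the first function yields a scatter of da/dN values at each crack length, and the second summarises the initiation or growth lives.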
Abstract: Sample size can be a key design feature that not only affects the probability of a trial's success but also determines the duration and feasibility of a trial. Consider an investigational drug that is expected to be effective and to address unmet medical needs in an orphan disease, where detecting a minimal clinically relevant treatment effect would require a large sample size and an accrual period of many years; here a minimum sample size may be set to maintain nominal power. In limited situations such as this, flexibility in the initial and final sample sizes may be needed, so it is useful to consider the utility of adaptive sample size designs that use sample size re-estimation or group sequential design. In this paper, we propose a new adaptive performance measure for evaluating the utility of an adaptive sample size design in a trial simulation. Because previously proposed sample size re-estimation methods do not account for estimation errors in interim results, we propose Bayesian sample size re-estimation criteria that incorporate prior information on the treatment effect, and we assess their operating characteristics in a simulation study. We also present a review example of sample size re-estimation based mainly on a published paper and a review report from the Pharmaceuticals and Medical Devices Agency (PMDA).
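One simple way to let prior information temper an interim estimate is to plug the posterior mean of the treatment effect into the usual two-sample formula and clip the result between a minimum and a maximum sample size. The sketch below does this for a normally distributed endpoint with known sigma; it is a hypothetical plug-in criterion, not the criteria proposed in the paper, and all names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Per-arm n for a two-sided two-sample z-test of a mean difference delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sigma / delta) ** 2)


def bayes_reestimate(prior_mean, prior_sd, interim_est, interim_se, sigma,
                     n_min, n_max, alpha=0.05, power=0.8):
    """Re-estimate n per arm from the posterior mean of the treatment effect.

    A normal prior N(prior_mean, prior_sd^2) is combined with the interim
    estimate (standard error interim_se); the result is clipped to [n_min, n_max].
    """
    w_prior, w_data = 1 / prior_sd**2, 1 / interim_se**2
    post_mean = (w_prior * prior_mean + w_data * interim_est) / (w_prior + w_data)
    if post_mean <= 0:
        return n_max  # no evidence of benefit: fall back to the cap
    return min(max(n_per_arm(post_mean, sigma, alpha, power), n_min), n_max)
```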
Abstract: The objective of this study is to analyze the sensitivity of statistical models to sample size. The study, carried out in Ivory Coast, is based on annual maximum daily rainfall data collected from 26 stations. The methodological approach is based on statistical modeling of maximum daily rainfall. Distributions were fitted for several sample sizes and several return periods (2, 5, 10, 20, 50, and 100 years). The main results show that the 30-year series (1931-1960; 1961-1990; 1991-2020) are best fitted by the Gumbel (26.92% - 53.85% of stations) and Inverse Gamma (26.92% - 46.15%) distributions. The 60-year series (1931-1990; 1961-2020) are best fitted by the Inverse Gamma (30.77%), Gamma (15.38% - 46.15%), and Gumbel (15.38% - 42.31%) distributions. For the full 1931-2020 record (90 years), the Gumbel model clearly dominates with 50%, ahead of the Gamma (34.62%) and Inverse Gamma (15.38%) models. Overall, the Gumbel is the most dominant model, particularly in wet periods. Data for periods with normal and dry trends were better fitted by the Gamma and Inverse Gamma distributions.
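As an illustration of how one of the candidate models is fitted and used, the sketch below fits a Gumbel distribution to annual maximum daily rainfall by the method of moments and computes return levels for the return periods listed above. The abstract does not say which estimation method was used, so the method of moments is an assumption, as are the function names.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_moments(annual_maxima):
    """Method-of-moments Gumbel fit: returns (location u, scale alpha)."""
    alpha = statistics.stdev(annual_maxima) * math.sqrt(6) / math.pi
    u = statistics.fmean(annual_maxima) - EULER_GAMMA * alpha
    return u, alpha


def return_level(u, alpha, period):
    """Rainfall depth exceeded on average once every `period` years."""
    return u - alpha * math.log(-math.log(1 - 1 / period))
```

Refitting on 30-, 60- and 90-year windows of the same station record shows directly how the return levels shift with sample size.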
Abstract: Road surface condition evaluation involves collecting data on different types of distress over the pavement surface. Surveying the whole length of a road section consumes a lot of resources and may be prone to errors due to surveyor fatigue. It is therefore important to develop a representative sample for manual road condition evaluation. This study aimed to determine an adequate sample size for section-level evaluation, as well as a way forward for network-level condition evaluation of highways in Nepal. The study also quantified the effects of altering the sample unit size on distress surveys performed separately with the PCI (pavement condition index) and SDI (surface distress index) methods for asphalt-surfaced roads. The effect of reducing or increasing the sample unit size was investigated through visual examination in a field survey by eight teams in July 2015 along a section of the Banepa-Bardibas highway. The PCI was then calculated for each sample unit using standard deduct curves and the PCI calculation methodology recommended by SHRP (Strategic Highway Research Program), and the SDI was computed according to DoR (Department of Roads) guidelines. The results show that 13% of sample units are needed for SDI and 21% for PCI computation; however, these results fall outside the significance level. These percentages are higher than those in the DoR and SHRP guidelines. In addition, no strong relationship was observed between SDI and PCI values.
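For comparison with the guideline values, the minimum number of sample units in the PCI procedure is commonly computed (as in ASTM D6433) as n = N s^2 / ((e^2 / 4)(N - 1) + s^2), where N is the number of units in the section, s the standard deviation of PCI between units and e the allowable error. The defaults s = 10 and e = 5 are the values usually quoted for asphalt surfaces and are assumptions here; this is the guideline formula, not the study's method.

```python
import math


def min_sample_units(total_units, pci_sd=10.0, allowable_error=5.0):
    """Minimum number of pavement sample units to inspect:
    n = N s^2 / ((e^2 / 4)(N - 1) + s^2), rounded up."""
    n = total_units * pci_sd**2 / ((allowable_error**2 / 4) * (total_units - 1) + pci_sd**2)
    return math.ceil(n)
```

For a section with 100 units the formula asks for 14 of them, about 14% of the section.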
Funding: This paper was supported by the National Natural Science Foundation of China (Grant No. 61972261), the Natural Science Foundation of Guangdong Province (No. 2023A1515011667), the Key Basic Research Foundation of Shenzhen (No. JCYJ20220818100205012), and the Basic Research Foundation of Shenzhen (No. JCYJ20210324093609026).
Abstract: Random sample partition (RSP) is a newly developed big data representation and management model for big data approximate computation problems. Academic research and practical applications have confirmed that RSP is an efficient solution for big data processing and analysis. However, a challenge in implementing RSP is determining an appropriate sample size for RSP data blocks. A large sample size increases the burden of big data computation, whereas a small one leaves RSP data blocks with insufficient distribution information. To address this problem, this paper presents a novel density estimation-based method (DEM) to determine the optimal sample size for RSP data blocks. First, a theoretical sample size is calculated from the multivariate Dvoretzky-Kiefer-Wolfowitz (DKW) inequality using the fixed-point iteration (FPI) method. Second, a practical sample size is determined by minimizing the validation error of a kernel density estimator (KDE) constructed on RSP data blocks as the sample size increases. Finally, a series of experiments is conducted to validate the feasibility, rationality, and effectiveness of DEM. Experimental results show that (1) the iteration function of the FPI method converges when calculating the theoretical sample size from the multivariate DKW inequality; (2) the KDE constructed on RSP data blocks with the sample size determined by DEM yields a good approximation of the probability density function (p.d.f.); and (3) DEM provides more accurate sample sizes than existing sample size determination methods from the perspective of p.d.f. estimation. This demonstrates that DEM is a viable approach to the sample size determination problem in big data RSP implementation.
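The need for fixed-point iteration can be seen from one common multivariate DKW-type bound, P(sup |F_n - F| > eps) <= d (n + 1) exp(-2 n eps^2), in which n appears both inside and outside the exponential. Setting the bound equal to alpha gives n = ln(d (n + 1) / alpha) / (2 eps^2), which the sketch below iterates. The exact form of the bound used in the paper is not stated in the abstract and is an assumption here; the KDE validation step is not shown, and the function name is illustrative.

```python
import math


def dkw_sample_size(eps, alpha, d, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for n = ln(d (n + 1) / alpha) / (2 eps^2).

    Assumes the bound P(sup |F_n - F| > eps) <= d (n + 1) exp(-2 n eps^2) for
    d-dimensional data; n cannot be isolated in closed form, hence the iteration.
    Returns the fixed point rounded up to an integer sample size.
    """
    n = 1.0
    for _ in range(max_iter):
        n_next = math.log(d * (n + 1) / alpha) / (2 * eps**2)
        if abs(n_next - n) < tol:
            return math.ceil(n_next)
        n = n_next
    raise RuntimeError("fixed-point iteration did not converge")
```

Near the fixed point the iteration map has slope 1 / (2 eps^2 (n + 1)), which is well below 1 for realistic sample sizes, so the iteration settles after a handful of steps.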