In this study, the statistical powers of the Kolmogorov-Smirnov two-sample (KS-2) and Wald-Wolfowitz (WW) tests, non-parametric tests used for data from two independent samples, have been compared under fixed skewness and fixed kurtosis by means of Monte Carlo simulation. The comparison has been made with a variance ratio of two, for equal and unequal sample sizes in the large-sample setting. The sample-size pairs used in the study are: (25, 25), (25, 50), (25, 75), (25, 100), (50, 25), (50, 50), (50, 75), (50, 100), (75, 25), (75, 50), (75, 75), (75, 100), (100, 25), (100, 50), (100, 75), and (100, 100). According to the results, the statistical power of both tests decreases when the coefficient of kurtosis is held fixed and the coefficient of skewness is reduced, while it increases when the coefficient of skewness is held fixed and the coefficient of kurtosis is reduced. When skewness is reduced at fixed kurtosis, the WW test is more powerful for sample sizes (25, 25), (25, 50), (25, 75), (25, 100), (50, 75), and (50, 100), while the KS-2 test is more powerful for the other sample sizes. When kurtosis is reduced at fixed skewness, the WW test is more powerful for sample sizes (25, 25), (25, 75), (25, 100), and (75, 25), while the KS-2 test is more powerful for the other sample sizes.
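A minimal sketch of such a power simulation follows, using normal samples with a variance ratio of two as a simplified stand-in for the paper's skewed and kurtotic distributions; the Wald-Wolfowitz runs test is implemented directly via its standard normal approximation, and scipy's `ks_2samp` supplies the KS-2 test.

```python
import numpy as np
from scipy import stats

def runs_test_2samp(x, y):
    """Wald-Wolfowitz two-sample runs test (normal approximation).

    Pools both samples, sorts the pooled values, and counts runs of
    same-sample labels; too few runs suggests the samples differ.
    """
    labels = np.concatenate([np.zeros(len(x)), np.ones(len(y))])
    order = np.argsort(np.concatenate([x, y]), kind="mergesort")
    runs = 1 + np.count_nonzero(np.diff(labels[order]))
    n1, n2 = len(x), len(y)
    n = n1 + n2
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n**2 * (n - 1))
    return stats.norm.cdf((runs - mu) / np.sqrt(var))  # lower-tail p-value

def empirical_power(n1, n2, n_sim=2000, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    rej_ks = rej_ww = 0
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, n1)           # variance 1
        y = rng.normal(0.0, np.sqrt(2.0), n2)  # variance 2 (ratio of two)
        rej_ks += stats.ks_2samp(x, y).pvalue < alpha
        rej_ww += runs_test_2samp(x, y) < alpha
    return rej_ks / n_sim, rej_ww / n_sim

for n1, n2 in [(25, 25), (50, 50), (100, 100)]:
    p_ks, p_ww = empirical_power(n1, n2)
    print(f"n = ({n1}, {n2}): KS-2 power = {p_ks:.3f}, WW power = {p_ww:.3f}")
```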
Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase of sequencing depth, hundreds of thousands of contigs are routinely assembled in metagenomics studies, which challenges alignment-based HGT analysis by overwhelming the known reference sequences. Detecting HGT by k-mer statistics thus becomes an attractive alternative. These alignment-free statistics have demonstrated high performance and efficiency in whole-genome and transcriptome comparisons. To adapt k-mer statistics for HGT detection, we developed two aggregative statistics, T^S_sum and T^*_sum, which subsample metagenome contigs by their representative regions and summarize the regional D^S_2 and D^*_2 metrics by their upper bounds. We systematically studied the aggregative statistics' power at different k-mer sizes using simulations. Our analysis showed that, in general, the power of T^S_sum and T^*_sum increases with sequencing coverage, and reaches a maximum power > 80% at k = 6, with a 5% Type-I error and a coverage ratio > 0.2×. The statistical power of T^S_sum and T^*_sum was evaluated with realistic simulations of HGT mechanism, sequencing depth, read length, and base error. We expect these statistics to be useful distance metrics for identifying HGT in metagenomic studies. Funding: L.C.X. was supported by the Innovation in Cancer Informatics Fund.
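As an illustration, the sketch below computes a D^S_2-type statistic for two sequences, assuming the commonly used definition D^S_2 = Σ_w Xc_w·Yc_w / sqrt(Xc_w² + Yc_w²), where Xc_w and Yc_w are k-mer counts centered under an i.i.d. background model; the region subsampling and upper-bound aggregation that define T^S_sum and T^*_sum are omitted here.

```python
from itertools import product
import numpy as np

def kmer_counts(seq, k):
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if w in counts:
            counts[w] += 1
    return counts

def d2s(seq_x, seq_y, k=6):
    cx, cy = kmer_counts(seq_x, k), kmer_counts(seq_y, k)
    nx, ny = len(seq_x) - k + 1, len(seq_y) - k + 1

    # i.i.d. background: word probability = product of base frequencies
    def base_freq(seq):
        return {b: max(seq.count(b), 1) / len(seq) for b in "ACGT"}

    fx, fy = base_freq(seq_x), base_freq(seq_y)
    stat = 0.0
    for w in cx:
        px = np.prod([fx[b] for b in w])
        py = np.prod([fy[b] for b in w])
        xt = cx[w] - nx * px          # centered counts
        yt = cy[w] - ny * py
        denom = np.sqrt(xt * xt + yt * yt)
        if denom > 0:
            stat += xt * yt / denom
    return stat

rng = np.random.default_rng(0)
sx = "".join(rng.choice(list("ACGT"), 2000))
sy = "".join(rng.choice(list("ACGT"), 2000))
print(d2s(sx, sy, k=6))
```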
The aim of this paper is to present a generalization of the Shapiro-Wilk W-test or Shapiro-Francia W'-test for application to two or more variables. It consists of calculating all the unweighted linear combinations of the variables and their W- or W'-statistics with Royston's log-transformation and standardization, z_ln(1-W) or z_ln(1-W'). Because the probability of z_ln(1-W) or z_ln(1-W') is calculated in the right tail, negative values are truncated to 0 before their sum of squares is taken. Independence in the sequence of these half-normally distributed values is required for the test statistic to follow a chi-square distribution; this assumption is checked using the robust Ljung-Box test. One degree of freedom is lost for each cancelled value. After defining the new test with its two variants (Q-test and Q'-test), 50 random samples with 4 variables and 20 participants were generated, 20% following a multivariate normal distribution and 80% deviating from this distribution. The new test was compared with Mardia's, runs, and Royston's tests. Central-tendency differences in type II error and statistical power were tested using Friedman's test, with pairwise comparisons by Wilcoxon's test. Differences in the frequency of successes in statistical decision making were compared using Cochran's Q test, with pairwise comparisons by McNemar's test. Sensitivity, specificity and efficiency proportions were compared using the McNemar Z test. The 50 generated samples were classified into five ordered categories of deviation from multivariate normality; the correlation between this variable and the p-value of each test was calculated using Spearman's coefficient, and these correlations were compared. Family-wise error rate corrections were applied. The new test and Royston's test were the best choices, with a very slight advantage of the Q-test over the Q'-test. Based on these promising results, further study and use of this new sensitive, specific and effective test are suggested.
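A hedged sketch of the Q-test procedure follows. It assumes the "unweighted linear combinations" are sums over all non-empty subsets of the variables, recovers Royston's z from the Shapiro-Wilk p-value (under H0 the z is standard normal with an upper-tail p-value), and omits the Ljung-Box independence check; none of these simplifications should be read as the authors' exact algorithm.

```python
from itertools import combinations
import numpy as np
from scipy import stats

def q_test(data):
    """Sketch of the Q-test for an (n, p) data matrix."""
    n, p = data.shape
    z_vals = []
    # Assumption: unweighted combinations = sums over all non-empty
    # subsets of the variables (coefficients 0 or 1).
    for r in range(1, p + 1):
        for cols in combinations(range(p), r):
            comb = data[:, cols].sum(axis=1)
            pval = stats.shapiro(comb).pvalue
            # Royston's z has an upper-tail p-value, so it can be
            # recovered from the Shapiro-Wilk p-value itself.
            z_vals.append(stats.norm.isf(pval))
    z = np.array(z_vals)
    kept = z > 0                       # negative z truncated to 0
    q = float(np.sum(z[kept] ** 2))
    df = int(kept.sum())               # one df lost per cancelled value
    if df == 0:                        # every combination cancelled
        return 0.0, 0, 1.0
    return q, df, stats.chi2.sf(q, df)

rng = np.random.default_rng(0)
print(q_test(rng.normal(size=(20, 4))))   # 4 variables, 20 participants
```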
In this paper, we study an energy-efficient multi-antenna unmanned aerial vehicle (UAV)-enabled half-duplex mobile relaying system under Rician fading channels. Assuming that the UAV follows a circular trajectory at a fixed altitude and applies the decode-and-forward relaying strategy, we maximize the energy efficiency by jointly designing beamforming, power allocation, circular radius and flight speed, subject to a sum transmit power constraint on the source node and the UAV relay node. First, we maximize the end-to-end signal-to-noise ratio by jointly designing beamforming and statistical power allocation. Based on the obtained beamforming and power allocation results, we then derive a semi-closed-form expression for the energy efficiency, and finally maximize it by optimizing the flight speed and circular radius, with the optimal circular radius obtained via numerical computation. Numerical results demonstrate that the proposed scheme can effectively enhance the system energy efficiency. Funding: Supported in part by the National Natural Science Foundation of China (NSFC) Fund for Distinguished Young Scholars under Grant 61625106, and by the NSFC under Grant 61531011.
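The final optimization step can be illustrated with a simple grid search. The sketch below is not the paper's derivation: it borrows the fixed-wing propulsion model P(V, r) = c1·V³ + (c2/V)(1 + V⁴/(r²g²)) from the UAV communications literature and pairs it with a purely hypothetical end-to-end rate function, only to show how energy efficiency can be maximized numerically over speed and radius.

```python
import numpy as np

# Hypothetical constants c1, c2 for a fixed-wing propulsion model; the
# paper's own power and rate expressions are not reproduced here.
c1, c2, g = 9.26e-4, 2250.0, 9.8

def propulsion_power(v, r):
    # Circular flight adds a centrifugal-acceleration term V**4/(r*g)**2.
    return c1 * v**3 + (c2 / v) * (1.0 + v**4 / (r**2 * g**2))

def throughput(r):
    # Placeholder end-to-end rate: decays with radius via path loss.
    d = np.sqrt(r**2 + 100.0**2)       # hypothetical relay link distance
    return np.log2(1.0 + 1e6 / d**2)

speeds = np.linspace(5.0, 60.0, 200)
radii = np.linspace(50.0, 500.0, 200)
V, R = np.meshgrid(speeds, radii)
ee = throughput(R) / propulsion_power(V, R)   # bits per joule (scaled)
i, j = np.unravel_index(np.argmax(ee), ee.shape)
print(f"best speed ~ {V[i, j]:.1f} m/s, best radius ~ {R[i, j]:.0f} m")
```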
Statistical power, the number of replicates, and the complexity of semi-field and field studies on Apis and non-Apis bee species have become a major issue after publication of the draft European Food Safety Authority (EFSA) Guidance on risk assessment of plant protection products (PPP) on bees (Apis mellifera, Bombus spp. and solitary bees). According to this guidance document, field studies have to be designed to detect differences as low as 7% for certain endpoints, such as reduction in colony size. This would require an immense number of replicates, which is obviously not feasible. In the present study, key endpoints from semi-field studies conducted with Apis mellifera, Bombus terrestris and Osmia bicornis during the past five years (2013-2017) were analyzed: among others, mortality, termination rate and number of brood cells in honeybee studies, cocoon production and flight activity in solitary bee studies, and number of gynes in bumble bee studies. The results indicate large differences in the percentage minimal detectable differences (%MDDs; the MDD expressed as a percentage of the median control value of the endpoint), depending on the endpoint and species tested. For honeybee semi-field studies, the lowest %MDDs recorded were between 10% and 15%, for the endpoints foraging, number of brood cells and colony strength. The highest %MDDs were observed for the endpoint termination rate, at almost 50%. For the endpoints in bumble bee semi-field studies, the %MDDs varied between 17% for colony weight and 53% for average mortality during the exposure period in the tunnel. The %MDD for the number of gynes (young queens) was slightly below 25%. For the semi-field solitary bee test system, the %MDDs for the measured endpoints appear to be lower than those for the other two species: the %MDDs for the endpoints hatching of offspring, nest occupation and number of cocoons were 8%, 13% and 14%, respectively. Most of the %MDDs were between 10% and 30%, indicating clearly that the currently used experimental design for semi-field pollinator studies allows relatively small effects on key study endpoints to be detected. The analysis indicated that for all three bee species tested, the semi-field test design achieved low %MDDs for most of the endpoints. It was also observed that detectable differences between the control and PPP treatments were much lower in semi-field test designs than in field studies with these bee species. The "perfect sample size" does not exist, but the test design and statistical analysis can be adapted to lower the %MDDs. Measured and simulated data (based on the Student's t-distribution) showed that the selection of statistical evaluation parameters (e.g., the alpha value), data transformation (log10) and the number of replicates had a direct effect on the %MDD values the test design can detect. Changing the alpha value from 0.05 to 0.1 increases the ability of the studies to detect lower %MDDs, and for most of the measured endpoints, increasing the number of replicates, e.g., from four to eight, improved the power of the test design by decreasing the %MDD. The magnitude of the %MDD reduction depends on the endpoint and on the selection of statistical parameters such as the alpha value. Parameters that display effects at a biologically relevant scale are a better indicator of effects than parameters that can detect minor differences that are not biologically relevant.
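The %MDD calculation itself is a short piece of arithmetic. The sketch below assumes the common two-sample t-test definition (MDD = critical t value × standard error of the difference) and, as in the text, expresses it relative to the control median; the replicate values are hypothetical and serve only to show the effect of the alpha value and the number of replicates.

```python
import numpy as np
from scipy import stats

def percent_mdd(control, treated, alpha=0.05, two_sided=True):
    """%MDD under a two-sample t-test, relative to the control median."""
    n1, n2 = len(control), len(treated)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * np.var(control, ddof=1)
           + (n2 - 1) * np.var(treated, ddof=1)) / df   # pooled variance
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    q = 1 - alpha / 2 if two_sided else 1 - alpha
    return 100.0 * stats.t.ppf(q, df) * se / np.median(control)

rng = np.random.default_rng(7)
for n in (4, 8):                        # replicates per group
    ctrl = rng.normal(100.0, 15.0, n)   # hypothetical endpoint values
    trt = rng.normal(100.0, 15.0, n)
    for a in (0.05, 0.10):
        print(f"n = {n}, alpha = {a}: %MDD = {percent_mdd(ctrl, trt, a):.1f}")
```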
While conventional forensic scientists routinely validate and express the results of their investigations quantitatively, using statistical measures from probability theory, digital forensics examiners rarely, if ever, do so. In this paper, we review some of the quantitative tools and techniques available for use in digital forensic investigations, including Bayesian networks, complexity theory, information theory and probability theory, and indicate how they may be used to obtain likelihood ratios or odds ratios for the relative plausibility of alternative explanations for the creation of the recovered digital evidence. The potential benefits of such quantitative measures for modern digital forensics are also outlined.
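The core odds-ratio update is plain Bayesian arithmetic; the probabilities in the toy example below are purely illustrative.

```python
# Toy likelihood-ratio calculation for two competing explanations of
# recovered digital evidence E. All numbers are illustrative only.
p_e_given_h1 = 0.80   # P(E | suspect created the file)
p_e_given_h2 = 0.05   # P(E | file arrived via malware)
lr = p_e_given_h1 / p_e_given_h2          # likelihood ratio = 16

prior_odds = 0.50 / 0.50                  # neutral prior on H1 vs H2
posterior_odds = prior_odds * lr          # Bayes: posterior = prior x LR
posterior_p_h1 = posterior_odds / (1.0 + posterior_odds)
print(f"LR = {lr:.0f}, P(H1 | E) = {posterior_p_h1:.3f}")
```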
Let {X_n : n ≥ 1} be a sequence of independent random variables with common general error distribution GED(v) with shape parameter v > 0, and let M_{n,r} denote the r-th largest order statistic of X_1, X_2, ..., X_n. With different normalizing constants, the distributional expansions and the uniform convergence rates of the normalized powered order statistics |M_{n,r}|^p are established. An alternative method is presented to estimate the probability of the r-th extremes. Numerical analyses are provided to support the main results.
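For reference, one common unit-variance parameterization of the GED(v) density in the extreme-value literature is given below (v = 2 recovers the standard normal, v = 1 a Laplace shape); the abstract does not state its exact normalization, so this is a hedged recall rather than the paper's definition.

```latex
f_v(x) \;=\; \frac{v\,\exp\!\left(-\tfrac{1}{2}\,\bigl|x/\lambda\bigr|^{v}\right)}
                  {\lambda\, 2^{1+1/v}\, \Gamma(1/v)},
\qquad
\lambda \;=\; \left[\frac{2^{-2/v}\,\Gamma(1/v)}{\Gamma(3/v)}\right]^{1/2}.
```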
This study aims to examine the effect of El Niño and La Niña on the monthly and seasonal climate of Hong Kong against the ENSO-neutral situation from a statistical perspective. Monthly and seasonal temperature and rainfall of Hong Kong and the monthly number of tropical cyclones (TCs) coming within 500 km of the city over the 59-yr (1950–2008) period are examined under three ENSO situations, namely El Niño, La Niña, and ENSO-neutral. It is found that, compared with the ENSO-neutral situation, El Niño tends to be associated with a wetter winter (December–February) and spring (March–May), while La Niña tends to be associated with a cooler autumn (September–November) and winter. El Niño tends to be associated with a later start of the tropical cyclone season of Hong Kong, while La Niña tends to be associated with more TCs coming within 500 km of Hong Kong in August–October. It is also found that, for April and June–December, although the monthly number of TCs coming within 500 km of Hong Kong during El Niño is generally lower than that under the ENSO-neutral situation, the difference is not statistically significant based on the current data sample size.
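The significance comparison can be illustrated with a rank-based test on monthly TC counts; the abstract does not name the study's actual test, and the counts below are hypothetical, so this shows only the shape of such a comparison at small sample sizes.

```python
import numpy as np
from scipy import stats

# Hypothetical monthly TC counts (within 500 km of Hong Kong) for one
# calendar month in El Niño vs ENSO-neutral years; illustrative only.
el_nino = np.array([1, 0, 2, 1, 1, 2, 0, 1])
neutral = np.array([2, 1, 1, 2, 0, 2, 1, 1])
u, p = stats.mannwhitneyu(el_nino, neutral, alternative="two-sided")
print(f"U = {u:.0f}, p = {p:.3f}")  # small samples often fail to reach 0.05
```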
Background: Interventional trials in amyotrophic lateral sclerosis (ALS) suffer from the heterogeneity of the disease, as it considerably reduces statistical power. We asked whether blood neurofilament light chains (NfL) could be used to anticipate disease progression and increase trial power. Methods: In 125 patients with ALS from three independent prospective studies (one observational study and two interventional trials), we developed and externally validated a multivariate linear model for predicting disease progression, measured by the monthly decrease of the ALS Functional Rating Scale Revised (ALSFRS-R) score. We trained the prediction model in the observational study and tested the predictive value of the following parameters assessed at diagnosis: NfL levels, sex, age, site of onset, body mass index, disease duration, ALSFRS-R score, and monthly ALSFRS-R score decrease since disease onset. We then applied the resulting model in the other two study cohorts to assess its actual utility for interventional trials. We analyzed the impact on trial power in mixed-effects models and compared the performance of the NfL model with two currently used predictive approaches, which anticipate disease progression using the ALSFRS-R decrease during a three-month observational period (lead-in) or since disease onset (ΔFRS). Results: Among the parameters provided, the NfL levels (P < 0.001) and their interaction with site of onset (P < 0.01) contributed significantly to the prediction, forming a robust NfL prediction model (R = 0.67). Model application in the trial cohorts confirmed its applicability and revealed superiority over the lead-in and ΔFRS-based approaches. The NfL model improved statistical power by 61% and 22% (95% confidence intervals: 54%–66% and 7%–29%). Conclusion: The use of the NfL-based prediction model to compensate for clinical heterogeneity in ALS could significantly increase trial power. Trial registration: NCT00868166, registered March 23, 2009; NCT02306590, registered December 2, 2014. Funding: Open Access funding enabled and organized by Projekt DEAL.
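A synthetic sketch of the modeling idea follows, with hypothetical coefficients and simulated data (not the study's): it fits the progression slope on log NfL with a site-of-onset interaction and shows how explaining heterogeneity shrinks the residual variance, which is what drives the gain in trial power.

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in for the study data: monthly ALSFRS-R decline as a
# function of log NfL and site of onset, plus unexplained noise.
rng = np.random.default_rng(42)
n = 125
log_nfl = rng.normal(4.0, 0.6, n)
bulbar = rng.integers(0, 2, n)            # site of onset (1 = bulbar)
slope = (-0.2 - 0.5 * log_nfl - 0.3 * bulbar * log_nfl
         + rng.normal(0.0, 0.4, n))

X_full = sm.add_constant(np.column_stack([log_nfl, bulbar,
                                          log_nfl * bulbar]))
X_base = sm.add_constant(bulbar.astype(float))
fit_full = sm.OLS(slope, X_full).fit()
fit_base = sm.OLS(slope, X_base).fit()

# Lower residual variance means less unexplained heterogeneity, which
# translates directly into higher power at a fixed sample size.
print(f"R (full model) = {np.sqrt(fit_full.rsquared):.2f}")
print(f"residual SD: base = {np.sqrt(fit_base.mse_resid):.3f}, "
      f"full = {np.sqrt(fit_full.mse_resid):.3f}")
```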