Normality testing is a fundamental hypothesis test in the statistical analysis of key biological indicators of diabetes. If this assumption is violated, it may cause the test results to deviate from the true value, leading to incorrect inferences and conclusions, and ultimately affecting the validity and accuracy of statistical inference. Considering this, the study designs a unified analysis scheme for different data types based on parametric and non-parametric statistical test methods. The data were grouped according to sample type and divided into discrete and continuous data. To account for differences among subgroups, the conventional chi-squared test was used for discrete data. The normal distribution underlies many statistical methods; if the data do not follow a normal distribution, many of these methods fail or produce incorrect results. Therefore, before analysis and modeling, the data were divided into normal and non-normal groups through normality testing. For normally distributed data, parametric statistical methods were used to judge the differences between groups. For non-normal data, non-parametric tests were employed to improve the accuracy of the analysis. Statistically significant indicators were retained according to the P-value of the statistical test or the corresponding statistic. These indicators were then combined with relevant medical background to further explore the etiology underlying the occurrence or transformation of diabetes status.
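The two-branch scheme described in this abstract — a chi-squared test for discrete data, and a normality gate routing continuous data to a parametric or non-parametric comparison — can be sketched in Python. This is an illustrative sketch only: the skewness cutoff used as a normality proxy and the specific statistics (Welch's t, Mann-Whitney U) are assumptions, not details taken from the paper.

```python
import math
from statistics import mean, stdev

def skewness(x):
    # Sample skewness, used here as a crude normality proxy
    # (a real analysis would use a formal normality test).
    m, s, n = mean(x), stdev(x), len(x)
    return sum(((v - m) / s) ** 3 for v in x) * n / ((n - 1) * (n - 2))

def welch_t(a, b):
    # Parametric branch: Welch's t statistic for two independent samples.
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(va + vb)

def mann_whitney_u(a, b):
    # Non-parametric branch: Mann-Whitney U statistic (ties counted as 1/2).
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

def compare_groups(a, b, skew_cutoff=1.0):
    # Route roughly symmetric samples to the parametric test,
    # clearly skewed ones to the rank-based test.
    if abs(skewness(a)) < skew_cutoff and abs(skewness(b)) < skew_cutoff:
        return ("welch_t", welch_t(a, b))
    return ("mann_whitney_u", mann_whitney_u(a, b))
```

The resulting statistic (or its P-value, once referred to the appropriate null distribution) then drives the retain-or-discard decision for each indicator.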
Choosing appropriate statistical tests is crucial, but deciding which test to use can be challenging. Different tests suit different types of data and research questions, so it is important to choose the right one: selecting an appropriate test leads to more accurate results, while an incorrect test can yield invalid results and misleading conclusions. Because such a wide variety of tests is available, it is essential to understand the nature of the data, the research question, and the assumptions of the tests before selecting one. This paper provides a step-by-step approach to selecting the right statistical test for any study, with an explanation of when each test is appropriate and relevant examples. Furthermore, this guide provides a comprehensive overview of the assumptions of each test and what to do if these assumptions are violated.
This paper describes statistical methods for comparing incidence or mortality rates in cancer registry work and descriptive epidemiology, and the features of a microcomputer program (CANTEST) designed to perform these methods. The program was written in IBM BASIC. Using CANTEST, the user can carry out several statistical tests or estimations, as follows: 1. comparison of adjusted rates calculated by directly or indirectly standardized methods; 2. calculation of the slope of a regression line for testing linear trends in the adjusted rates; 3. estimation of the 95% or 99% confidence intervals of the directly adjusted rates, of the cumulative rates (0-64 and 0-74), and of the cumulative risk. Several examples are presented to test the performance of the program.
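The first of CANTEST's facilities, comparison of directly standardized rates, rests on a weighted average of stratum-specific rates and its variance. The sketch below is illustrative (CANTEST itself is a BASIC program whose exact formulas are not reproduced here); it assumes Poisson case counts per stratum.

```python
def dsr_and_variance(cases, person_years, standard_weights):
    # Directly standardized rate: weighted average of stratum-specific
    # rates d/n using standard-population weights, plus its variance
    # under the usual Poisson assumption for the case counts.
    W = sum(standard_weights)
    rate = sum(w * d / n for w, d, n in zip(standard_weights, cases, person_years)) / W
    var = sum(w * w * d / (n * n) for w, d, n in zip(standard_weights, cases, person_years)) / (W * W)
    return rate, var

def compare_rates(r1, v1, r2, v2):
    # z statistic for the difference of two standardized rates.
    return (r1 - r2) / (v1 + v2) ** 0.5
```

A confidence interval for the standardized rate follows as rate ± z * sqrt(var), matching item 3 of the program's feature list.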
We are very grateful for the letter written by Dr. Lange, and indeed apologize for the mistakes noted in the wording of our text regarding statistical analysis. This was due to changes carried out while revising the manuscript at the request of reviewers, whom we thank for pointing out several issues that were actually similar to those noted by Dr. Lange. Unfortunately, we were unable to describe and discuss our findings properly in the context
The problem of identifying differential activity, such as in gene expression, is a major challenge in biostatistics and bioinformatics. Equally important, however much less frequently studied, is the question of similar activity from one biological condition to another. The fold-change, or ratio, is usually considered a relevant criterion for stating difference and similarity between measurements. Importantly, no statistical method for concomitant evaluation of similarity and distinctness currently exists for biological applications. Modern microarray, digital PCR (dPCR), and Next-Generation Sequencing (NGS) technologies frequently provide a means of coefficient-of-variation estimation for individual measurements. Using fold-change, and by making the assumption that measurements are normally distributed with known variances, we designed a novel statistical test that allows us to detect concomitantly, thus using the same formalism, differentially and similarly expressed genes (http://cds.ihes.fr). Given two sets of gene measurements in different biological conditions, the probabilities of making type I and type II errors in stating that a gene is differentially or similarly expressed from one condition to the other can be calculated. Furthermore, a confidence interval for the fold-change can be delineated. Finally, we demonstrate that the assumption of normality can be relaxed to consider arbitrary distributions numerically. The Concomitant evaluation of Distinctness and Similarity (CDS) statistical test correctly estimates similarities and differences between measurements of gene expression.
The implementation, being time and memory efficient, allows the use of the CDS test in high-throughput data analysis such as microarray, dPCR, and NGS experiments. Importantly, the CDS test can be applied to the comparison of single measurements (N = 1) provided the variance (or coefficient of variation) of the signals is known, making CDS a valuable tool also in biomedical analysis, where typically a single measurement per subject is available.
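Under the abstract's assumption of normally distributed measurements with known coefficients of variation, a fold-change test can be approximated on the log scale, where the delta method gives Var(log X) ≈ CV². The sketch below is an assumed simplification in that spirit, not the published CDS statistic.

```python
import math

def log_fold_change_z(x1, cv1, x2, cv2, null_fold=1.0):
    # Approximate z statistic for H0: x1/x2 == null_fold, using the
    # delta-method result Var(log X) ~= CV^2 for each measurement.
    return (math.log(x1 / x2) - math.log(null_fold)) / math.sqrt(cv1 ** 2 + cv2 ** 2)
```

Testing against null_fold = 1 probes distinctness; testing against a tolerated fold bound greater than 1 turns the same statistic into an equivalence-style similarity check, which is the spirit of evaluating distinctness and similarity in one formalism. It also works for single measurements (N = 1), since only the known CVs enter the denominator.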
Search-based statistical structural testing (SBSST) is a promising technique that uses automated search to construct input distributions for statistical structural testing. It has been proved that a simple search algorithm, for example the hill-climber, is able to optimize an input distribution. However, due to the noisy fitness estimation of the minimum triggering probability among all cover elements (Tri-Low-Bound), the existing approach does not show satisfactory efficiency: constructing input distributions to satisfy the Tri-Low-Bound criterion requires extensive computation time. Tri-Low-Bound is considered a strong criterion, and it has been demonstrated to sustain a high fault-detecting ability. This article tries to answer the following question: if we use a relaxed constraint that significantly reduces the time spent on search, can the optimized input distribution still be effective in fault-detecting ability? We propose a criterion called fairness-enhanced sum of triggering probabilities (p-L1-Max). The criterion uses the sum of triggering probabilities as the fitness value and leverages a parameter p to adjust the uniformness of test data generation. We conducted extensive experiments to compare the computation time and fault-detecting ability of the two criteria. The results show that the 1.0-L1-Max criterion has the highest efficiency and is more practical to use than the Tri-Low-Bound criterion. To measure a criterion's fault-detecting ability, we introduce a definition of the expected faults found in the effective test set size region. To determine that region, we present a theoretical analysis of the expected faults found with respect to various test set sizes and use the uniform distribution as a baseline to derive the effective test set size region's definition.
Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between clusters than within them. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Sunshine duration (S) based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. Those equations include the Ångström-Prescott linear model and four of its derivatives, i.e. logarithmic, exponential, power and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data for a period of 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from the above-said five functions to relate, in terms of monthly mean values, the daily clearness index (the ratio of G to the extraterrestrial daily solar radiation on a horizontal surface, G<sub>0</sub>) to each of two kinds of relative sunshine duration (RSD): the ratio of S to the day length S<sub>0</sub>, and the ratio of S to a modified day length that takes into account the natural site's horizon. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable in each developed linear regression have been compared to measured values in terms of the coefficients of correlation (R) and determination (R<sup>2</sup>), the mean bias error (MBE), the root mean square error (RMSE) and the t-statistic. Mean values of these statistical indicators have been used to rank, in decreasing order of performance, first the ten developed equations per station across all six stations, and second the six stations across all ten equations.
Nevertheless, for all sixty developed equations, the obtained values of those indicators fell within acceptable ranges. These results lead to the assertion that any of the sixty developed linear regressions fits the measured data very adequately and can be used to estimate the monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that using the horizon-corrected RSD is slightly more advantageous than using the conventional RSD for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators of this study match adequately data from other works on the same kinds of empirical equations.
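At its core, the Ångström-Prescott linear model is an ordinary least squares fit of the clearness index against relative sunshine duration, with R among the performance indicators. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import math

def fit_angstrom_prescott(rel_sunshine, clearness):
    # OLS fit of the Angstrom-Prescott line  G/G0 = a + b * (S/S0),
    # returning intercept a, slope b, and correlation coefficient R.
    n = len(rel_sunshine)
    mx = sum(rel_sunshine) / n
    my = sum(clearness) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(rel_sunshine, clearness))
    sxx = sum((x - mx) ** 2 for x in rel_sunshine)
    syy = sum((y - my) ** 2 for y in clearness)
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r
```

The logarithmic, exponential, power and quadratic derivatives follow the same pattern after transforming the regressor or regressand, which is why each station yields a family of comparable fits.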
With the increasing popularity and complexity of Web applications and the emergence of their new characteristics, the testing and maintenance of large, complex Web applications are becoming more complex and difficult. Web applications generally contain many pages and are used by enormous numbers of users. Statistical testing is an effective way of ensuring their quality. Web usage can be accurately described by a Markov chain, which has been proved to be an ideal model for software statistical testing. The results of unit testing can be utilized in later stages, which is an important strategy for bottom-to-top integration testing; the other improvement of the extended Markov chain model (EMM) is the error-type vector, which is treated as a part of the page node. This paper also proposes an algorithm for generating test cases of usage paths. Finally, optional usage reliability evaluation methods and an incremental usability regression testing model for testing and evaluation are presented.
Keywords: statistical testing; evaluation for Web usability; extended Markov chain model (EMM); Web log mining; reliability evaluation. CLC number: TP311.5. Foundation item: Supported by the National Defence Research Project (No. 41315.9.2) and National Science and Technology Plan (2001BA102A04-02-03). Biography: MAO Cheng-ying (1978-), male, Ph.D. candidate; research directions: software testing, advanced database systems, component technology and data mining.
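The usage-path generation described above can be illustrated as a random walk over a page-transition matrix mined from Web logs. The transition table below is a made-up example, not one from the paper:

```python
import random

def generate_usage_path(transitions, start, end, rng):
    # Random walk over the usage Markov chain: from `start`, repeatedly
    # pick the next page according to the transition probabilities
    # until the terminal page `end` is reached.
    path = [start]
    while path[-1] != end:
        pages, probs = zip(*transitions[path[-1]].items())
        path.append(rng.choices(pages, weights=probs, k=1)[0])
    return path
```

Sampling many such paths exercises pages in proportion to real usage, which is the premise of statistical testing; the EMM would additionally attach an error-type vector to each page node.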
“Human-elephant conflict (HEC)”, an alarming issue in the present-day context, has attracted the attention of environmentalists and policy makers. Rising conflict between human beings and wild elephants is common in the Buxa Tiger Reserve (BTR) and its adjoining area in West Bengal State, India, making the area volatile. People’s attitudes towards elephant conservation activity are crucial to resolving HEC, because people’s proximity to wild elephants’ habitat can trigger the occurrence of HEC. The aim of this study is to conduct an in-depth investigation of the association of people’s attitudes towards HEC with their locational, demographic, and socio-economic characteristics in BTR and its adjoining area, using Pearson’s bivariate chi-square test and binary logistic regression analysis. BTR is one of the constituent parts of the Eastern Doors Elephant Reserve (EDER). We interviewed 500 respondents to understand their perceptions of HEC and investigated their locational, demographic, and socio-economic characteristics, including location of village, gender, age, ethnicity, religion, caste, poverty level, education level, primary occupation, secondary occupation, household type, and source of firewood. The results indicate that respondents who live in enclave forest villages (EFVs), peripheral forest villages (PFVs), corridor villages (CVs), or forest and corridor villages (FCVs), are mainly male, aged 18-48 years, engaged in agriculture, and living in kancha or mixed houses are more likely to witness HEC. Besides, respondents who are illiterate or have only primary education are more likely to regard the elephant as the main problematic animal around their villages and to refuse to participate in elephant conservation activity. For the sake of a sustainable environment for both human beings and wildlife, people’s attitudes towards elephants must be made friendly in a prudent way, so that the two communities can live in harmony.
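Pearson's bivariate chi-square test used in this study reduces to the familiar contingency-table statistic comparing observed and expected counts. A minimal sketch (illustrative only; degrees of freedom and p-values are omitted):

```python
def chi_square_statistic(table):
    # Pearson chi-squared statistic for an r x c contingency table
    # of observed counts (expected counts from the margins).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat
```

For instance, cross-tabulating village type against whether a respondent witnessed HEC yields such a table; a statistic large relative to the chi-squared distribution with (r-1)(c-1) degrees of freedom indicates association.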
The population of northern Côte d’Ivoire, especially in the White Bandama watershed, lives for the majority in rural areas and depends on farming, which is closely linked to climate variability. This study evaluates trends in the watershed’s hydro-climatic variables and their level of significance over the period 1950-2000. The methodological approach consists of successively applying standardized indexes to detect trends and breaks in long-term hydro-climatic data. The Mann-Kendall statistical test indicates the significance of the trends, and the Kendall-Theil Robust Line test reveals their magnitude. The Student’s t test identifies break years. Results show that although rainfall has decreased, this decline is not statistically significant. However, temperature and potential evapotranspiration have risen strongly, and discharge has declined sharply. These changes in hydrometeorological variables appeared from 1970 to 1980. This study differs from others conducted on climate variability in northern Côte d’Ivoire by the methodological statistical framework implemented and by its attention to the significance level of climate trends. Until now, authors used the standardized index to detect trends in hydro-climatic parameters. For this work, we added the Mann-Kendall statistical test to assess the significance level of these trends at α = 5% and 10%. Then, the Kendall-Theil statistical test was used to highlight the trends’ magnitude, and the Student’s t test to identify the break years.
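The Mann-Kendall S statistic and the Kendall-Theil (Sen's) slope used here are both pairwise computations on the time series. A compact sketch (without the tie and autocorrelation corrections a full analysis would apply):

```python
def mann_kendall_s(x):
    # Mann-Kendall S: sum of sign(x[j] - x[i]) over all pairs i < j.
    n = len(x)
    return sum((x[j] > x[i]) - (x[j] < x[i])
               for i in range(n) for j in range(i + 1, n))

def sens_slope(x):
    # Kendall-Theil / Sen's slope: median of all pairwise slopes,
    # assuming unit spacing between observations.
    n = len(x)
    slopes = sorted((x[j] - x[i]) / (j - i)
                    for i in range(n) for j in range(i + 1, n))
    m = len(slopes)
    return slopes[m // 2] if m % 2 else 0.5 * (slopes[m // 2 - 1] + slopes[m // 2])
```

S is then standardized against its null variance to obtain the Z used for the α = 5% and 10% significance decisions, while the Sen's slope quantifies the trend magnitude.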
In pulsar timing, timing residuals are the differences between the observed times of arrival and predictions from the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g. gravitational waves). In order to apply optimal statistical methods for detecting weak gravitational wave signals, we need to know the statistical properties of the noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data, which were obtained with the Arecibo Observatory and the Green Bank Telescope from 2005 to 2010. We find that most of the data are consistent with white noise; many data deviate from Gaussianity at different levels; nevertheless, removing outliers in some pulsars mitigates the deviations.
In this article, the unit root test for the AR(p) model with GARCH errors is considered. The Dickey-Fuller test statistics are rewritten in the form of self-normalized sums, and the asymptotic distribution of the test statistics is derived under weak conditions.
We investigate the redshift distributions of three long burst samples: the first containing 131 long bursts with observed redshifts, the second 220 long bursts with pseudo-redshifts calculated from the variability-luminosity relation, and the third 1194 long bursts with pseudo-redshifts calculated from the lag-luminosity relation. In the redshift range 0-1 the Kolmogorov-Smirnov (KS) probability between the observed redshift distribution and that of the variability-luminosity relation is large. In the redshift ranges 1-2, 2-3, 3-6.3 and 0-37, the KS probabilities between the redshift distribution from the lag-luminosity relation and the observed redshift distribution are also large. For the GRBs that appear in both pseudo-redshift burst samples, the KS probability between the pseudo-redshift distribution from the lag-luminosity relation and the observed redshift distribution is 0.447, which is very large. Based on these results, some conclusions are drawn: i) the V-Liso relation might be more believable than the τ-Liso relation in low redshift ranges, and the τ-Liso relation might be more realistic than the V-Liso relation in high redshift ranges; ii) if we do not consider the redshift ranges, the τ-Liso relation might be more physical and intrinsic than the V-Liso relation.
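The KS probabilities reported above derive from the two-sample Kolmogorov-Smirnov distance between empirical distribution functions, which can be computed directly. A sketch of the D statistic only (conversion of D to the quoted probability is omitted):

```python
def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov D: the maximum absolute distance
    # between the two empirical cumulative distribution functions.
    def ecdf(sample, v):
        # Fraction of points in `sample` that are <= v.
        return sum(1 for x in sample if x <= v) / len(sample)
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in set(a) | set(b))
```

A small D (hence a large KS probability) means the pseudo-redshift sample is statistically compatible with the observed redshift distribution over the chosen range.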
With the increasing popularity of high-resolution remote sensing images, remote sensing image retrieval (RSIR) has become a topic of major interest. A combined representation of global non-subsampled shearlet transform (NSST)-domain statistical features (NSSTds) and local three-dimensional local ternary pattern (3D-LTP) features is proposed for high-resolution remote sensing images. We model the NSST image coefficients of detail subbands using a 2-state Laplacian mixture (LM) distribution, whose three parameters are estimated using the Expectation-Maximization (EM) algorithm. We also calculate statistical parameters such as subband kurtosis and skewness from the detail subbands, along with the mean and standard deviation calculated from the approximation subband, and concatenate all of them with the 2-state LM parameters to describe the global features of the image. The various properties of NSST, such as multiscale analysis, localization and flexible directional sensitivity, make it a suitable choice to provide an effective approximation of an image. In order to extract dense local features, a new 3D-LTP is proposed in which dimension reduction is performed via selection of ‘uniform’ patterns. The 3D-LTP is calculated from the spatial RGB planes of the input image. The proposed inter-channel 3D-LTP not only exploits local texture information but also captures color information. Finally, a fused feature representation (NSSTds-3DLTP) is proposed using the new global (NSSTds) and local (3D-LTP) features to enhance the discriminativeness of the features. The retrieval performance of the proposed NSSTds-3DLTP features is tested on three challenging remote sensing image datasets, WHU-RS19, the Aerial Image Dataset (AID) and PatternNet, in terms of mean average precision (MAP), average normalized modified retrieval rank (ANMRR) and precision-recall (P-R) graphs. The experimental results are encouraging, and the NSSTds-3DLTP features lead to superior retrieval performance compared to many well-known existing descriptors such as Gabor RGB, Granulometry, local binary pattern (LBP), Fisher vector (FV), vector of locally aggregated descriptors (VLAD) and median robust extended local binary pattern (MRELBP). For WHU-RS19, in terms of {MAP, ANMRR}, the NSSTds-3DLTP improves upon the Gabor RGB, Granulometry, LBP, FV, VLAD and MRELBP descriptors by {41.93%, 20.87%}, {92.30%, 32.68%}, {86.14%, 31.97%}, {18.18%, 15.22%}, {8.96%, 19.60%} and {15.60%, 13.26%}, respectively. For AID, the corresponding improvements are {152.60%, 22.06%}, {226.65%, 25.08%}, {185.03%, 23.33%}, {80.06%, 12.16%}, {50.58%, 10.49%} and {62.34%, 3.24%}. For PatternNet, they are {32.79%, 10.34%}, {141.30%, 24.72%}, {17.47%, 10.34%}, {83.20%, 19.07%}, {21.56%, 3.60%} and {19.30%, 0.48%}. The moderate dimensionality of the simple NSSTds-3DLTP allows the system to run in real time.
This paper investigates the correlation between tidal stress and earthquakes for periods ranging from hours to months in a limited zone of the Palu region (Central Sulawesi, Indonesia). Through Schuster and binomial tests, we examined the relation between the seismicity (time density of seismic events) and the tidal potential arising from the Moon and Sun, using all tidal components simultaneously and focusing on the estimation of specific terms. The results show significant correlations between the seismicity and the tidal potential for the S2 (0.5 d) and O1 (1.075 d) tidal components in the case of solely isolated earthquake events, particularly for shallow earthquakes. Meanwhile, there is a strong relationship between aftershocks and tidal components with periods larger than the Mf period (13.661 d). Finally, the analysis of the temporal variation of the earthquake-tide relation reveals an optimal correlation for about six years before the 2018 great Palu earthquake. The correlation becomes insignificant afterwards.
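The Schuster test assigns each earthquake a phase angle of the tidal component under study and asks whether the resultant of the N unit phase vectors is larger than expected by chance; the standard p-value is exp(−R²/N). A minimal sketch of that formulation:

```python
import math

def schuster_p_value(phases):
    # Schuster test: with N unit vectors at the given tidal phase
    # angles (radians), the p-value for the resultant length R under
    # uniformly random phases is approximately exp(-R^2 / N).
    n = len(phases)
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.exp(-(c * c + s * s) / n)
```

A small p-value (earthquakes clustering at a preferred tidal phase) is the signature of tidal triggering for that component, e.g. S2 or O1 in this study.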
This paper develops a Weibull-distribution model of wind speed with the intention of evaluating wind energy potential and helping to design a small wind energy plant in Batouri, Cameroon. The model was developed using wind speed data collected from a meteorological station at the small airport of Batouri. Four numerical methods (the moment method, the graphical method, the empirical method and the energy pattern factor method) were used to estimate the Weibull parameters K and C. The application of these four methods is illustrated using a sample wind speed data set. With some statistical analysis, a comparison of the accuracy of each method is also performed. The study determines that the energy pattern factor method is the most effective (K = 3.8262 and C = 2.4659).
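Of the four estimation methods listed, the empirical (Justus) method has a particularly compact closed form: the shape parameter from the coefficient of variation and the scale parameter from the mean via the gamma function. A sketch of that one method (the wind-speed sample in the test below is synthetic, drawn from assumed quantiles of a Weibull distribution, not the Batouri data):

```python
import math
from statistics import mean, stdev

def weibull_empirical(wind_speeds):
    # Empirical (Justus) method for Weibull parameters:
    #   shape k from the coefficient of variation sigma/mu,
    #   scale c from the mean via the gamma function.
    mu, sigma = mean(wind_speeds), stdev(wind_speeds)
    k = (sigma / mu) ** -1.086
    c = mu / math.gamma(1.0 + 1.0 / k)
    return k, c
```

The moment and energy pattern factor methods differ only in which sample summaries (moments, or the ratio of mean cubed speed to cubed mean speed) are inverted to obtain k and c.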
The aim of this study is to establish the prevailing conditions of changing climatic trends and change-point dates at four selected meteorological stations, Uyo, Benin, Port Harcourt, and Warri, in the Niger Delta region of Nigeria. Daily (24-hourly) annual maximum series (AMS) data were downscaled with the Indian Meteorological Department (IMD) and the modified Chowdury Indian Meteorological Department (MCIMD) models. The Mann-Kendall (MK) trend and Sen’s Slope Estimator (SSE) tests showed a statistically significant trend for Uyo and Benin, while Port Harcourt and Warri showed mild trends. The Sen’s slope magnitudes and variation rates were 21.6, 10.8, 6.0 and 4.4 mm/decade, respectively. The trend change-point analysis showed the initial rainfall change-point dates as 2002, 2005, 1988, and 2000 for Uyo, Benin, Port Harcourt, and Warri, respectively. These results indicate changing climatic conditions for rainfall in the study area. Erosion and flood control facility analysis and design in the Niger Delta will therefore require non-stationary IDF modelling.
This study focused on the detection of indicators of climate change in 24-hourly annual maximum series (AMS) rainfall data collected over 36 years (1982-2017) for Warri Township; different statistical methods yielded a statistically insignificant, mildly positive trend. The IMD and MCIMD downscaled models’ time series data produced MK statistics varying from 1.403 to 1.4729 and from 1.403 to 1.463, respectively, both below the critical Z-value of 1.96. Also, the slope magnitudes showed a mildly increasing trend, varying from 0.0189 to 0.3713 and from 0.0175 to 0.5426, with the rate of change in rainfall intensity at 24-hour duration being 0.4536 and 0.42 mm/hr per year (4.536 and 4.2 mm/decade) for the IMD and MCIMD time series data, respectively. The trend change-point date occurred in the year 2000 according to the distribution-free CUSUM test, with the trend maintaining a significant and steady increase from 2010 to 2015. Thus, this study established the existence of a trend, an indication of a changing climate, and satisfied the condition for non-stationary rainfall intensity-duration-frequency (NS-IDF) modeling required for infrastructural design for combating flooding events.
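The distribution-free CUSUM test used here cumulates signs of departures from the series median and flags the index of the maximum excursion as the candidate change point. A minimal sketch (the significance threshold for the maximum excursion is omitted):

```python
from statistics import median

def cusum_change_point(series):
    # Distribution-free CUSUM: cumulate the signs of departures from
    # the series median; the index of the maximum |V_k| marks the
    # candidate change point.
    med = median(series)
    v, vs = 0, []
    for x in series:
        v += (x > med) - (x < med)
        vs.append(v)
    k = max(range(len(vs)), key=lambda i: abs(vs[i]))
    return k, vs[k]
```

Applied to an annual maximum series, the returned index corresponds to the change-point year (the year 2000 in this study).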
Funding: National Natural Science Foundation of China (No. 12271261); Postgraduate Research and Practice Innovation Program of Jiangsu Province, China (Grant No. SJCX230368).
文摘Normality testing is a fundamental hypothesis test in the statistical analysis of key biological indicators of diabetes.If this assumption is violated,it may cause the test results to deviate from the true value,leading to incorrect inferences and conclusions,and ultimately affecting the validity and accuracy of statistical inferences.Considering this,the study designs a unified analysis scheme for different data types based on parametric statistical test methods and non-parametric test methods.The data were grouped according to sample type and divided into discrete data and continuous data.To account for differences among subgroups,the conventional chi-squared test was used for discrete data.The normal distribution is the basis of many statistical methods;if the data does not follow a normal distribution,many statistical methods will fail or produce incorrect results.Therefore,before data analysis and modeling,the data were divided into normal and non-normal groups through normality testing.For normally distributed data,parametric statistical methods were used to judge the differences between groups.For non-normal data,non-parametric tests were employed to improve the accuracy of the analysis.Statistically significant indicators were retained according to the significance index P-value of the statistical test or corresponding statistics.These indicators were then combined with relevant medical background to further explore the etiology leading to the occurrence or transformation of diabetes status.
Abstract: Choosing an appropriate statistical test is crucial, but deciding which test to use can be challenging. Different tests suit different types of data and research questions, so it is important to choose the right one; knowing how to select an appropriate test leads to more accurate results. Invalid results and misleading conclusions may be drawn from a study if an incorrect statistical test is used. Because a wide variety of tests is available, it is essential to understand the nature of the data, the research question, and the assumptions of the tests before selecting one. This paper provides a step-by-step approach to selecting the right statistical test for any study, with an explanation of when each test is appropriate and relevant examples of each. Furthermore, the guide provides a comprehensive overview of the assumptions of each test and what to do if these assumptions are violated.
Abstract: This paper describes statistical methods for comparing incidence or mortality rates in cancer registries and descriptive epidemiology, and the features of a microcomputer program (CANTEST) designed to perform these methods. The program was written in IBM BASIC. Using CANTEST, the user can perform several statistical tests or estimations: 1. comparison of adjusted rates calculated by direct or indirect standardization; 2. calculation of the slope of a regression line for testing linear trends in the adjusted rates; 3. estimation of the 95% or 99% confidence intervals of the directly adjusted rates, of the cumulative rates (0-64 and 0-74), and of the cumulative risk. Several examples are presented to test the performance of the program.
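The direct standardization in item 1 can be illustrated with a minimal sketch: age-specific rates are weighted by a standard population. All age bands, counts, and person-years below are hypothetical, not values from the paper.

```python
# Directly age-standardized rate (ASR): weight observed age-specific
# rates by a reference (standard) population and sum.
standard_pop = [30000, 25000, 20000, 15000, 10000]   # reference weights per band
cases        = [2, 5, 12, 30, 45]                    # observed cases per band
person_years = [12000, 11000, 9000, 7000, 4000]      # local person-years per band

age_rates = [c / py for c, py in zip(cases, person_years)]
asr = sum(r * w for r, w in zip(age_rates, standard_pop)) / sum(standard_pop)
asr_per_100k = asr * 100000
```

Two populations standardized to the same reference can then be compared directly, which is what CANTEST automates alongside the confidence-interval estimates.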
Abstract: We are very grateful for the letter written by Dr. Lange, and indeed apologize for the mistakes noted in the wording of our text regarding statistical analysis. This was due to changes carried out while revising the manuscript at the request of reviewers, whom we thank for pointing out several issues that were actually similar to those noted by Dr. Lange. Unfortunately, we were unable to describe and discuss our findings properly in the context
Funding: Funds from the Centre National de la Recherche Scientifique; the Agence Nationale pour la Recherche (Grant No. ANR-07-PHYSIO-013-01); the Fondation pour la Recherche sur l'Hypertension Arterielle (Grant No. AO 2007); the Agence Nationale de Recherches sur le SIDA et les hepatites virales (ANRS); and the Genopole Evry (all awarded to AB). JFBG was the recipient of a CONACYT Mexico PhD Fellowship (Grant No. 207676/302245).
Abstract: The problem of identifying differential activity, such as in gene expression, is a major challenge in biostatistics and bioinformatics. Equally important, though much less frequently studied, is the question of similar activity from one biological condition to another. The fold-change, or ratio, is usually considered a relevant criterion for stating difference and similarity between measurements. Importantly, no statistical method for concomitant evaluation of similarity and distinctness currently exists for biological applications. Modern microarray, digital PCR (dPCR), and Next-Generation Sequencing (NGS) technologies frequently provide a means of coefficient-of-variation estimation for individual measurements. Using fold-change, and by making the assumption that measurements are normally distributed with known variances, we designed a novel statistical test that allows us to detect concomitantly, thus using the same formalism, differentially and similarly expressed genes (http://cds.ihes.fr). Given two sets of gene measurements in different biological conditions, the probabilities of making type I and type II errors in stating that a gene is differentially or similarly expressed from one condition to the other can be calculated. Furthermore, a confidence interval for the fold-change can be delineated. Finally, we demonstrate that the assumption of normality can be relaxed to consider arbitrary distributions numerically. The Concomitant evaluation of Distinctness and Similarity (CDS) statistical test correctly estimates similarities and differences between measurements of gene expression. The implementation, being time and memory efficient, allows the use of the CDS test in high-throughput data analysis such as microarray, dPCR, and NGS experiments. Importantly, the CDS test can be applied to the comparison of single measurements (N = 1) provided the variance (or coefficient of variation) of the signals is known, making CDS a valuable tool also in biomedical analysis, where typically a single measurement per subject is available.
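The core idea, a fold-change confidence interval from measurements with known coefficients of variation, can be sketched with a standard normal approximation. This is not the CDS test itself (the paper's formalism handles similarity and distinctness jointly); it is a simplified illustration, with all numbers hypothetical.

```python
import math

def fold_change_ci(x1, cv1, x2, cv2, z=1.96):
    """Approximate 95% CI for the fold-change x1/x2 of two measurements
    with known coefficients of variation, using the common delta-method
    approximation sd(log x) ~= cv. A sketch, not the CDS test."""
    log_fc = math.log(x1 / x2)
    se = math.sqrt(cv1 ** 2 + cv2 ** 2)
    return math.exp(log_fc - z * se), math.exp(log_fc + z * se)

lo, hi = fold_change_ci(200.0, 0.10, 100.0, 0.10)
# If the CI excludes 1, the expression differs between conditions; if it
# lies entirely inside a chosen equivalence band, expression is similar.
```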
Funding: Publication of this article in an open access journal was funded by the Portland State University Library’s Open Access Fund.
Abstract: Search-based statistical structural testing (SBSST) is a promising technique that uses automated search to construct input distributions for statistical structural testing. It has been shown that a simple search algorithm, for example a hill-climber, is able to optimize an input distribution. However, due to the noisy fitness estimation of the minimum triggering probability among all cover elements (Tri-Low-Bound), the existing approach does not show satisfactory efficiency: constructing input distributions to satisfy the Tri-Low-Bound criterion requires extensive computation time. Tri-Low-Bound is considered a strong criterion and has been demonstrated to sustain a high fault-detecting ability. This article tries to answer the following question: if we use a relaxed constraint that significantly reduces the time spent on search, can the optimized input distribution still be effective in fault-detecting ability? We propose a type of criterion called fairness-enhanced sum of triggering probability (p-L1-Max). The criterion uses the sum of triggering probabilities as the fitness value and leverages a parameter p to adjust the uniformness of test-data generation. We conducted extensive experiments to compare the computation time and the fault-detecting ability of the two criteria. The results show that the 1.0-L1-Max criterion has the highest efficiency and is more practical to use than the Tri-Low-Bound criterion. To measure a criterion's fault-detecting ability, we introduce a definition of the expected faults found in the effective test-set-size region. To delimit that region, we present a theoretical analysis of the expected faults found with respect to various test-set sizes and use the uniform distribution as a baseline to derive the definition of the effective test-set-size region.
Funding: Supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as a two-dimensional matrix of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
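The first half of such a protocol, unsupervised hierarchical clustering of a cell-by-feature matrix, can be sketched with standard tools; the stopping rule based on between- versus within-cluster differences is the paper's contribution and is not reproduced here. The data below are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(3)
# Hypothetical cell-by-feature matrix: two well-separated populations
# of 20 cells each, described by 5 numerical features.
cells = np.vstack([rng.normal(0.0, 1.0, (20, 5)),
                   rng.normal(4.0, 1.0, (20, 5))])

z = linkage(cells, method="ward")              # agglomerative hierarchy
labels = fcluster(z, t=2, criterion="maxclust")  # cut into two clusters
```

In the full protocol, a statistical test at each split would decide whether the two candidate subclusters differ enough to justify the subdivision.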
Abstract: Sunshine duration (S)-based empirical equations have been employed in this study to estimate the daily global solar radiation on a horizontal surface (G) for six meteorological stations in Burundi. These equations include the Ångström-Prescott linear model and four of its derivatives: logarithmic, exponential, power and quadratic functions. Monthly mean values of daily global solar radiation and sunshine duration data covering 20 to 23 years, from the Geographical Institute of Burundi (IGEBU), have been used. For each of the six stations, ten single or double linear regressions have been developed from the above five functions to relate, in terms of monthly mean values, the daily clearness index (G/G0) to each of two kinds of relative sunshine duration (RSD): S/S0 and S/S0′. In these ratios, G0, S0 and S0′ stand for the extraterrestrial daily solar radiation on a horizontal surface, the day length, and the modified day length taking into account the natural site's horizon, respectively. According to the calculated mean values of the clearness index and the RSD, each station experiences a high number of fairly clear (or partially cloudy) days. Estimated values of the dependent variable in each developed linear regression have been compared to measured values in terms of the coefficients of correlation (R) and determination (R²), the mean bias error (MBE), the root mean square error (RMSE) and the t-statistic. Mean values of these statistical indicators have been used to rank, in decreasing order of performance, first the ten developed equations per station across the six stations, and second the six stations across the ten equations.
These results lead to the assertion that any of the sixty developed linear regressions (and thus equations in terms of S/S0 and S/S0′) fits the measured data very adequately and may be used to estimate the monthly average daily global solar radiation from sunshine duration for the relevant station. It is also found that using S/S0′ as the RSD is slightly more advantageous than using S/S0 for estimating the monthly average daily clearness index. Moreover, the values of the statistical indicators of this study match adequately with data from other works on the same kinds of empirical equations.
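Fitting the Ångström-Prescott line and scoring it with MBE and RMSE, as the abstract describes, reduces to ordinary least squares on the monthly means. The twelve monthly values below are illustrative, not measurements from the Burundi stations.

```python
# Hypothetical monthly means: relative sunshine duration S/S0 and
# clearness index G/G0 (12 months).
s_rel = [0.45, 0.50, 0.55, 0.48, 0.40, 0.35, 0.30, 0.33, 0.42, 0.52, 0.58, 0.47]
k_t   = [0.48, 0.51, 0.55, 0.50, 0.44, 0.40, 0.37, 0.39, 0.46, 0.53, 0.57, 0.49]

n = len(s_rel)
mx = sum(s_rel) / n
my = sum(k_t) / n
# Least-squares fit of the Angstrom-Prescott line  G/G0 = a + b * (S/S0)
b = (sum((x - mx) * (y - my) for x, y in zip(s_rel, k_t))
     / sum((x - mx) ** 2 for x in s_rel))
a = my - b * mx
pred = [a + b * x for x in s_rel]
mbe = sum(p - y for p, y in zip(pred, k_t)) / n            # mean bias error
rmse = (sum((p - y) ** 2 for p, y in zip(pred, k_t)) / n) ** 0.5
```

Note that for an OLS fit with intercept the MBE on the training data is zero by construction, so RMSE (and R, R², the t-statistic) carries the comparative information across the sixty equations.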
Abstract: With the increasing popularity and complexity of Web applications and the emergence of their new characteristics, the testing and maintenance of large, complex Web applications are becoming more complex and difficult. Web applications generally contain many pages and are used by enormous numbers of users. Statistical testing is an effective way of ensuring their quality. Web usage can be accurately described by a Markov chain, which has been proved to be an ideal model for software statistical testing. The results of unit testing can be utilized in the later stages, which is an important strategy for bottom-up integration testing; the other improvement of the extended Markov chain model (EMM) is the error-type vector, which is treated as part of the page node. This paper also proposes an algorithm for generating test cases for usage paths. Finally, optional usage-reliability evaluation methods and an incremental usability regression-testing model for testing and evaluation are presented. Key words: statistical testing; evaluation of Web usability; extended Markov chain model (EMM); Web log mining; reliability evaluation. Foundation item: Supported by the National Defence Research Project (No. 41315.9.2) and the National Science and Technology Plan (2001BA102A04-02-03). Biography: MAO Cheng-ying (1978-), male, Ph.D. candidate; research directions: software testing, advanced database systems, component technology and data mining.
Abstract: Human-elephant conflict (HEC), an alarming issue, has in the present-day context attracted the attention of environmentalists and policy makers. Rising conflict between human beings and wild elephants is common in the Buxa Tiger Reserve (BTR) and its adjoining area in West Bengal State, India, making the area volatile. People's attitudes towards elephant-conservation activity are crucial for mitigating HEC, because people's proximity to wild elephants' habitat can trigger the occurrence of HEC. The aim of this study is to conduct an in-depth investigation of the association of people's attitudes towards HEC with their locational, demographic and socio-economic characteristics in the BTR and its adjoining area, using Pearson's bivariate chi-square test and binary logistic regression analysis. The BTR is one of the constituent parts of the Eastern Dooars Elephant Reserve (EDER). We interviewed 500 respondents to understand their perceptions of HEC and recorded their locational, demographic and socio-economic characteristics, including location of village, gender, age, ethnicity, religion, caste, poverty level, education level, primary occupation, secondary occupation, household type, and source of firewood. The results indicate that respondents who live in enclave forest villages (EFVs), peripheral forest villages (PFVs), corridor villages (CVs) or forest-and-corridor villages (FCVs), are mainly male, aged 18-48 years, engaged in agriculture, and living in kancha or mixed houses are more likely to witness HEC. Besides, respondents who are illiterate or have only primary education are more likely to regard the elephant as the main problematic animal around their villages and to refuse to participate in elephant-conservation activity. For the sake of a sustainable environment for both human beings and wildlife, people's attitudes towards elephants must become friendly in a more prudent way, so that the two communities can live in harmony.
Funding: Supported by the Swiss Confederation through an excellence scholarship for foreign students awarded to Franck Zokou YAO.
Abstract: The population of northern Côte d’Ivoire, especially in the White Bandama watershed, lives mostly in rural areas and depends on farming, which is strongly affected by climate variability. This study evaluates the trends in the watershed’s hydro-climatic variables and their level of significance over the period 1950-2000. The methodological approach consists of successively applying standardized indexes to detect trends and breaks in long-term hydro-climatic data. The Mann-Kendall statistical test gives the significance of the trends, and the Kendall-Theil robust line reveals their magnitude; the Student’s t test identifies break years. Results show that although rainfall has decreased, this decline is not statistically significant. However, temperature and potential evapotranspiration have risen strongly, and discharge has declined sharply. These changes in hydrometeorological variables appeared between 1970 and 1980. This study differs from others conducted on climate variability in northern Côte d’Ivoire in the statistical framework implemented and in its attention to the significance level of climate trends. Until now, authors have used the standardized index to detect trends in hydro-climatic parameters. In this work, we added the Mann-Kendall statistical test to assess the significance level of these trends at α = 5% and 10%. The Kendall-Theil statistical test was then used to estimate the trend magnitudes, and the Student’s t test to identify the break years.
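The Mann-Kendall test and the Kendall-Theil (Theil-Sen) slope used here are both simple to state, so a compact sketch may help; the S statistic counts concordant minus discordant pairs, and the slope is the median of all pairwise slopes. The series below is synthetic.

```python
import itertools
import math

def mann_kendall(x):
    # S statistic: concordant minus discordant pairs
    s = sum((xj > xi) - (xj < xi)
            for (i, xi), (j, xj) in itertools.combinations(enumerate(x), 2))
    n = len(x)
    var = n * (n - 1) * (2 * n + 5) / 18   # no tie correction (values distinct here)
    if s > 0:
        z = (s - 1) / math.sqrt(var)
    elif s < 0:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    return s, z

def sens_slope(x):
    # Theil-Sen estimator: median of all pairwise slopes
    slopes = sorted((xj - xi) / (j - i)
                    for (i, xi), (j, xj) in itertools.combinations(enumerate(x), 2))
    m = len(slopes)
    return slopes[m // 2] if m % 2 else (slopes[m // 2 - 1] + slopes[m // 2]) / 2

# Hypothetical annual series with an upward drift
series = [10.0, 10.4, 10.1, 10.9, 11.2, 11.0, 11.8, 12.1, 12.0, 12.6]
s, z = mann_kendall(series)   # |z| > 1.96 indicates significance at alpha = 5%
slope = sens_slope(series)
```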
Funding: Supported in part by the National Science Foundation (NSF) under PIRE grant 0968296; by the National Natural Science Foundation of China (Grant Nos. 11503007, 91636111 and 11690021); partially through the New York Space Grant Consortium; by NASA through the Einstein Fellowship grant PF4-150120; and by the JPL RTD program.
Abstract: In pulsar timing, timing residuals are the differences between the observed times of arrival and the predictions of the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g., gravitational waves). In order to apply optimal statistical methods for detecting weak gravitational-wave signals, we need to know the statistical properties of the noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data, which were obtained with the Arecibo Observatory and the Green Bank Telescope from 2005 to 2010. We find that most of the data are consistent with white noise; many data deviate from Gaussianity at different levels, but removing outliers for some pulsars mitigates the deviations.
Funding: National Natural Science Foundation of China (Nos. 10471126 and 10671176).
Abstract: In this article, the unit root test for the AR(p) model with GARCH errors is considered. The Dickey-Fuller test statistics are rewritten in the form of self-normalized sums, and the asymptotic distribution of the test statistics is derived under weak conditions.
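The underlying mechanics of a Dickey-Fuller test, regressing the first difference on the lagged level and checking the t-ratio against a non-standard critical value, can be sketched as below. This is the plain test on simulated AR(1) data, not the paper's self-normalized statistics for GARCH errors.

```python
import math
import random

def df_tstat(y):
    """Dickey-Fuller t-statistic: regress dy_t on y_{t-1} (with
    intercept) and return the t-ratio of the y_{t-1} coefficient."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    x = y[:-1]
    n = len(dy)
    mx, my = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (di - my) for xi, di in zip(x, dy)) / sxx
    a = my - b * mx
    sse = sum((di - a - b * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(sse / (n - 2) / sxx)
    return b / se

random.seed(1)
eps = [random.gauss(0, 1) for _ in range(500)]
rw = [0.0]
for e in eps:
    rw.append(rw[-1] + e)            # unit root: random walk
ar = [0.0]
for e in eps:
    ar.append(0.5 * ar[-1] + e)      # stationary AR(1), phi = 0.5

t_ar = df_tstat(ar)   # strongly negative: rejects the unit root
t_rw = df_tstat(rw)   # near the DF critical region boundary (~ -2.86 at 5%)
```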
Funding: Supported by the National Natural Science Foundation of China (NSFC, No. 10473023), the Scientific Research Fund of the Sichuan Provincial Education Department, the K. C. Wong Education Foundation (Hong Kong), and the Jiangsu Planned Projects for Postdoctoral Research Funds.
Abstract: We investigate the redshift distributions of three long-burst samples: the first contains 131 long bursts with observed redshifts, the second 220 long bursts with pseudo-redshifts calculated from the variability-luminosity relation, and the third 1194 long bursts with pseudo-redshifts calculated from the lag-luminosity relation. In the redshift range 0-1, the Kolmogorov-Smirnov (KS) probability comparing the observed redshift distribution with that from the variability-luminosity relation is large. In the redshift ranges 1-2, 2-3, 3-6.3 and 0-37, the KS probabilities comparing the redshift distribution from the lag-luminosity relation with the observed redshift distribution are also large. For the GRBs that appear in both pseudo-redshift burst samples, the KS probability comparing the pseudo-redshift distribution from the lag-luminosity relation with the observed redshift distribution is 0.447, which is very large. Based on these results, some conclusions are drawn: i) the V-L_iso relation might be more reliable than the τ-L_iso relation at low redshifts, and the τ-L_iso relation might be more realistic than the V-L_iso relation at high redshifts; ii) without regard to redshift range, the τ-L_iso relation might be more physical and intrinsic than the V-L_iso relation.
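The two-sample KS comparison used throughout this abstract can be sketched directly with scipy; the gamma-distributed samples below are synthetic stand-ins whose sizes merely mirror the observed (131) and pseudo-redshift (220) samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
observed = rng.gamma(2.0, 1.0, 131)
pseudo   = rng.gamma(2.0, 1.0, 220)   # drawn from the same distribution
shifted  = rng.gamma(2.0, 2.0, 220)   # a clearly different scale

p_same = stats.ks_2samp(observed, pseudo).pvalue
p_diff = stats.ks_2samp(observed, shifted).pvalue
# A large KS probability (p_same) means the samples are consistent with a
# common parent distribution; a tiny one (p_diff) rejects that hypothesis.
```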
Abstract: With the increasing popularity of high-resolution remote sensing images, remote sensing image retrieval (RSIR) has become a major research topic. A combined representation of global non-subsampled shearlet transform (NSST)-domain statistical features (NSSTds) and local three-dimensional local ternary pattern (3D-LTP) features is proposed for high-resolution remote sensing images. We model the NSST coefficients of the detail subbands using a 2-state Laplacian mixture (LM) distribution, whose three parameters are estimated with the Expectation-Maximization (EM) algorithm. We also calculate statistical parameters such as subband kurtosis and skewness from the detail subbands, along with the mean and standard deviation of the approximation subband, and concatenate all of them with the 2-state LM parameters to describe the global features of the image. Properties of the NSST such as multiscale analysis, localization and flexible directional sensitivity make it a suitable choice for an effective approximation of an image. To extract dense local features, a new 3D-LTP is proposed in which dimension reduction is performed via selection of 'uniform' patterns. The 3D-LTP is calculated from the spatial RGB planes of the input image; the proposed inter-channel 3D-LTP exploits not only local texture information but also color information. Finally, a fused feature representation (NSSTds-3DLTP) is proposed, using the new global (NSSTds) and local (3D-LTP) features to enhance the discriminativeness of the representation. The retrieval performance of the proposed NSSTds-3DLTP features is tested on three challenging remote sensing image datasets, WHU-RS19, the Aerial Image Dataset (AID) and PatternNet, in terms of mean average precision (MAP), average normalized modified retrieval rank (ANMRR) and precision-recall (P-R) graphs. The experimental results are encouraging: the NSSTds-3DLTP features lead to superior retrieval performance compared with many well-known existing descriptors such as Gabor RGB, Granulometry, local binary patterns (LBP), Fisher vectors (FV), the vector of locally aggregated descriptors (VLAD) and the median robust extended local binary pattern (MRELBP). For WHU-RS19, in terms of {MAP, ANMRR}, NSSTds-3DLTP improves upon the Gabor RGB, Granulometry, LBP, FV, VLAD and MRELBP descriptors by {41.93%, 20.87%}, {92.30%, 32.68%}, {86.14%, 31.97%}, {18.18%, 15.22%}, {8.96%, 19.60%} and {15.60%, 13.26%}, respectively. For AID, the corresponding improvements are {152.60%, 22.06%}, {226.65%, 25.08%}, {185.03%, 23.33%}, {80.06%, 12.16%}, {50.58%, 10.49%} and {62.34%, 3.24%}. For PatternNet, NSSTds-3DLTP improves upon the same descriptors by {32.79%, 10.34%}, {141.30%, 24.72%}, {17.47%, 10.34%}, {83.20%, 19.07%}, {21.56%, 3.60%} and {19.30%, 0.48%} in terms of {MAP, ANMRR}. The moderate dimensionality of the simple NSSTds-3DLTP allows the system to run in real time.
Abstract: This paper investigates the correlation between tidal stress and earthquakes over periods ranging from hours to months in a limited zone of the Palu region (Central Sulawesi, Indonesia). Through Schuster and binomial tests, we examined the relation between seismicity (the time density of seismic events) and the tidal potential arising from the Moon and Sun, using all tidal components simultaneously and focusing on the estimation of specific terms. The results show significant correlations between seismicity and the tidal potential for the S2 (0.5 d) and O1 (1.075 d) tidal components in the case of isolated earthquake events only, particularly for shallow earthquakes. Meanwhile, there is a strong relationship between aftershocks and tidal components with periods longer than the Mf period (13.661 d). Finally, analysis of the temporal variation of the earthquake-tide relation reveals an optimal correlation for about six years before the 2018 great Palu earthquake; the correlation becomes insignificant afterwards.
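The Schuster test at the heart of this analysis assigns each earthquake a tidal phase and asks whether the phases cluster; under uniform phases the probability of the observed clustering is approximately exp(-R²/N). A minimal sketch on synthetic phases:

```python
import math
import random

def schuster_p(phases):
    """Schuster test: probability that the phase clustering of N events
    arises by chance when phases are uniform on [0, 2*pi)."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.exp(-(c * c + s * s) / n)

random.seed(7)
uniform = [random.uniform(0, 2 * math.pi) for _ in range(200)]
clustered = [random.gauss(0, 0.5) % (2 * math.pi) for _ in range(200)]

p_uniform = schuster_p(uniform)       # no tidal triggering signal
p_clustered = schuster_p(clustered)   # strong phase clustering
```

A small p for a given tidal component (e.g. S2 or O1) would indicate that event times are correlated with that component's phase.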
Abstract: This paper develops a model of wind speed using the Weibull distribution, with the aim of evaluating wind-energy potential and aiding the design of a small wind-energy plant in Batouri, Cameroon. The Weibull distribution model was developed using wind-speed data collected from a meteorological station at the small airport of Batouri. Four numerical methods (the moment method, the graphical method, the empirical method and the energy pattern factor method) were used to estimate the Weibull parameters K and C. The application of these four methods is demonstrated using a sample wind-speed data set, and a statistical comparison of the accuracy of each method is performed. The study finds the energy pattern factor method to be the most effective (K = 3.8262 and C = 2.4659).
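One of the four estimators, the empirical (Justus) method, has a closed form and is easy to sketch: K comes from the coefficient of variation and C from the mean via the gamma function. The monthly mean wind speeds below are hypothetical, not the Batouri record.

```python
import math

def weibull_empirical(speeds):
    """Empirical (Justus) method:
    K = (sd/mean)^(-1.086),  C = mean / Gamma(1 + 1/K)."""
    n = len(speeds)
    mean = sum(speeds) / n
    sd = (sum((v - mean) ** 2 for v in speeds) / (n - 1)) ** 0.5
    k = (sd / mean) ** -1.086
    c = mean / math.gamma(1 + 1 / k)
    return k, c

# Hypothetical monthly mean wind speeds (m/s)
speeds = [2.1, 2.4, 2.2, 2.6, 2.9, 2.5, 2.3, 2.0, 2.7, 2.8, 2.4, 2.2]
k, c = weibull_empirical(speeds)
```

Note that K estimated from low-variance monthly means (as here) is much larger than K fitted to the raw wind-speed record, which is what the paper's K = 3.8262 refers to.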
Abstract: The aim of this study is to establish the prevailing changing climatic trends and change-point dates at four selected meteorological stations, Uyo, Benin, Port Harcourt and Warri, in the Niger Delta region of Nigeria. Daily (24-hourly) annual maximum series (AMS) data were downscaled with the Indian Meteorological Department (IMD) and modified Chowdury Indian Meteorological Department (MCIMD) models. The Mann-Kendall (MK) trend and Sen's slope estimator (SSE) tests showed statistically significant trends for Uyo and Benin, while Port Harcourt and Warri showed mild trends. The Sen's slope magnitudes were 21.6, 10.8, 6.00 and 4.4 mm/decade, respectively. Trend change-point analysis gave initial rainfall change-point dates of 2002, 2005, 1988 and 2000 for Uyo, Benin, Port Harcourt and Warri, respectively. These results demonstrate positively changing climatic conditions for rainfall in the study area. The analysis and design of erosion- and flood-control facilities in the Niger Delta will therefore require the application of non-stationary IDF modelling.
Abstract: This study focused on the detection of indicators of climate change in 24-hourly annual maximum series (AMS) rainfall data collected over 36 years (1982-2017) for Warri Township; the different statistical methods used yielded a statistically insignificant, mildly positive trend. The IMD- and MCIMD-downscaled time series produced MK statistics varying from 1.403 to 1.4729 and from 1.403 to 1.463, respectively, below the critical Z-value of 1.96. The slope magnitudes likewise showed a mildly increasing trend, varying from 0.0189 to 0.3713 and from 0.0175 to 0.5426, with rates of change in rainfall intensity at 24-hour duration of 0.4536 and 0.42 mm/hr.year (4.536 and 4.2 mm/decade) for the IMD and MCIMD time series, respectively. The distribution-free CUSUM test placed the trend change point in the year 2000, with the trend maintaining a significant and steady increase from 2010 to 2015. The study thus established the existence of a trend, an indication of a changing climate, and satisfied the condition for rainfall non-stationary intensity-duration-frequency (NS-IDF) modeling required for the design of infrastructure to combat flooding events.
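The distribution-free CUSUM test used to locate the change point can be sketched compactly: accumulate the signs of deviations from the series median and take the index where the cumulative sum peaks in magnitude. The step-change series below is synthetic, not the Warri AMS record.

```python
def cusum_change_point(x):
    """Distribution-free CUSUM: cumulative sum of signs relative to the
    median; the maximum |CUSUM| locates the candidate change point."""
    s = sorted(x)
    n = len(x)
    med = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    v, cusum = 0, []
    for xi in x:
        v += (xi > med) - (xi < med)
        cusum.append(v)
    k = max(range(n), key=lambda i: abs(cusum[i]))
    return k, cusum[k]

# Hypothetical AMS rainfall with a step increase halfway through
series = [60, 62, 58, 61, 59, 63, 60, 62, 80, 82, 79, 83, 81, 84, 80, 82]
k, vmax = cusum_change_point(series)
```

In practice |vmax| is compared against a critical value (depending on n and the significance level) to decide whether the located change point is significant.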