Abstract: This article compares the sizes of the subsets selected by nonparametric subset selection rules under two different scoring rules for the observations. The scoring rules are based on the expected values of order statistics of the uniform distribution (yielding rank values) and of the normal distribution (yielding normal score values). The comparison uses state motor vehicle traffic fatality rates, published in a 2016 article, for fifty-one states (counting DC as a state) over a nineteen-year period (1994 through 2012). The earlier study considered four block design selection rules: two for choosing a subset to contain the “best” population (i.e., the state with the lowest mean fatality rate) and two for the “worst” population (i.e., the highest mean rate), with the probability of correct selection set to 0.90. Two selection rules based on normal scores produced selected subsets substantially smaller than the corresponding rules based on ranks (7 vs. 16 and 3 vs. 12). For the two other selection rules, the subsets chosen were very close in size (within one). A comparison is also made using state homicide rates, published in a 2022 article, for fifty states over eight years. The results are qualitatively the same as those obtained with the motor vehicle traffic fatality rates.
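As a rough illustration of the two scoring rules named in this abstract, the sketch below scores the observations within a single block. The expected value of the i-th order statistic of n uniform(0, 1) draws is i/(n + 1), which is proportional to the rank i, so plain ranks are used. For normal scores the sketch uses Blom's approximation to the expected normal order statistics; that choice, the function names, and the sample rates are assumptions made for illustration and are not taken from the article. Ties are not handled.

```python
from statistics import NormalDist

def rank_scores(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def normal_scores(values):
    """Approximate expected normal order statistics with Blom's formula."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25))
            for r in rank_scores(values)]

# Hypothetical rates for five populations in one block (e.g. one year).
block = [1.42, 0.87, 2.10, 1.05, 1.76]
print(rank_scores(block))
print([round(z, 3) for z in normal_scores(block)])
```

Normal scores stretch the extremes relative to the middle of the ordering, which is one plausible reason the two scoring rules can lead to different subset sizes.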
Abstract: To screen for molecular signatures that are commonly dysregulated in subtypes of a given cancer, a novel meta-analysis is designed that applies a rank score (RS) to lists of genes derived from different studies. RS is a promising way to detect signatures across platforms when combined with one vs. all (OVA) or one vs. one (OVO) comparison schemes. Across six published microarray expression datasets on acute leukemia, the resulting biological signals provide stronger clustering support than the systematic differences among microarray platforms. Moreover, the genes specific to pediatric BCR-ABL cases can be used to correctly discriminate independent adult BCR-ABL cases. The results help to discover, validate and treat cancer subtypes from microarray gene expression profiles, particularly for extensively studied cancers such as leukemia.
Funding: Natural Science Foundation of China (41905091); National Key R&D Program of China (2017YFA0604502, 2017YFC1501904)
Abstract: Traditional precipitation skill scores are affected by the well-known "double penalty" problem caused by slight spatial or temporal mismatches between forecasts and observations. The fuzzy (neighborhood) method has been proposed for deterministic simulations and has shown some ability to solve this problem. The increasing resolution of ensemble precipitation forecasts means that they now face problems similar to those of deterministic forecasts. We developed an ensemble precipitation verification skill score, the Spatial Continuous Ranked Probability Score (SCRPS), and used it to extend spatial verification from deterministic to ensemble forecasts. The SCRPS is a spatial technique based on the Continuous Ranked Probability Score (CRPS) and the fuzzy method. A fast binomial random variate generator was used to obtain random indexes based on the climatological mean observed frequency, which were then used in the reference score to calculate the skill score of the SCRPS. Verification results obtained using daily forecast products from the ECMWF ensemble forecasts and quantitative precipitation estimation products from the OPERA datasets during June-August 2018 show that the spatial score is not affected by the number of ensemble forecast members and that a consistent assessment can be obtained. The score can reflect the performance of ensemble forecasts in modeling precipitation and thus can be widely used.
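For orientation, the CRPS that the SCRPS builds on can be estimated for one ensemble forecast at one point with the standard identity CRPS = E|X - y| - 0.5 E|X - X'|, and a skill score is commonly formed as 1 - score / reference score. The sketch below shows only these two generic pieces. It does not reproduce the neighborhood (fuzzy) step or the binomial reference generator described in the abstract, and the function names and numbers are hypothetical.

```python
def ensemble_crps(members, observation):
    """Estimate the CRPS of an ensemble as E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    mean_abs_error = sum(abs(x - observation) for x in members) / m
    mean_abs_diff = sum(abs(a - b) for a in members for b in members) / (m * m)
    return mean_abs_error - 0.5 * mean_abs_diff

def skill_score(score, reference_score):
    """Skill relative to a reference: 1 is perfect, 0 matches the reference."""
    return 1.0 - score / reference_score

# Hypothetical daily precipitation values (mm) at one grid point.
forecast = [2.0, 3.5, 4.0, 5.5]
reference = [0.0, 0.0, 1.0, 12.0]
observed = 4.2

crps_forecast = ensemble_crps(forecast, observed)
crps_reference = ensemble_crps(reference, observed)
print(crps_forecast, crps_reference, skill_score(crps_forecast, crps_reference))
```

With a single member the estimate reduces to the absolute error, which is why the CRPS is often described as a generalization of the mean absolute error to probabilistic forecasts.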
Funding: This study was financially supported by National Cheng Kung University and the National Science Council, Taiwan.
Abstract: Identifying the species composition of a microbial ecosystem is often hampered by difficulties in culturing the organisms and by the low sequencing depth of traditional DNA barcoding. Metagenomic analysis, a large-scale nucleotide-sequence-based tool, can overcome such difficulties. In this study, Sanger sequencing of 500 nrITS clones uncovered 29 taxa of 19 fungal genera, whereas metagenomics with next-generation sequencing identified 512 operational taxonomic units (OTUs) for ITS1/2 and 364 for ITS3/4. Furthermore, high-throughput sequencing of PCR amplicons of ITS1/2, ITS3/4, nrLSU-LR, nrLSU-U, mtLSU, and mtATP6, all with at least 1,300× coverage and about 21 million reads in total, yielded a very diverse fungal composition. The fact that 74% of the OTUs were uncovered exclusively with single barcodes indicated that each marker provided its own insights into the fungal flora. To deal with the high heterogeneity in the data and to integrate the information on species composition across barcodes, a rank-scoring strategy was developed. Accordingly, 205 genera among 64 orders of fungi were identified in healthy Phalaenopsis roots. Of the barcodes used, ITS1/2, ITS3/4, and nrLSU-U were the most effective in uncovering the fungal diversity. These barcodes, though detecting different compositions, likely because of primer preference, provided complementary and comprehensive power in deciphering the microbial diversity, especially in revealing rare species.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11771032), the Natural Science Foundation of Shanxi Province of China (Grant No. 201901D111279), and the Research Grants Council of the Hong Kong Special Administrative Region (Grant Nos. 14301918 and 14302519).
Abstract: As extensions of means, expectiles embrace all the distributional information of a random variable. Expectile regression is computationally friendlier because the asymmetric least squares loss function is differentiable everywhere. This regression also enables effective estimation of the expectiles of a response variable when potential explanatory variables are given. In this study, we propose the partial functional linear expectile regression model. The slope function and constant coefficients are estimated by using the functional principal component basis. The convergence rate of the slope function and the asymptotic normality of the parameter vector are established. To inspect the effect of the parametric component on the response variable, we develop Wald-type and expectile rank score tests and establish their asymptotic properties. The finite-sample performance of the proposed estimators and test statistics is evaluated through a simulation study. Results indicate that the proposed estimators are comparable to competing estimation methods and that the newly proposed expectile rank score test is useful. The methodologies are illustrated by using two real data examples.
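To make the asymmetric least squares loss concrete, the sketch below computes the sample expectile of a plain list of numbers, which is the simplest case of the idea and not the partial functional linear model proposed in the paper. The tau-expectile minimizes the sum of w(r) r^2 over residuals r, with weight tau for non-negative residuals and 1 - tau for negative ones; setting the derivative to zero gives a weighted mean that can be iterated to a fixed point. Function names and data are hypothetical.

```python
def asymmetric_squared_loss(residual, tau):
    """Asymmetric least squares loss: weight tau at or above zero, 1 - tau below."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual ** 2

def sample_expectile(values, tau, tol=1e-12, max_iter=1000):
    """Sample tau-expectile (0 < tau < 1) by iteratively reweighted averaging."""
    estimate = sum(values) / len(values)  # tau = 0.5 gives exactly the mean
    for _ in range(max_iter):
        weights = [tau if v > estimate else 1.0 - tau for v in values]
        updated = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate

# Hypothetical right-skewed sample.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
for tau in (0.1, 0.5, 0.9):
    print(tau, sample_expectile(data, tau))
```

Because the loss is differentiable everywhere, the update has a closed form at each step, in contrast to the check loss used for quantiles.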