Funding: This study was supported by the National Key R&D Program of China (Grant No. 2020YFC0847000) and the National Natural Science Foundation of China (Grant Nos. 31571370, 91731302, and 31772435).
Abstract: A novel RNA virus, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is responsible for the ongoing outbreak of coronavirus disease 2019 (COVID-19). Population genetic analysis could be useful for investigating the origin and evolutionary dynamics of COVID-19. However, because of extensive sampling bias and the existence of infection clusters during the epidemic spread, direct application of existing approaches can lead to biased parameter estimates and misinterpretation of the data. In this study, we first present robust estimators for the time to the most recent common ancestor (TMRCA) and the mutation rate, and then apply the approach to analyze 12,909 genomic sequences of SARS-CoV-2. The mutation rate is inferred to be 8.69×10^(−4) per site per year, with a 95% confidence interval (CI) of [8.61×10^(−4), 8.77×10^(−4)], and the TMRCA of the samples is inferred to be Nov 28, 2019, with a 95% CI of [Oct 20, 2019, Dec 9, 2019]. The results indicate that COVID-19 might have originated earlier than, and outside of, the Wuhan Seafood Market. We further demonstrate that genetic polymorphism patterns, including the enrichment of specific haplotypes and the temporal allele frequency trajectories generated by infection clusters, are similar to those caused by evolutionary forces such as natural selection. Our results show that population genetic methods need to be developed to efficiently disentangle the effects of sampling bias and infection clusters to gain insights into the evolutionary mechanism of SARS-CoV-2. Software implementing VirusMuT can be downloaded at https://bigd.big.ac.cn/biocode/tools/BT007081.
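For orientation, the two quantities estimated above, the mutation rate and the TMRCA, are often first approximated by root-to-tip regression: regress each genome's divergence from the root on its sampling date; the slope approximates the rate, and the date at which the fitted line reaches zero divergence approximates the TMRCA. The sketch below is that textbook baseline, not the robust estimator behind VirusMuT; the function name and the input layout (decimal-year dates, divergences in substitutions per site) are assumptions. Ordinary least squares treats genomes as independent, which is exactly what sampling bias and infection clusters violate, so this baseline illustrates the problem the abstract describes rather than solving it.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(
    dates: Sequence[float], divergences: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of divergence = rate * (date - t0).

    dates: sampling dates in decimal years (assumed input format).
    divergences: root-to-tip distances in substitutions per site.
    Returns (rate per site per year, t0 as a decimal year).
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two paired observations")
    t_mean = sum(dates) / n
    d_mean = sum(divergences) / n
    # Centred sums keep the slope numerically stable for dates near 2020.
    sxy = sum((t - t_mean) * (d - d_mean) for t, d in zip(dates, divergences))
    sxx = sum((t - t_mean) ** 2 for t in dates)
    rate = sxy / sxx
    # The fitted line reaches zero divergence at the estimated root date.
    t0 = t_mean - d_mean / rate
    return rate, t0
```

No confidence interval is computed here; resampling genomes with a bootstrap would be one way to add one, although clustered sampling would still bias it.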
Abstract: Estimating amounts of change in forest resources over time is a key function of most national forest inventories (NFI). Because this information is used broadly for many management and policy purposes, it is imperative that accurate estimates be made from the survey sample. Robust sampling designs are often used to help ensure representation of the population, but the full sample is often unrealized because of hazardous conditions or, possibly, lack of permission to access the land. Bias may be imparted to the sample if the nonresponse is nonrandom with respect to forest characteristics, and such bias becomes more difficult to assess for change estimation methods that require measurements of the same sample plots at two points in time, i.e., remeasurement. To examine potential nonresponse bias in change estimates, two synthetic populations were constructed: 1) a typical NFI population consisting of both forest and nonforest plots, and 2) a population that mimics a large catastrophic disturbance event within a forested population. Estimates under various nonresponse scenarios were compared using a standard implementation of post-stratified estimation as well as an alternative approach that groups plots having similar response probabilities (response homogeneity). With the post-stratified estimators, the amount of change was overestimated for the NFI population and underestimated for the disturbance population, whereas the response homogeneity approach produced nearly unbiased estimates under the assumption of equal response probability within groups. These outcomes suggest that formal strategies may be needed to obtain accurate change estimates in the presence of nonrandom nonresponse.
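Both estimators compared above can be written as a weighted average of group means computed from responding plots; they differ only in how plots are grouped. A minimal sketch, assuming each group's weight is its share of the full planned sample (operational NFI post-stratification typically takes stratum weights from auxiliary data such as classified imagery, which is not modelled here). The function name and the toy values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def weighted_group_mean(
    values: Sequence[float],
    groups: Sequence[Hashable],
    responded: Sequence[bool],
) -> float:
    """Estimate the population mean as a weighted average of respondent means.

    Each group's weight is its share of the full planned sample, so
    nonresponding plots still count toward the weights; their values are
    never read. Grouping by post-strata gives a post-stratified-style
    estimate; grouping by similar response probability gives a
    response-homogeneity estimate.
    """
    n = len(values)
    sizes = Counter(groups)
    totals = defaultdict(float)
    counts = Counter()
    for value, group, ok in zip(values, groups, responded):
        if ok:
            totals[group] += value
            counts[group] += 1
    estimate = 0.0
    for group, size in sizes.items():
        if counts[group] == 0:
            raise ValueError(f"no respondents in group {group!r}")
        estimate += (size / n) * (totals[group] / counts[group])
    return estimate
```

For example, with change values [0, 0, 0, 0, 1, 1, 1, 1] where two of the four disturbed plots do not respond, a single stratum gives 2/6 ≈ 0.33 against a true mean of 0.5, whereas grouping the four accessible and the four disturbed plots separately gives 0.5, because response is random within each group.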
Funding: This work was supported by the Special Project of Major Theoretical Research and Interpretation of Philosophy and Social Sciences of Chongqing Municipal Education Commission, China (19SKZDZX15); the Key Project of Humanities and Social Sciences Research of Chongqing Education Commission, China (18SKSJ003); and the Funding for Cultivating Major Projects in Humanities and Social Sciences of Southwest University, China (SWU1809009).
Abstract: This study examines the impact of farmers' participation in cooperatives and of technology adoption on their economic welfare in China. A double selectivity model (DSM) is applied to correct for sample selection bias stemming from both observed and unobserved factors, and a propensity score matching (PSM) method is applied to calculate the difference in agricultural income through counterfactual analysis, using survey data from 396 farmers in 15 provinces in China. The findings indicate that farmers who join farmer cooperatives and who adopt agricultural technology can increase agricultural income by 2.77% and 2.35%, respectively, compared with non-participants and non-adopters. Interestingly, the effect on agricultural income is found to be more significant for low-income farmers than for high-income ones, with income increasing by 5.45% and 4.51% when participating in farmer cooperatives and adopting agricultural technology, respectively. Our findings highlight the positive role of farmer cooperatives and agricultural technology in promoting farmers' economic welfare. Based on the findings, government policy implications are also discussed.
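The matching step described above can be sketched as one-to-one nearest-neighbour matching on the propensity score, with replacement, averaging the outcome differences over treated farmers (the average treatment effect on the treated, ATT). This assumes the propensity scores have already been estimated, for example by a probit or logit model of participation on farm and household covariates; the DSM selection correction, calipers, and balance checks are not shown, and the function name is hypothetical.

```python
from typing import Sequence


def att_nearest_neighbor(
    treated_scores: Sequence[float],
    treated_outcomes: Sequence[float],
    control_scores: Sequence[float],
    control_outcomes: Sequence[float],
) -> float:
    """ATT via 1-nearest-neighbour propensity score matching with replacement."""
    if not control_scores or not treated_scores:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for score, outcome in zip(treated_scores, treated_outcomes):
        # Control whose propensity score is closest to this treated unit's.
        j = min(range(len(control_scores)), key=lambda k: abs(control_scores[k] - score))
        diffs.append(outcome - control_outcomes[j])
    return sum(diffs) / len(diffs)
```

For instance, treated scores [0.3, 0.7] with outcomes [10, 20], matched against control scores [0.25, 0.75, 0.5] with outcomes [8, 15, 12], pair with the controls at 0.25 and 0.75, giving an ATT of ((10 - 8) + (20 - 15)) / 2 = 3.5. Matching with replacement lets one control serve several treated units, trading some variance for lower bias.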
Funding: Estuary Wetland Wildlife Survey Project of the Greater Bay Area of China (Science and Technology Planning Projects of Guangdong Province, 2021B1212110002).
Abstract: The potential of citizen science projects in research has been increasingly acknowledged, but the substantial engagement of these projects is restricted by the quality of citizen science data. Based on the largest emerging citizen science project in China, the Birdreport Online Database (BOD), we examined the biases of birdwatching data from the Greater Bay Area of China. The results show that sampling effort is disparate among land cover types because of contributors' preference for urban and suburban areas, indicating that environments suitable for species occurrence could be underrepresented in the BOD data. We tested contributors' skill in species identification via a questionnaire targeting citizen birders in the Greater Bay Area. The questionnaire shows that most citizen birdwatchers could correctly identify common species widely distributed in Southern China and less common species with conspicuous morphological characteristics, but failed to identify species from Alaudidae, Caprimulgidae, Emberizidae, Phylloscopidae, Scolopacidae, and Scotocercidae. With a case study, we demonstrate that spatially clustered birdwatching visits can cause underestimation of species richness in insufficiently sampled areas, and that the result of species richness mapping is sensitive to contributors' skill in identifying bird species. Our results illustrate how avian research can be influenced by the reliability of citizen science data in a region of generally high accessibility, and highlight the necessity of pre-analysis scrutiny of data reliability with regard to research aims at all spatial and temporal scales. To improve data quality, we suggest equipping the BOD data collection framework with a flexible filter for bird abundance and with questionnaires that collect information on contributors' bird identification skill. Statistical modelling approaches are encouraged for correcting the bias in sampling effort.
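Why clustered visits understate richness in sparsely visited cells can be seen from the classic rarefaction formula, which gives the expected number of species in a random subsample of n individuals drawn without replacement from a pool with per-species counts N_i (total N): E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)]. This is a standard tool swapped in for illustration, not the analysis used in the study; the function name and any abundance values are hypothetical.

```python
from math import comb
from typing import Sequence


def rarefied_richness(abundances: Sequence[int], n: int) -> float:
    """Expected number of species in a random subsample of n individuals
    (drawn without replacement) from a pool with the given per-species counts."""
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total count")
    denom = comb(total, n)
    # A species is missed only if all n draws come from the other individuals.
    return sum(1 - comb(total - a, n) / denom for a in abundances)
```

Expected richness rises with n, so a grid cell with few checklists will look species-poor even if its true assemblage matches that of a well-sampled cell.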
Funding: Provided by Shaanxi Province's Key Research and Development Plan (No. 2022NY-087).
Abstract: To address the slow search and tortuous paths of the Rapidly Exploring Random Tree (RRT) algorithm, a feedback-biased sampling RRT, called FS-RRT, is proposed based on RRT. First, to shorten the search time, the search area of the random tree is restricted to improve sampling efficiency. Second, to obtain better information about obstacles and shorten the path length, a feedback-biased sampling strategy is used instead of traditional random sampling: when an expanding node collides with an obstacle, the collision generates feedback information so that the next expanding node avoids expanding within a specific angle range. Third, this paper proposes an inverse optimization strategy that removes redundant points from the initial path, making the path shorter and more accurate. Finally, to ensure smooth operation of the robot in practice, auxiliary points are used to optimize the cubic Bézier curve so that the smoothed path does not cross obstacles. The experimental results demonstrate that, compared with the traditional RRT algorithm and other mainstream algorithms, the proposed FS-RRT algorithm performs favorably in running time, number of search iterations, and path length. Moreover, the improved algorithm also performs well in narrow obstacle environments, and its effectiveness is further confirmed by experimental verification.
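The redundant-point removal step can be sketched as backward shortcutting: starting from the goal, connect to the earliest path point reachable by a collision-free straight segment, then repeat from that point. This is one plausible reading of the "inverse optimization strategy"; the circular obstacles, the collision test, and all names are assumptions for illustration, and the feedback-biased sampling and Bézier smoothing steps are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (centre x, centre y, radius), assumed obstacle model


def segment_clear(a: Point, b: Point, obstacles: Sequence[Circle]) -> bool:
    """True if segment a-b stays strictly outside every circular obstacle."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    length2 = dx * dx + dy * dy
    for cx, cy, r in obstacles:
        # Parameter of the point on the segment closest to the circle centre.
        t = 0.0
        if length2 > 0:
            t = max(0.0, min(1.0, ((cx - ax) * dx + (cy - ay) * dy) / length2))
        if math.hypot(cx - (ax + t * dx), cy - (ay + t * dy)) <= r:
            return False
    return True


def prune_from_goal(path: Sequence[Point], obstacles: Sequence[Circle]) -> List[Point]:
    """Working backwards from the goal, link each kept point to the earliest
    path point it can see directly, skipping the redundant points between."""
    if len(path) < 3:
        return list(path)
    kept = [path[-1]]
    j = len(path) - 1
    while j > 0:
        # Earliest visible predecessor; falls back to j - 1 (an original edge).
        i = next((k for k in range(j) if segment_clear(path[k], path[j], obstacles)), j - 1)
        kept.append(path[i])
        j = i
    kept.reverse()
    return kept
```

With no obstacles, a zig-zag path collapses to its two endpoints; with an obstacle on the straight line between start and goal, one intermediate waypoint survives.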
Abstract: The reservoir volumetric approach is a widely accepted but flawed method of petroleum play resource calculation. In this paper, we propose a combination of techniques that can improve the applicability and quality of resource estimation. These techniques include: 1) the use of the Multivariate Discovery Process (MDP) model to derive unbiased distribution parameters of reservoir volumetric variables and to reveal correlations among the variables; 2) the use of the Geo-anchored method to estimate simultaneously the number of oil and gas pools in the same play; and 3) the cross-validation of assessment results from different methods. These techniques are illustrated with an example of a crude oil and natural gas resource assessment of the Sverdrup Basin, Canadian Archipelago. The example shows that when direct volumetric measurements of the untested prospects are not available, the MDP model can help derive unbiased estimates of the distribution parameters by using information from the discovered oil and gas accumulations. It also shows that estimating the number of oil and gas accumulations and the associated size ranges with a discovery process model can provide an alternative and efficient approach when inadequate geological data hinder the estimation. Cross-examining assessment results derived using different methods allows one to focus on and analyze the causes of the major differences, thus providing a more reliable assessment outcome.
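Discovery process models build on the observation that larger pools tend to be found first, commonly formalised as successive sampling: pools are discovered without replacement with probability proportional to size. The sketch below simulates that assumption and gives the exact expected size of the first discovery, Σs² / Σs, which by the Cauchy–Schwarz inequality is never below the population mean Σs / n. That gap is why size statistics from discovered pools cannot be applied naively to undiscovered ones. It is a univariate illustration with hypothetical names, not the MDP model or its estimation procedure.

```python
import random
from typing import List, Sequence


def discovery_order(sizes: Sequence[float], rng: random.Random) -> List[int]:
    """Simulate successive sampling: pools are discovered one at a time,
    without replacement, with probability proportional to pool size."""
    remaining = list(range(len(sizes)))
    order: List[int] = []
    while remaining:
        pick = rng.choices(remaining, weights=[sizes[i] for i in remaining], k=1)[0]
        remaining.remove(pick)
        order.append(pick)
    return order


def expected_first_discovery(sizes: Sequence[float]) -> float:
    """Exact expected size of the first discovery under size-biased sampling."""
    return sum(s * s for s in sizes) / sum(sizes)
```

For pool sizes [1, 2, 3, 4], the first discovery has expected size 3.0, while the population mean is 2.5.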
Funding: National Natural Science Foundation of China (No. 61903291).
Abstract: To solve the problem of path planning for mobile robots in a dynamic environment, an improved rapidly-exploring random tree* (RRT*) algorithm is proposed in this paper. First, target-biased sampling is introduced to reduce the randomness of the RRT* algorithm, and initial path planning is then carried out in a static environment. Second, the path is applied in a dynamic environment, and the initially planned path is used as a path cache. When a new obstacle appears on the path, the invalid part of the path is clipped and the path is replanned. During replanning, a point in the path cache is selected as the new node with a certain probability, so that the new path preserves the trend of the original path to a greater extent. Finally, simulation experiments are carried out in MATLAB for the initial planning and replanning algorithms, respectively. Compared with the original RRT* algorithm, the simulation results show that the number of nodes used by the improved algorithm is reduced by 43.19% on average.
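The sampling rule described above (target bias during initial planning, plus a probability of reusing cached path points during replanning) can be sketched as a single sampler. The probabilities `p_goal` and `p_cache`, the rectangular bounds, the circular-obstacle test used to invalidate cached points, and all names are assumptions; tree extension, rewiring, and edge collision checking are not shown.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sample_node(
    rng: random.Random,
    goal: Point,
    path_cache: Sequence[Point],
    bounds: Tuple[Tuple[float, float], Tuple[float, float]],
    p_goal: float = 0.1,
    p_cache: float = 0.3,
) -> Point:
    """Return the goal with prob p_goal, a cached path point with prob
    p_cache (if any remain), otherwise a uniform point within bounds."""
    u = rng.random()
    if u < p_goal:
        return goal
    if path_cache and u < p_goal + p_cache:
        return rng.choice(path_cache)
    (xmin, xmax), (ymin, ymax) = bounds
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


def drop_blocked(path_cache: Sequence[Point], center: Point, radius: float) -> List[Point]:
    """Remove cached points that fall inside a newly detected circular obstacle."""
    cx, cy = center
    return [(x, y) for x, y in path_cache if (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2]
```

When the cache is empty, its probability mass falls through to uniform sampling, so the same sampler can serve both the initial planning and the replanning phase.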
Funding: This paper was supported by the National Key Research and Development Program of China (2020YFB1006104); the Opening Project of Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2023KFZD004); Sichuan Police College (CJKY202001); and NSFC grant (62232005).
Abstract: Dimension reduction provides a powerful means of reducing the number of random variables under consideration. However, large datasets often contain many similar tuples, so before reducing the dimension of a dataset, we remove some similar tuples to retain the main information of the dataset while accelerating dimension reduction. Accordingly, we propose a dimension reduction technique based on biased sampling, a new procedure that incorporates features of both dimension reduction and biased sampling to obtain a computationally efficient means of reducing the number of random variables under consideration. In this paper, we choose Principal Component Analysis (PCA) as the main dimension reduction algorithm to study, and we show how this approach works.
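A minimal sketch of "remove similar tuples, then reduce dimension", using NumPy. The similarity rule here (rows that coincide after rounding are treated as duplicates and only the first is kept) is an assumption standing in for the paper's biased sampling procedure, which the abstract does not specify; PCA is computed from the SVD of the centred, de-duplicated rows.

```python
import numpy as np


def dedupe_then_pca(X, n_components: int, decimals: int = 1):
    """Drop near-duplicate rows, then run PCA on the rows that remain.

    Near-duplicate rule (illustrative assumption): rows that coincide after
    rounding to `decimals` places are similar; the first occurrence is kept.
    Returns (kept rows, column means, principal axes, explained variance ratio).
    """
    X = np.asarray(X, dtype=float)
    _, first_idx = np.unique(np.round(X, decimals), axis=0, return_index=True)
    kept = X[np.sort(first_idx)]
    mean = kept.mean(axis=0)
    # Principal axes are the right singular vectors of the centred data.
    _, s, vt = np.linalg.svd(kept - mean, full_matrices=False)
    components = vt[:n_components]
    explained = (s[:n_components] ** 2) / np.sum(s ** 2)
    return kept, mean, components, explained
```

To embed the full dataset afterwards, project it onto the retained axes with `(np.asarray(X) - mean) @ components.T`.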
Abstract: In survival analysis, data are frequently collected by complex sampling schemes, e.g., length-biased sampling, case-cohort sampling, and so on. In this paper, we consider the additive hazards model for general biased survival data. A simple and unified estimating equation method is developed to estimate the regression parameters and the baseline hazard function. The asymptotic properties of the resulting estimators are also derived. Furthermore, to check the adequacy of the fitted model with general biased survival data, we present a test statistic based on the cumulative sum of martingale-type residuals. Simulation studies are conducted to evaluate the performance of the proposed methods, and applications to the shrub and Welsh Nickel Refiners datasets are given to illustrate the methodology.
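For orientation, under ordinary right censoring the additive hazards model λ(t | Z) = λ0(t) + βZ is commonly fitted with the Lin–Ying estimating equation U(β) = Σ_i ∫ [Z_i − Z̄(t)] {dN_i(t) − Y_i(t) β Z_i dt} = 0, where Z̄(t) is the covariate mean over the risk set; the baseline hazard cancels and β has a closed form. The sketch below implements that equation for one time-fixed covariate and untied times. It does not include the adjustment for general biased sampling that is the subject of the paper; the function names and simulation settings are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def additive_hazards_beta(
    times: Sequence[float], events: Sequence[int], z: Sequence[float]
) -> float:
    """Lin-Ying estimate of beta in lambda(t | Z) = lambda0(t) + beta * Z.

    Assumes one time-fixed covariate, untied times, and ordinary
    (non-biased) right censoring; events[i] is 1 for an observed failure.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    t = [times[i] for i in order]
    d = [events[i] for i in order]
    x = [z[i] for i in order]
    # Suffix sums over the risk set {k, ..., n-1} at the k-th ordered time.
    s0 = [0] * (n + 1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        s0[k] = s0[k + 1] + 1
        s1[k] = s1[k + 1] + x[k]
        s2[k] = s2[k + 1] + x[k] * x[k]
    a = 0.0  # sum_i integral of Y_i(t) (Z_i - Zbar(t))^2 dt
    b = 0.0  # sum_i integral of (Z_i - Zbar(t)) dN_i(t)
    prev = 0.0
    for k in range(n):
        # The risk set is constant on (t[k-1], t[k]], so integrate exactly.
        a += (t[k] - prev) * (s2[k] - s1[k] * s1[k] / s0[k])
        if d[k]:
            b += x[k] - s1[k] / s0[k]
        prev = t[k]
    return b / a


def simulate_additive(n: int, beta: float, seed: int = 0) -> Tuple[List[float], List[int], List[int]]:
    """Hypothetical demo data: uncensored times with hazard 1 + beta * Z, Z alternating 0/1."""
    rng = random.Random(seed)
    z = [i % 2 for i in range(n)]
    times = [rng.expovariate(1.0 + beta * zi) for zi in z]
    return times, [1] * n, z
```

Because the risk set changes only at observed times, both integrals reduce to sums over the ordered times, so the whole computation is dominated by the initial sort.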
Funding: German Science Foundation (FOR 456–WE 2618/6-1 to B.S.); Swiss National Science Foundation (31–65224.01 to B.S.); Natural Sciences and Engineering Research Council of Canada (M.L.).
Abstract: Aims: The potential for mixtures of plant species to produce more biomass than every one of their constituent species in monoculture is still controversially discussed in the literature. Here we tested how this so-called transgressive overyielding is affected by variation between and within species in monoculture yields in biodiversity experiments. Methods: We use basic statistical principles to calculate the expected maximum monoculture yield in a species pool used for a biodiversity experiment. Using a real example, we show how between- and within-species variance components in monoculture yields can be obtained. Combining the two components, we estimate the importance of sampling bias in transgressive overyielding analysis. Important Findings: The net biodiversity effect (the difference between mixture yield and average monoculture yield) needed to achieve transgressive overyielding increases with the number of species in a mixture and with the variation between constituent species in monoculture yields. If there is no significant variation between species, transgressive overyielding should not be calculated using the best monoculture, because in this case the difference between this species and the other species could exclusively reflect a sampling bias. The sampling bias decreases with increasing variation between species. Tests for transgressive overyielding require replicated species' monocultures. However, it is questionable whether such an emphasis on monocultures in biodiversity experiments is justified if an analysis of transgressive overyielding is not the major goal.
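The sampling-bias argument above can be made concrete: if observed monoculture means vary across species with variance σ_b² and each mean also carries replicate noise σ_w²/r, the expected yield of the apparently best monoculture among n species is the expected maximum of n normal draws with standard deviation sqrt(σ_b² + σ_w²/r). A minimal sketch, assuming independent, normally distributed means; the normality assumption, the numerical integration, and the function names are illustrative choices, not necessarily the authors' calculation.

```python
import math


def expected_max_normal(n: int, mu: float = 0.0, sigma: float = 1.0, steps: int = 20000) -> float:
    """E[max of n iid Normal(mu, sigma^2)] by midpoint integration of
    x * n * phi(x) * Phi(x)**(n - 1) over [-10, 10] on the standard scale."""
    lo, hi = -10.0, 10.0
    h = (hi - lo) / steps
    acc = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        acc += x * n * phi * cdf ** (n - 1)
    return mu + sigma * acc * h


def expected_best_monoculture(
    n_species: int, mean: float, var_between: float, var_within: float, n_reps: int
) -> float:
    """Expected yield of the apparently best of n_species monocultures, where
    observed means vary by var_between across species plus var_within / n_reps
    from replicate sampling."""
    sigma = math.sqrt(var_between + var_within / n_reps)
    return expected_max_normal(n_species, mean, sigma)
```

Setting `var_between` to zero isolates the pure sampling effect: the best of two monocultures is still expected to exceed the common mean by about 0.564 standard deviations (1/√π), and the excess grows with the number of species.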
Funding: Supported in part by the U.S. National Institutes of Health (No. CA016042, No. P01AT003960). Chien-Tai Lin's research was supported in part by the National Science Council of Taiwan (No. 89-2118-M-032-021, No. 96-2628-M-032-002-MY3).
Abstract: In this article, we study a semiparametric mixture model for the two-sample problem with right-censored data. The model implies that the densities of the continuous outcomes are related by a parametric tilt but are otherwise unspecified. It provides a useful alternative to the Cox (1972) proportional hazards model for comparing treatments based on right-censored survival data. We propose an iterative algorithm for computing the semiparametric maximum likelihood estimates of the parametric and nonparametric components of the model. The performance of the proposed method is studied using simulation. We illustrate our method in an application to melanoma.
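The parametric tilt links the two densities as g(x) = f(x) exp(α + βx). Without censoring, (α, β) can be estimated by logistic regression of sample membership on x, because the log-odds of belonging to the second sample equal log(n1/n0) + α + βx. A minimal sketch, assuming a linear tilt and fully observed outcomes; it ignores right censoring and is therefore not the paper's iterative semiparametric maximum likelihood algorithm. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_exponential_tilt(
    x0: Sequence[float], x1: Sequence[float], iters: int = 25
) -> Tuple[float, float]:
    """Estimate (alpha, beta) in g(x) = f(x) * exp(alpha + beta * x) from
    uncensored samples x0 ~ f and x1 ~ g via Newton-Raphson logistic regression."""
    xs = list(x0) + list(x1)
    ys = [0] * len(x0) + [1] * len(x1)
    a = b = 0.0  # logistic intercept and slope
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information system for the update.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # Remove the sampling-ratio offset log(n1 / n0) from the intercept.
    return a - math.log(len(x1) / len(x0)), b


def two_normal_samples(n: int, shift: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical demo data: x0 ~ N(0, 1) and x1 ~ N(shift, 1), n draws each."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)], [rng.gauss(shift, 1.0) for _ in range(n)]
```

For two unit-variance normal samples with means 0 and 1, the true tilt is α = −0.5 and β = 1, which the fit should approximately recover.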