Funding: Supported by the National Natural Science Foundation of China (11101119), the Natural Science Foundation of Guangxi (2010GXNSFB013051), and the Philosophy and Social Sciences Foundation of Guangxi (11FTJ002).
Abstract: In this paper, the estimation of a class of generalized varying coefficient models with error-prone covariates is considered. By combining basis function approximations with some auxiliary variables, an instrumental variable type estimation procedure is proposed. Asymptotic results for the estimator, such as consistency and the weak convergence rate, are obtained. The proposed procedure attenuates the effect of measurement errors and is shown to work well for finite samples.
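The abstract above combines two concrete ingredients, basis function approximation of the coefficient functions and an instrumental-variable correction for the error-prone covariate, that can be illustrated compactly. Below is a minimal numpy sketch under assumed synthetic data: the coefficient function, measurement-error level, and auxiliary variable v are all hypothetical, and a polynomial basis stands in for the paper's basis functions.

```python
# Sketch: varying coefficient estimation with an error-prone covariate,
# using a polynomial basis for beta(u) and a 2SLS-style IV correction.
# Data-generating process and variable names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
u = rng.uniform(0, 1, n)                 # index variable of the coefficient function
x = rng.normal(size=n)                   # true error-prone covariate (unobserved)
w = x + 0.5 * rng.normal(size=n)         # observed, error-contaminated version
v = x + 0.5 * rng.normal(size=n)         # auxiliary variable used as instrument
y = (1 + np.sin(2 * np.pi * u)) * x + 0.2 * rng.normal(size=n)

B = np.vander(u, 4, increasing=True)     # polynomial basis for beta(u), degree 3
D = B * w[:, None]                       # design: basis interacted with noisy covariate
Z = B * v[:, None]                       # instruments: basis interacted with auxiliary var

# Two-stage least squares: project D on Z, then regress y on the projection.
D_hat = Z @ np.linalg.lstsq(Z, D, rcond=None)[0]
theta = np.linalg.lstsq(D_hat, y, rcond=None)[0]

u_grid = np.linspace(0, 1, 5)
beta_hat = np.vander(u_grid, 4, increasing=True) @ theta
print(np.round(beta_hat, 2))             # compare with 1 + sin(2*pi*u)
```

A naive regression of y on B * w would be attenuated by the measurement error; the instrument built from v removes that bias, which is the role the auxiliary variables play in the abstract.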
Funding: Supported by the Graduate Student Innovation Foundation of SHUFE (#CXJJ-2011-351) and by the Natural Sciences and Engineering Research Council of Canada.
Abstract: This paper studies estimation and inference for a class of varying-coefficient regression models with error-prone covariates. The authors focus on the situation where the covariates are unobserved, there are no repeated measurements, and the covariance matrix of the measurement errors is unknown, but some auxiliary information is available. The authors propose an instrumental variable type local polynomial estimator for the unknown varying-coefficient functions, and show that the estimator achieves the optimal nonparametric convergence rate, is asymptotically normal, and avoids undersmoothing, which allows the bandwidths to be selected with data-driven methods. A simulation is carried out to study the finite sample performance of the proposed estimator, and a real data set is analyzed to illustrate the usefulness of the developed methodology.
Abstract: In fault detection and classification, the operating mode usually drifts over time, which poses great challenges for the algorithms. Because traditional machine learning based fault classification cannot dynamically update the trained model according to the probability distribution of the testing dataset, the accuracy of these traditional methods usually drops significantly under covariate shift. In this paper, an importance-weighted transfer learning method is proposed for fault classification in nonlinear multi-mode industrial processes. It effectively compensates for the drift between the training and testing datasets. First, the mutual information method is used to perform feature selection on the original data, and a number of characteristic parameters associated with fault classification are selected according to their mutual information. Then, the importance-weighted least-squares probabilistic classifier (IWLSPC) is used for binary fault detection and multi-fault classification under covariate shift. Finally, experiments on the Tennessee Eastman (TE) benchmark confirm the effectiveness of the proposed method. The results show that covariate shift adaptation based on importance-weighted sampling is superior to traditional machine learning fault classification algorithms. Moreover, IWLSPC can be applied not only to binary fault classification but also to multi-class targets in fault diagnosis.
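The importance-weighting step can be sketched generically. IWLSPC itself is not reproduced here; the snippet below shows the common density-ratio construction (a probabilistic classifier discriminating training from testing data) followed by a weighted classifier fit, with synthetic data standing in for process measurements.

```python
# Generic importance-weighting sketch for covariate shift (not the paper's
# IWLSPC): estimate the density ratio p_test/p_train with a train-vs-test
# logistic model, then fit a weighted fault classifier. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(500, 3))
y_train = (X_train[:, 0] > 0).astype(int)        # stand-in fault labels
X_test = rng.normal(0.7, 1.0, size=(500, 3))     # shifted operating mode

# Density-ratio estimate: P(test|x)/P(train|x) * (n_train/n_test)
domain = LogisticRegression().fit(
    np.vstack([X_train, X_test]),
    np.r_[np.zeros(len(X_train)), np.ones(len(X_test))],
)
p = domain.predict_proba(X_train)[:, 1]
weights = (p / (1 - p)) * (len(X_train) / len(X_test))

# Training samples that look like the test distribution get larger weight.
clf = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
print(clf.score(X_test, (X_test[:, 0] > 0).astype(int)))
```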
Funding: Supported financially by the National Natural Science Foundation of China (91325301, 41571212, and 41137224), the Project of "One-Three-Five" Strategic Planning & Frontier Sciences of the Institute of Soil Science, Chinese Academy of Sciences (ISSASIP1622), and the National Key Basic Research Special Foundation of China (2012FY112100).
Abstract: Environmental covariates are the basis of predictive soil mapping. Their selection determines the performance of soil mapping to a great extent, especially when the number of soil samples is limited but soil spatial heterogeneity is high. In this study, we proposed an integrated method to select environmental covariates for predictive soil depth mapping. First, candidate variables that may influence the development of soil depth were selected based on pedogenetic knowledge. Second, three conventional methods (Pearson correlation analysis (PsCA), generalized additive models (GAMs), and Random Forest (RF)) were used to generate optimal combinations of environmental covariates. Finally, the three optimal combinations were integrated to produce a final combination based on the importance and occurrence frequency of each environmental covariate. We tested this method for soil depth mapping in the upper reaches of the Heihe River Basin in Northwest China. Soil samples were collected at 129 sites using a representative sampling strategy, and RF and support vector machine (SVM) models were used to map soil depth. The results showed that, compared to the sets of environmental covariates selected by the three conventional methods, the set selected by the proposed method achieved higher mapping accuracy. The combination from the proposed method obtained a root mean square error (RMSE) of 11.88 cm, which was 2.25–7.64 cm lower than the other methods, and an R^2 value of 0.76, which was 0.08–0.26 higher. These results suggest that our method can serve as an alternative to conventional methods for soil depth mapping and may also be effective for mapping other soil properties.
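The final integration step, combining the three selectors' outputs by occurrence frequency and importance, reduces to simple bookkeeping. The toy sketch below assumes hypothetical covariate names and importance scores; the selection rule (picked by at least two methods, ranked by mean importance) is one plausible reading of the described procedure.

```python
# Toy sketch of the integration step: combine covariate sets chosen by
# several selectors using occurrence frequency and mean importance.
# The three input rankings below are hypothetical.
selections = {
    "PsCA": {"slope": 0.8, "ndvi": 0.6, "twi": 0.4},
    "GAM":  {"slope": 0.7, "twi": 0.5, "curvature": 0.3},
    "RF":   {"slope": 0.9, "ndvi": 0.5, "twi": 0.6},
}

scores = {}
for method, ranked in selections.items():
    for cov, importance in ranked.items():
        freq, imp = scores.get(cov, (0, 0.0))
        scores[cov] = (freq + 1, imp + importance)

# Keep covariates picked by at least two methods, ordered by mean importance.
final = sorted(
    ((c, f, s / f) for c, (f, s) in scores.items() if f >= 2),
    key=lambda t: -t[2],
)
print(final)
```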
Funding: Partially supported by the National Natural Science Foundation of China (10771192) and NSF Award DMS-0349048 (USA).
Abstract: It is often important to incorporate covariate information in the design of clinical trials. The literature contains many designs that use stratification and covariate-adaptive randomization to balance certain known covariates. Recently, some covariate-adjusted response-adaptive (CARA) designs have been proposed and their asymptotic properties have been studied (Ann. Statist. 2007). However, these CARA designs usually have high variability. In this paper, a new family of covariate-adjusted response-adaptive (CARA) designs is presented. It is shown that the new designs have lower variability and are therefore more efficient.
Funding: Supported by grants from the National Natural Science Foundation of China (41431177 and 41871300), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), China, the Innovation Project of the State Key Laboratory of Resources and Environmental Information System (LREIS), China (O88RA20CYA), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China; support to A-Xing Zhu through the Vilas Associate Award, the Hammel Faculty Fellow Award, and the Manasse Chair Professorship from the University of Wisconsin-Madison.
Abstract: Selecting a proper set of covariates is one of the most important factors that influence the accuracy of digital soil mapping (DSM). Statistical or machine learning methods for selecting DSM covariates are not available in situations with limited samples. To solve this problem, this paper proposes a case-based method that formalizes the covariate selection knowledge contained in practical DSM applications. The proposed method trains Random Forest (RF) classifiers on DSM cases extracted from practical DSM applications and then uses the trained classifiers to determine whether each potential covariate should be used in a new DSM application. In this study, we took topographic covariates as example covariates and extracted 191 DSM cases from 56 peer-reviewed journal articles to evaluate the performance of the proposed case-based method by leave-one-out cross validation. Compared with the way novices commonly select DSM covariates, the proposed case-based method improved accuracy by more than 30% according to three quantitative evaluation indices (recall, precision, and F1-score). The proposed method can also be applied to selecting a proper set of covariates in similar geographical modeling domains, such as landslide susceptibility mapping and species distribution modeling.
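The core mechanic, one classifier per candidate covariate trained on past cases and validated leave-one-out, can be sketched briefly. The case table and the "slope" covariate below are mock stand-ins for the 191 extracted DSM cases.

```python
# Sketch of the case-based selection idea: one Random Forest per candidate
# covariate, trained on past DSM "cases" (environment descriptors -> whether
# the covariate was used), evaluated leave-one-out. Mock data throughout.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(2)
cases = rng.normal(size=(60, 5))          # descriptors of 60 hypothetical DSM cases
used_slope = (cases[:, 0] + 0.3 * rng.normal(size=60) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(rf, cases, used_slope, cv=LeaveOneOut()).mean()
print(f"LOO accuracy for 'use slope?': {acc:.2f}")

# For a new application, each trained classifier votes on its own covariate:
rf.fit(cases, used_slope)
new_case = rng.normal(size=(1, 5))
print("include slope:", bool(rf.predict(new_case)[0]))
```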
Abstract: In WLANs, stations sharing a common wireless channel are governed by the IEEE 802.11 protocol. Many careful studies have been conducted on how to utilize this precious medium efficiently. However, most of these studies assume either an idealistic channel condition or an unlimited number of retransmissions. This paper investigates the influence of limited retransmissions and of the error level in the channel on the network throughput, the probability of packet dropping, and the time to drop a packet. The results show that, for networks using the basic access mechanism, the throughput is suppressed with increasing amounts of channel error over the whole range of the retry limit; it is also quite sensitive to the size of the network. On the other side, networks using the four-way handshaking mechanism have good immunity against error over the available range of retry limits, and their throughput does not change with the size of the network over that range. However, in both DCF mechanisms the throughput no longer changes with the retry limit once it exceeds the maximum number of backoff stages. In both mechanisms the probability of dropping a packet is a decreasing function of the number of retransmissions, and the time to drop a packet from a station's queue is a strong function of the retry limit, the size of the network, the medium access mechanism in use, and the amount of error in the channel.
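The reported monotone relationship between packet dropping and the retry limit follows from a standard simplification: if every transmission attempt fails independently with probability p (collision plus channel error), a packet is dropped only when the retry limit m is exhausted, so P(drop) = p^(m+1). The numbers below are illustrative, not taken from the paper.

```python
# Back-of-the-envelope model consistent with the behaviour described above:
# each attempt fails independently with probability p, and a packet is
# dropped after m + 1 failed attempts. Values are illustrative only.
for p in (0.1, 0.3, 0.5):
    for m in (4, 7):
        print(f"p={p:.1f}, retry limit={m}: P(drop)={p ** (m + 1):.2e}")
```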
Funding: Supported by the Istanbul University Scientific Research Project Department under project number IRP-51706.
Abstract: Biometric gait recognition is a lesser-known but emerging and effective biometric method that recognizes subjects by their walking patterns. Existing research in this area has primarily focused on feature analysis through the extraction of individual features, which captures most of the information but fails to capture subtle variations in gait dynamics. Therefore, a novel feature taxonomy and an approach for deriving a relationship between a function of one set of gait features and another set are introduced. The gait features extracted from the body halves divided by anatomical planes on the vertical, horizontal, and diagonal axes are grouped to form canonical gait covariates. Canonical Correlation Analysis (CCA) is used to measure the strength of association between the canonical covariates of gait. Gait assessment and identification are thus enhanced when more semantic information is available through CCA-based multi-feature fusion. Carnegie Mellon University's 3D gait database, which contains 32 gait samples taken at different paces, is used to analyze gait characteristics. The performance of Linear Discriminant Analysis, K-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machines improved by 4% on average when the CCA-based gait identification approach was used. A maximum accuracy rate of 97.8% was achieved through CCA-based gait identification. Beyond that, the rates of false identifications and unrecognized gaits were halved, demonstrating state-of-the-art performance for gait identification.
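The fusion step is standard canonical correlation analysis between two groups of covariates. A minimal sketch with synthetic "upper-body" and "lower-body" feature groups (hypothetical stand-ins for the paper's anatomical-plane groupings) follows.

```python
# Minimal sketch of CCA-based fusion: measure association between two groups
# of gait covariates and use the canonical variates as fused features.
# Feature matrices here are synthetic.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 2))                   # shared gait dynamics
upper = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(200, 6))
lower = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(200, 6))

cca = CCA(n_components=2).fit(upper, lower)
U, V = cca.transform(upper, lower)
for k in range(2):
    r = np.corrcoef(U[:, k], V[:, k])[0, 1]
    print(f"canonical correlation {k + 1}: {r:.2f}")

fused = np.hstack([U, V])    # feed to LDA/KNN/SVM etc., as in the paper
```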
Abstract: A comprehensive study is presented for WLAN 802.11b over an error-prone channel. The performance of three different network sizes is evaluated theoretically and numerically for the bit rates available in the 802.11b protocol. Results show that throughput does not change with the size of the network over a wide range of bit error rates (BERs), and that the channel bit rates play a significant role in the main characteristics of the network. A comprehensive explanation is given for the phenomenon of packet delay suppression at relatively high BER levels, in view of the network sizes and the BERs. The effect of the length of the transmitted packets is also investigated.
Abstract: Mark-recapture models are extensively used in quantitative population ecology, providing estimates of population vital rates, such as survival, that are difficult to obtain using other methods. Vital rates are commonly modeled as functions of explanatory covariates, adding considerable flexibility to mark-recapture models, but also increasing the subjectivity and complexity of the modeling process. Consequently, model selection and the evaluation of covariate structure remain critical aspects of mark-recapture modeling. The difficulties involved in model selection are compounded in Cormack-Jolly-Seber models because they are composed of separate sub-models for survival and recapture probabilities, which are conceptualized independently even though their parameters are not statistically independent. The construction of models as combinations of sub-models, together with multiple potential covariates, can lead to a large model set. Although desirable, estimation of the parameters of all models may not be feasible. Strategies to search a model space and base inference on a subset of all models exist and enjoy widespread use. However, even though the methods used to search a model space can be expected to influence parameter estimation, the assessment of covariate importance, and therefore the ecological interpretation of the modeling results, the performance of these strategies has received limited investigation. We present a new strategy for searching the space of a candidate set of Cormack-Jolly-Seber models and explore its performance relative to existing strategies using computer simulation. The new strategy provides an improved assessment of the importance of covariates and covariate combinations used to model survival and recapture probabilities, while requiring only a modest increase in the number of models on which inference is based in comparison to existing techniques.
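The combinatorial growth of such model sets is easy to make concrete. With four hypothetical candidate covariates for each of the survival (phi) and recapture (p) sub-models, the full candidate set already contains 256 models:

```python
# Sketch of why CJS model sets grow quickly: models are formed as combinations
# of a survival (phi) sub-model and a recapture (p) sub-model, each built from
# subsets of candidate covariates. Covariate names are placeholders.
from itertools import combinations, product

covariates = ["sex", "age", "weather", "effort"]
subsets = [set(c) for r in range(len(covariates) + 1)
           for c in combinations(covariates, r)]

model_set = list(product(subsets, subsets))   # (phi covariates, p covariates)
print(len(model_set), "candidate models")     # 16 * 16 = 256 already
```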
Abstract: Given a sample of regression data from (Y, Z), a new diagnostic plotting method is proposed for checking the hypothesis H0: the data are from a given Cox model with time-dependent covariates Z. It compares two estimates of the marginal distribution F_Y of Y. One is an estimate of the modified expression of F_Y under H0, based on a consistent estimate of the parameter under H0 and on the baseline distribution of the data. The other is the Kaplan-Meier estimator of F_Y, together with its confidence band. The new plot, called the marginal distribution plot, can be viewed as a test of H0. The main advantage of the test over the existing residual tests arises when the data do not satisfy any Cox model or the Cox model is mis-specified: the new test remains valid, whereas the residual tests do not and often commit a type II error with very large probability.
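One of the two curves being compared is the ordinary Kaplan-Meier estimate of the marginal distribution of Y. A self-contained product-limit sketch on synthetic right-censored data follows; the model-based curve under H0 is paper-specific and not reproduced.

```python
# Minimal Kaplan-Meier (product-limit) estimator of the marginal distribution
# of Y, the reference curve in the marginal distribution plot. Synthetic
# right-censored data; assumes no tied event times.
import numpy as np

rng = np.random.default_rng(4)
t_true = rng.exponential(1.0, 100)
c = rng.exponential(1.5, 100)
y = np.minimum(t_true, c)                 # observed time
d = (t_true <= c).astype(int)             # 1 = event, 0 = censored

order = np.argsort(y)
y, d = y[order], d[order]
at_risk = np.arange(len(y), 0, -1)        # risk-set size just before each time
surv = np.cumprod(1.0 - d / at_risk)      # product-limit estimate S_hat(y_i)

print(np.column_stack([y[:5], 1.0 - surv[:5]]))   # first few (t, F_Y_hat(t))
```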
Abstract: Allowing for time-varying covariates and time-varying coefficient effects in survival models is a plausible and robust technique. Such an analysis can be carried out with a general class of semiparametric transformation models. The aim of this article is to develop modified estimating equations under semiparametric transformation models for survival time with time-varying coefficient effects and time-varying continuous covariates. For this, it is important to organize the data in counting process style and to transform the time with the standard transformation classes applied in this article. When the effects of coefficients and covariates change over time, the widely used maximum likelihood estimation method becomes more complex and burdensome for obtaining consistent estimates. To overcome this problem, modified estimating equations were applied instead to estimate the unknown parameters and the unspecified monotone transformation functions. The estimating equations were modified to incorporate the time-varying effect in both coefficients and covariates. The performance of the proposed methods is tested through a simulation study. In summary, the effect of possibly time-varying covariates and time-varying coefficients was evaluated in some special cases of semiparametric transformation models, and the results show that the role of the time-varying covariate in these models is plausible and credible.
Abstract: Empirical likelihood-based inference for varying coefficient models with missing covariates is investigated. An imputed empirical likelihood ratio function for the coefficient functions is proposed, and it is shown that its limiting distribution is standard chi-squared. The corresponding confidence intervals for the regression coefficients are then constructed. Simulations show that the proposed procedure attenuates the effect of the missing data and performs well for finite samples.
Funding: Shanghai Science and Technology Commission's "Belt and Road Initiative" International Cooperation Project, China (No. 19410741800).
Abstract: Library construction is a common method used to screen target genes in molecular biology. Most library construction approaches are not suitable for a small DNA library (<100 base pairs (bp)) and for low RNA library output. To maximize the library's complexity, error-prone polymerase chain reaction (PCR) was used to increase the base mutation rate. After introducing the DNA fragments into competent cells, the library complexity could reach 10^9. The library mutation rate increased exponentially with the dilution and amplification of error-prone PCR. The error-prone PCR conditions were optimized, including deoxyribonucleotide triphosphate (dNTP) concentration, Mn^(2+) concentration, Mg^(2+) concentration, PCR cycle number, and primer length. An RNA library with high complexity can then be obtained by in vitro transcription to meet most molecular biology screening requirements, and it can also be used for mRNA vaccine screening.
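The growth of the mutation load with repeated dilution and re-amplification rests on simple arithmetic: the expected number of mutations per clone scales with fragment length, per-base error rate, and the effective number of doublings. All rates below are assumed for illustration.

```python
# Back-of-the-envelope arithmetic for error-prone PCR library design.
# Every value here is an illustrative assumption, not a measured rate.
length_bp = 90           # small library fragment (<100 bp), as discussed above
error_rate = 2e-3        # assumed per-base error rate per duplication
cycles = 20              # effective doublings after dilution/re-amplification

mutations_per_clone = length_bp * error_rate * cycles
print(f"expected mutations per clone: {mutations_per_clone:.1f}")  # ~3.6
```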
Funding: Supported by the National Natural Science Foundation of China (42001305), the Guangdong Basic and Applied Basic Research Foundation (2022A1515011459), GDAS' Special Project of Science and Technology Development (2020GDASYL-20200102001), and the Guangzhou Basic and Applied Basic Research Foundation (2023A04J1534) to Z.W.; the US National Science Foundation (NSF) Macrosystems Biology and NEON-Enabled Science grant 1638720 to P.A.T. and E.L.K.; and the NSF Biology Integration Institute award ASCEND, DBI-2021898, to P.A.T.
Abstract: Characterizing foliar trait variation in sun and shade leaves can provide insights into inter- and intra-species resource use strategies and plant responses to environmental change. However, datasets with records of multiple foliar traits from the same individual that include shade leaves are sparse, which limits our ability to investigate trait-trait and trait-environment relationships and trait coordination in both sun and shade leaves. We present a comprehensive dataset of 15 foliar traits from sun and shade leaves sampled with leaf spectroscopy, covering 424 individuals of 110 plant species from 19 sites across eastern North America. We investigated trait variation, covariation, scaling relationships with leaf mass, and the effects of environment, canopy position, and taxonomy on trait expression. Generally, sun leaves had higher leaf mass per area, nonstructural carbohydrates, and total phenolics, and lower mass-based chlorophyll a+b, carotenoids, phosphorus, and potassium, but exhibited species-specific characteristics. Covariation between sun and shade leaf traits, and trait-environment relationships, were overall consistent across species. The main dimensions of foliar trait variation in seed plants were revealed, including leaf economics traits, photosynthetic pigments, defense, and structural traits. Taxonomy and canopy position collectively explained most of the foliar trait variation. This study highlights the importance of including intra-individual and intra-specific trait variation to improve our understanding of ecosystem functions. Our findings have implications for efficient field sampling and trait mapping with remote sensing.
Funding: Supported by the National Key Research and Development Program of China under Grant 2020YFB1804901, the State Key Laboratory of Rail Traffic Control and Safety (Contract No. RCS2022ZT015), and the Special Key Project of Technological Innovation and Application Development of the Chongqing Science and Technology Bureau (cstc2019jscx-fxydX0053).
Abstract: The spatial covariance matrix (SCM) is essential in many multi-antenna systems such as massive multiple-input multiple-output (MIMO). For multi-antenna systems operating at millimeter-wave bands, a hybrid analog-digital structure has been widely adopted to reduce the cost of radio frequency chains. In this situation, the signals received at the antennas are unavailable to the digital receiver, and as a consequence, the traditional sample-average approach cannot be used for SCM reconstruction in hybrid multi-antenna systems. To address this issue, a beam sweeping algorithm (BSA), which can reconstruct the SCM effectively for a hybrid uniform linear array, was proposed in our previous work. However, direct extension of the BSA to a hybrid uniform circular array (UCA) results in a huge computational burden. To this end, a low-complexity approach is proposed in this paper. By exploiting the symmetry features of the SCM for the UCA, the number of unknowns can be reduced significantly, and the complexity of reconstruction is reduced accordingly. Furthermore, an insightful analysis is presented, showing that reducing the number of unknowns also improves the accuracy of the reconstructed SCM. Simulation results demonstrate the proposed approach.
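The underlying beam-sweeping idea is that each analog beam w yields a scalar power measurement p = w^H R w, which is linear in the entries of R. The paper's UCA-specific symmetry reduction is not reproduced here; the generic least-squares reconstruction, with the Hermitian symmetry imposed afterwards, looks roughly as follows (array size, sources, and beam patterns are assumptions).

```python
# Generic sketch of beam-sweeping SCM reconstruction: each beam w gives one
# power measurement p = w^H R w, linear in vec(R), so enough sweeps yield a
# least-squares problem for the SCM. Exploiting structure (here: Hermitian
# symmetry; in the paper, UCA-specific symmetry) cuts the unknown count.
import numpy as np

rng = np.random.default_rng(5)
N = 4                                           # antennas
steer = np.exp(1j * np.outer(np.arange(N), rng.uniform(0, np.pi, 2)))
R_true = steer @ steer.conj().T + 0.1 * np.eye(N)   # 2 sources + noise floor

K = 2 * N * N                                   # number of swept beams
W = rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))
p = np.einsum("ki,ij,kj->k", W.conj(), R_true, W).real

A = np.stack([np.kron(w, w.conj()) for w in W])        # p_k = A_k @ vec_F(R)
vec_R = np.linalg.lstsq(A, p.astype(complex), rcond=None)[0]
R_hat = vec_R.reshape(N, N, order="F")
R_hat = 0.5 * (R_hat + R_hat.conj().T)          # enforce Hermitian symmetry

print(np.linalg.norm(R_hat - R_true) / np.linalg.norm(R_true))
```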
Funding: The Canadian Institutes of Health Research, Quebec's Ministry of Health and Social Services, the Estrie Regional Health and Social Services Agency, and the Universite de Sherbrooke.
Abstract: The objective of this study was to explore the relationship of sociodemographic, clinical, and health-services-use-related variables with transitions between disability-based profiles. In a longitudinal study of 1386 people aged 75 and over living in the community at baseline, disabilities were assessed annually for up to four years with the Functional Autonomy Measurement System (SMAF), which generates 14 Iso-SMAF profiles. These profiles are grouped into four disability states: predominant alterations in instrumental activities of daily living (IADLs), mobility, and mental functions, as well as severe and mixed disabilities. Continuous-time, multi-state Markov modeling was used to identify the factors associated with transitions made by older people between these states and to institutionalization and death. Greater age and receiving help for ADLs were associated with four transitions, while altered cognitive functions and hospitalization were associated with three, all involving more decline or less recovery. From mild IADL profiles, men have a higher risk of transitioning to intermediate predominantly mental profiles, while women are at higher risk of transitioning to intermediate predominantly mobility profiles. Unmet needs are associated with deterioration from mild IADL to intermediate predominantly mobility profiles. These results help in understanding the complex progression of disabilities in older people.
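The continuous-time multi-state machinery can be illustrated in a few lines: given a transition intensity matrix Q over the disability states, transition probabilities over an interval t are obtained as the matrix exponential exp(Qt). The Q below is hypothetical, not estimated from the SMAF data.

```python
# Illustration of continuous-time multi-state Markov modeling: transition
# probabilities over t years are P(t) = expm(Q t). Q is a hypothetical
# intensity matrix (rows sum to zero; last state is absorbing).
import numpy as np
from scipy.linalg import expm

# States: 0 mild IADL, 1 intermediate, 2 severe/mixed, 3 institution/death
Q = np.array([
    [-0.30,  0.20,  0.05, 0.05],
    [ 0.05, -0.35,  0.20, 0.10],
    [ 0.00,  0.02, -0.22, 0.20],
    [ 0.00,  0.00,  0.00, 0.00],
])

P_1yr = expm(Q * 1.0)
print(np.round(P_1yr, 3))   # rows sum to 1; P[i, j] = Pr(state j | state i)
```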
Abstract: An improved method for estimating causal effects from observational data is demonstrated. Applications in medicine have been few, and the purpose of the present study is to contribute new clinical insight by means of this newer and more sophisticated analysis. The long-term effect of medication for adult ADHD patients is not resolved. A model with causal parameters representing the effect of medication was formulated, which accounts for time-varying confounding and for selection bias from loss to follow-up. The popular marginal structural model (MSM) for causal inference, due to Robins et al., adjusts for time-varying confounding but lacks robustness to misspecification of the weights. Recent work by Imai and Ratkovic [1] [2] achieves robustness in the MSM through improved covariate balance (CBMSM). The CBMSM (freely available software) was compared with a standard fit of an MSM and with a naive regression model, to give a robust estimate of the true treatment effect in 250 previously non-medicated adults, treated for one year in a specialized ADHD outpatient clinic in Norway. Covariate balance was greatly improved, resulting in a stronger estimated treatment effect than without this improvement. In terms of treatment effect per week, the early stages seemed to have the strongest influence. An estimated average reduction of 4 units on the symptom scale assessed at 12 weeks was found for hypothetical medication in the 9-12 week period compared to no medication in this period. The treatment effect persisted throughout the whole year, with an estimated average reduction of 0.7 units per week on symptoms assessed at one year, for hypothetical medication in the last 13 weeks of the year compared to no medication in this period. The present findings support a strong causal direct and indirect effect of pharmacological treatment of adults with ADHD on symptom improvement, with a stronger treatment effect than previously reported.
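The weight construction behind an MSM can be sketched generically. The covariate-balancing refinement of Imai and Ratkovic is not reproduced; the snippet shows standard stabilized inverse-probability-of-treatment weights on synthetic data, the baseline that the CBMSM improves upon.

```python
# Sketch of stabilized IPW weights for a marginal structural model: a
# logistic model of treatment given confounders forms the denominator, the
# marginal treatment probability the numerator. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 1000
confounder = rng.normal(size=(n, 1))               # e.g., current symptom level
treated = rng.binomial(1, 1 / (1 + np.exp(-confounder[:, 0])))

denom = LogisticRegression().fit(confounder, treated).predict_proba(confounder)
p_marginal = treated.mean()                        # numerator model: no covariates

pr_received = np.where(treated == 1, denom[:, 1], denom[:, 0])
num = np.where(treated == 1, p_marginal, 1 - p_marginal)
sw = num / pr_received                             # stabilized weights

print(f"mean weight ~ 1: {sw.mean():.2f}, max: {sw.max():.2f}")
# The outcome model is then fit with sample_weight=sw, repeated over
# time-varying treatment intervals, to estimate the marginal causal effect.
```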
Abstract: The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise, yet most relevant studies assume complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and its minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulations are performed. The results show that, for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
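The estimator described, a generalized sample covariance rescaled by pairwise observation frequencies followed by entrywise hard thresholding, can be sketched directly. The sparse target, missingness rate, noise level, and threshold constant below are all assumptions for illustration, and no debiasing for the noise variance is attempted.

```python
# Sketch of hard-thresholding covariance estimation under missingness:
# rescale each entry of X'X by the pairwise observed-sample count (a
# generalized sample covariance), then zero out small entries. Synthetic
# zero-mean data; the threshold constant is an illustrative choice.
import numpy as np

rng = np.random.default_rng(7)
n, p = 400, 20
Sigma = np.eye(p) + 0.4 * np.eye(p, k=1) + 0.4 * np.eye(p, k=-1)  # sparse truth
X = rng.multivariate_normal(np.zeros(p), Sigma, n)
X += 0.2 * rng.normal(size=X.shape)                # sub-Gaussian additive noise
obs = rng.random(X.shape) < 0.8                    # each entry observed w.p. 0.8

Xz = np.where(obs, X, 0.0)
counts = obs.astype(float).T @ obs.astype(float)   # pairwise observed counts
S = (Xz.T @ Xz) / counts                           # generalized sample covariance

tau = 2.0 * np.sqrt(np.log(p) / n)                 # illustrative threshold level
S_hat = np.where(np.abs(S) >= tau, S, 0.0)

err = np.linalg.norm(S_hat - Sigma, 2) / np.linalg.norm(Sigma, 2)
print(f"relative spectral error after thresholding: {err:.2f}")
```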