Abstract: Recently, self-supervised learning has shown great potential in Graph Neural Networks (GNNs) through contrastive learning, which aims to learn discriminative features for each node without label information. The key to graph contrastive learning is data augmentation. The anchor node regards its augmented samples as positive samples, and the rest of the samples are regarded as negative samples, some of which may actually be positive samples. We call these mislabeled samples "false negative" samples, and they seriously degrade the final learning result. Since such semantically similar samples are ubiquitous in the graph, the problem of false negative samples is very significant. To address this issue, the paper proposes a novel model, False negative sample Detection for Graph Contrastive Learning (FD4GCL), which uses attribute and structure awareness to detect false negative samples. Experimental results on seven datasets show that FD4GCL outperforms the state-of-the-art baselines and even exceeds several supervised methods.
Abstract: Accurate and reliable fault detection is essential for the safe operation of electric vehicles. Support vector data description (SVDD) has been widely used in the field of fault detection. However, the constructed hypersphere boundary only describes the distribution of unlabeled samples, while the distribution of faulty samples cannot be effectively described, so faulty data are easily missed because of the imbalance of the sample distribution. Meanwhile, parameter selection is critical to detection performance, and empirical parameterization is generally time-consuming and laborious and may not find the optimal parameters. Therefore, this paper proposes a semi-supervised data-driven method that improves the SVDD algorithm and achieves excellent fault detection performance. By incorporating faulty samples into the underlying SVDD model, training deals better with the missed detection of faulty samples caused by the imbalance in the distribution of abnormal samples, and the hypersphere boundary is modified to classify the samples more accurately. The Bayesian Optimization NSVDD (BO-NSVDD) model was constructed to quickly and accurately optimize hyperparameter combinations. In the experiments, electric vehicle operation data with four common fault types are used to compare its performance with five other models, and the results show that the BO-NSVDD model presents superior detection performance for each type of fault data, with especially clear advantages for imperceptible early and minor faults. Finally, the strong robustness of the proposed method is verified by adding noise of different intensities to the dataset.
Abstract: Data from the 2013 Canadian Tobacco, Alcohol and Drugs Survey and two other surveys are used to determine the effects of cannabis use on self-reported physical and mental health. Daily or almost daily marijuana use is shown to be detrimental to both measures of health for some age groups but not all. The age-group-specific effects depend on gender: males and females respond differently to cannabis use. The health costs of regularly using cannabis are significant, but they are much smaller than those associated with tobacco use. These costs are attributed both to the presence of delta-9-tetrahydrocannabinol and to the fact that smoking cannabis is itself a health hazard because of the toxic properties of the smoke ingested. Cannabis use is costlier for regular smokers, and first use before the age of 15 or 20, as well as being a former user, leads to permanently reduced physical and mental capacities. These results strongly suggest that the legalization of marijuana be accompanied by educational programs, counseling services, and a delivery system that minimizes juvenile and young adult usage.
Abstract: In order to assess the school attendance status of children aged 7-14, to determine the causes of non-attendance, and to formulate appropriate policies for the implementation of the nine-year compulsory education programme, a sample survey of school-age children was carried out in Jianhe, Leishan and Taijiang, Guizhou Province, in October 1993.
Abstract: In computer security, the number of malware threats is increasing and causing damage to the systems of individuals and organizations, necessitating a new detection technique capable of detecting new malware variants more efficiently than traditional anti-malware methods. Traditional anti-malware software cannot detect new malware variants, and conventional techniques such as static analysis, dynamic analysis, and hybrid analysis are time-consuming and rely on domain experts. Visualization-based malware detection has recently gained popularity due to its accuracy, independence from domain experts, and faster detection time. Visualization-based malware detection uses the image representation of the malware binary and applies image processing techniques to the image. This paper aims to provide readers with a comprehensive understanding of malware detection and focuses on visualization-based malware detection.
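The image representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest common mapping, one byte to one grayscale pixel, laid out row by row; the abstract does not specify the mapping, and the function name `bytes_to_grayscale` and the default width of 16 are hypothetical choices made only for illustration.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay out raw bytes as rows of grayscale pixels (0-255).

    Assumption for illustration: one byte becomes one pixel and the
    last row is zero-padded. Real systems may choose the width from
    the file size or use other encodings.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad so the final row has exactly `width` pixels.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Toy input standing in for a binary file's contents.
image = bytes_to_grayscale(bytes(range(40)), width=16)
```

The resulting 2-D array can then be passed to any image-processing or image-classification pipeline; that downstream step is outside the scope of this sketch.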
Abstract: In the field work of population-based research, 3 groups of eyes were graded by 2 observers using LOCS Ⅱ. The reproducibility of LOCS Ⅱ was evaluated by the agreement rates (85%-100%) and kappa values (0.661-1) obtained in our study. The satisfactory results show that LOCS Ⅱ is not only easy to learn and to apply consistently across observers, but also has good reproducibility in field work. A longitudinal cataract study is planned.
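The two reproducibility measures reported here, percent agreement and the kappa value, can be sketched as follows. This assumes the statistic is unweighted Cohen's kappa for two observers, which the abstract does not state; the grades below are invented toy data, not values from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same grade.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal grade frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical grade
    return (observed - expected) / (1 - expected)


# Hypothetical grades (0, 1, 2) assigned by two observers to ten eyes.
grades_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
grades_b = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
kappa = cohen_kappa(grades_a, grades_b)
```

In this toy example the observers agree on 9 of 10 eyes (90%), while chance agreement is 0.33, so kappa is 0.57 / 0.67, roughly 0.85.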
Abstract: In this paper, the problem of non-response with significant travel costs in multivariate stratified sample surveys is formulated as a Multi-Objective Geometric Programming Problem (MOGPP). The fuzzy programming approach is described for solving the formulated MOGPP. The formulated MOGPP is solved with the help of LINGO software, and the dual solution is obtained. The optimum allocations of sample sizes of respondents and non-respondents are obtained with the help of the dual solutions and the primal-dual relationship theorem. A numerical example is given to illustrate the procedure.
Abstract: Research surveys are believed to have originated in antiquity, with evidence of them being performed in ancient Egypt and Greece. In the past century, their use has grown significantly, and they are now one of the most frequently employed research methods, including in the field of healthcare. Modern validation techniques and processes have allowed researchers to broaden the scope of qualitative data they can gather through these surveys, from an individual's views on service quality to nationwide surveys that are undertaken regularly to follow healthcare trends. This article focuses on the evolution and current utility of research surveys, the different methodologies employed in their creation, the advantages and disadvantages of different forms, and their future use in healthcare research. We also review the role of artificial intelligence and the importance of increased patient participation in the development of these surveys in order to obtain more accurate and clinically relevant data.
Funding: The Public Science and Technology Research Funds Projects of Ocean under contract No. 201305030; the Specialized Research Fund for the Doctoral Program of Higher Education under contract No. 20120132130001
Abstract: Fishery-independent surveys are often used for collecting high-quality biological and ecological data to support fisheries management. A careful optimization of fishery-independent survey design is necessary to improve the precision of survey estimates with cost-effective sampling efforts. We developed a simulation approach to evaluate and optimize the stratification scheme for a fishery-independent survey with multiple goals, including estimation of abundance indices of individual species and species diversity indices. We compared the performances of the sampling designs with different stratification schemes for different goals over different months. For most indices, the stratification schemes yielded gains in the precision of survey estimates compared with a simple random sampling design. The stratification scheme with five strata performed the best. This study showed that the loss of precision of survey estimates due to the reduction of sampling efforts could be compensated by improved stratification schemes, which would reduce the cost and negative impacts of survey trawling on those species with low abundance in the fishery-independent survey. This study also suggests that the optimal survey design differs with different survey objectives. A post-survey analysis can improve the stratification scheme of fishery-independent survey designs.
Funding: Supported by the Chinese University of Hong Kong
Abstract: Objective: To investigate the uptake rate of prostate specific antigen (PSA) testing among Hong Kong Chinese males aged 50 or above, and identify factors associated with the likelihood of undergoing a PSA test. Methods: A population-based telephone survey was conducted in Hong Kong in 2007. The survey covered demographic information, perceived health status, use of complementary therapy, cancer screening behavior, perceived susceptibility to cancer and family history of cancer. Descriptive statistics, percentages and logistic regression analysis were used for data analysis. Results: A total of 1,002 men aged 50 or above took part in the study (response rate = 67%), and the uptake rate of PSA testing was found to be 10%. Employment status, use of complementary therapy, perceiving regular visits to a doctor as good for health and the recommendations of health professionals were significant factors associated with PSA testing. Conclusion: The uptake rate of PSA testing in the study population was very low. Among all the factors identified, recommendations from health professionals had the strongest association with the uptake of PSA testing, and health professionals should therefore take an active role in educating this population about cancer prevention and detection.
Funding: Supported by the National Natural Science Foundation of China
Abstract: This paper proposes a new method for increasing the precision in survey sampling, i.e., a method combining sampling with prediction. The two cases where auxiliary information is or is not available are considered. A numerical example is given.
Abstract: This paper develops a sampling method to estimate the integral of a function over an area, with a strategy that covers the area with parallel lines of observation. This sampling strategy is special in that lines very close to each other are selected much less often than under a uniformly random design for the positions of the parallel lines. It is also special in that the positions of some of the lines are deterministic. Two different variance estimators are derived and investigated by sampling different man-made signal functions. They show different properties in that the estimator that gives the larger variance estimate yields an error interval that, in some situations, may be more than ten times the error interval computed from the other estimator. It is also obvious that the second estimator underestimates the variance. The author has not succeeded in deriving an expression for the expectation of this estimator. This work is motivated by the need to find the variance of acoustic abundance estimates.
Funding: Supported by the National Natural Science Foundation of China (10571093)
Abstract: In stratified survey sampling, sometimes we have complete auxiliary information. One of the fundamental questions is how to effectively use the complete auxiliary information at the estimation stage. In this paper, we extend the model-calibration method to obtain estimators of the finite population mean by using complete auxiliary information from stratified sampling survey data. We show that the resulting estimators effectively use auxiliary information at the estimation stage and possess a number of attractive features, such as being asymptotically design-unbiased irrespective of the working model and approximately model-unbiased under the model. When a linear working model is used, the resulting estimators reduce to the usual calibration estimator (or GREG).
Abstract: [Objective] The study aimed to understand the status of piglet diarrhea viruses in large-scale farms in the areas surrounding Beijing. [Method] From 570 umbilical cord blood samples collected from diseased pigs, nucleic acids of porcine reproductive and respiratory syndrome virus (PRRSV), classical swine fever virus (CSFV), porcine pseudorabies virus (PRV), porcine circovirus (PCV), porcine epidemic diarrhea virus (PEDV), transmissible gastroenteritis virus (TGEV), and porcine rotavirus (RV) were detected by PCR or RT-PCR assay. [Result] The detection rates of PRRSV, PCV-2, PEDV, PPV, TGEV, PRV, RV and CSFV were 38.7%, 29.6%, 21.8%, 13.03%, 5.6%, 4.2%, 3.8% and 1.7%, respectively. [Conclusion] The major pathogens were PRRSV, PCV-2, PEDV and PPV, and mixed infection was serious.
Abstract: Complex survey designs often involve unequal selection probabilities of clusters or units within clusters. When estimating models for complex survey data, scaled weights are incorporated into the likelihood, producing a pseudo-likelihood. In a 3-level weighted analysis for a binary outcome, we implemented two methods for scaling the sampling weights in the National Health Survey of Pakistan (NHSP). For NHSP, with health care utilization as a binary outcome, we found age, gender, household (HH) goods, urban/rural status, community development index, province and marital status to be significant predictors of health care utilization (p-value < 0.05). The variance of the random intercepts using scaling method 1 is estimated as 0.0961 (standard error 0.0339) at the PSU level and 0.2726 (standard error 0.0995) at the household level. Both estimates are significantly different from zero (p-value < 0.05) and indicate considerable heterogeneity in health care utilization with respect to households and PSUs. The results of the NHSP data analysis showed that all three analyses, weighted (two scaling methods) and unweighted, converged to almost identical results with few exceptions. This may have occurred because of the large number of 3rd- and 2nd-level clusters and the relatively small ICC. We performed a simulation study to assess the effect of varying prevalence and intra-class correlation coefficients (ICCs) on the bias of fixed-effect parameters and variance components of a multilevel pseudo maximum likelihood (weighted) analysis. The simulation results showed that the performance of the scaled weighted estimators is satisfactory for both scaling methods. Incorporating simulation into the analysis of complex multilevel surveys allows the integrity of the results to be tested and is recommended as good practice.
Abstract: An innovative use of spatial sampling designs is presented here. Sampling methods that consider the spatial locations of statistical units are already used in agricultural and environmental contexts, but they have never been exploited for establishment surveys. The rapidly increasing availability of geo-referenced information about business units now makes this possible. In business studies, it may be important to take into account spatial autocorrelation or spatial trends in the variables of interest in order to obtain more precise and efficient estimates. The opportunity of using the most innovative spatial sampling designs in business surveys, in order to produce samples that are well spread in space, is tested here by means of Monte Carlo experiments. For all designs, the Horvitz-Thompson estimator of the population total is used with both equal and unequal inclusion probabilities. The efficiency of the sampling designs is evaluated in terms of relative RMSE and efficiency gain compared with designs that ignore the spatial information. In addition, the spatial balance of the samples is evaluated.
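The Horvitz-Thompson estimator mentioned above weights each sampled value by the inverse of its inclusion probability. The Python sketch below illustrates it on made-up data; the function name, values, and probabilities are hypothetical and are not taken from the study.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over the sampled units."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Made-up sample of three business units.
sample_values = [10.0, 20.0, 30.0]

# Equal inclusion probabilities: every unit stands for 1 / 0.5 = 2 population units.
total_equal = horvitz_thompson_total(sample_values, [0.5, 0.5, 0.5])

# Unequal inclusion probabilities: rarely selected units receive larger weights.
total_unequal = horvitz_thompson_total(sample_values, [0.1, 0.2, 0.5])
```

The estimator is the same under any design; spatially balanced designs change which units are drawn and their inclusion probabilities, not the formula.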
Abstract: Simple passive diffusive samplers were used to determine SO2, NO2 and NH3 in the atmosphere at three stations at high latitudes and in the Arctic. The concentrations of SO2, NO2 and NH3 were found to be below 1.0, 0.3 and 2.0 μg/m3, respectively. These values were obtained with sampling periods of 5-10 d. These preliminary data suggest that the SO2 and NO2 concentrations are about two orders of magnitude lower than those in the Beijing area and one order of magnitude lower than those in other, less polluted areas of China.
Abstract: Logistic regression models have been widely used in many areas of research, notably in the health sciences, to study risk factors associated with diseases. Many population-based surveys, such as the Demographic and Health Survey (DHS), are constructed using complex sampling, i.e., probabilistic, stratified and multistage sampling with unequal observation weights; this complex design must be taken into account in order to obtain reliable results. However, this very relevant issue is usually not well analyzed in the literature. The aim of the study is to specify the logistic regression model under a complex sample design and to demonstrate how to estimate it using the R software survey package. More specifically, we used data from the Mozambique Demographic and Health Survey 2011 (MDHS 2011) to illustrate how to correct for the effect of the sample design in the particular case of estimating the risk factors associated with the probability of using mosquito bed nets. Our results show that, in the presence of complex sampling, appropriate methods must be used in both descriptive and inferential statistics.
Funding: Supported by the National Key Research and Development Program of China (No. 2021YFB3300503), the Regional Innovation and Development Joint Fund of the National Natural Science Foundation of China (No. U22A20167), and the National Natural Science Foundation of China (No. 61872260).
Abstract: Recently, self-supervised learning has shown great potential in Graph Neural Networks (GNNs) through contrastive learning, which aims to learn discriminative features for each node without label information. The key to graph contrastive learning is data augmentation. The anchor node treats its augmented samples as positive samples and all remaining samples as negative samples, some of which may in fact be positive. We call these mislabeled samples "false negative" samples; they seriously degrade the final learning result. Since such semantically similar samples are ubiquitous in graphs, the false negative problem is significant. To address this issue, the paper proposes a novel model, False negative sample Detection for Graph Contrastive Learning (FD4GCL), which uses attribute and structure awareness to detect false negative samples. Experimental results on seven datasets show that FD4GCL outperforms state-of-the-art baselines and even exceeds several supervised methods.
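To illustrate why false negatives matter, the Python sketch below computes a standard InfoNCE-style contrastive loss for a single anchor and optionally drops negatives that have been flagged as likely false negatives. This is not FD4GCL's detection procedure, which the abstract does not specify: the flags are supplied by hand, and the function name, similarity values, and temperature are hypothetical.

```python
import math


def info_nce_loss(sim_pos, sim_negs, suspected_false_neg, temperature=0.5):
    """InfoNCE loss for one anchor, ignoring negatives flagged as false negatives."""
    pos = math.exp(sim_pos / temperature)
    neg = sum(
        math.exp(s / temperature)
        for s, flagged in zip(sim_negs, suspected_false_neg)
        if not flagged
    )
    return -math.log(pos / (pos + neg))


# Made-up cosine similarities between the anchor and other nodes.
sim_pos = 0.9                 # anchor vs. its augmented view
sim_negs = [0.85, 0.1, -0.2]  # the first "negative" is semantically very similar

# Treat every other node as a negative (the usual default).
loss_plain = info_nce_loss(sim_pos, sim_negs, [False, False, False])

# Drop the suspected false negative from the denominator.
loss_filtered = info_nce_loss(sim_pos, sim_negs, [True, False, False])
```

With the suspected false negative removed, the loss no longer penalizes the anchor for being close to a semantically similar node.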
Funding: Supported partially by the National Natural Science Foundation of China (NSFC) (No. U21A20146); Collaborative Innovation Project of Anhui Universities (No. GXXT-2020-070); Cooperation Project of Anhui Future Technology Research Institute and Enterprise (No. 2023qyhz32); Development of a New Dynamic Life Prediction Technology for Energy Storage Batteries (No. KH10003598); Opening Project of Key Laboratory of Electric Drive and Control of Anhui Province (No. DQKJ202304); Anhui Provincial Department of Education New Era Education Quality Project (No. 2023dshwyx019); Special Fund for Collaborative Innovation between Anhui Polytechnic University and Jiujiang District (No. 2022cyxtb10); Key Research and Development Program of Wuhu City (No. 2022yf42); Open Research Fund of Anhui Key Laboratory of Detection Technology and Energy Saving Devices (No. JCKJ2021B06); Anhui Provincial Graduate Student Innovation and Entrepreneurship Practice Project (No. 2022cxcysj123); Key Scientific Research Project for Anhui Universities (No. 2022AH050981).
Abstract: Accurate and reliable fault detection is essential for the safe operation of electric vehicles. Support vector data description (SVDD) has been widely used in the field of fault detection. However, the hypersphere boundary it constructs describes only the distribution of unlabeled samples; the distribution of faulty samples cannot be described effectively, and faulty data are easily missed because of the imbalance in the sample distribution. Meanwhile, parameter selection is critical to detection performance, and empirical parameterization is generally time-consuming and laborious and may not find the optimal parameters. Therefore, this paper proposes a semi-supervised data-driven method that improves the SVDD algorithm and achieves excellent fault detection performance. By incorporating faulty samples into the underlying SVDD model, training deals better with the missed detection of faulty samples caused by the imbalance in the distribution of abnormal samples, and the hypersphere boundary is modified to classify the samples more accurately. The Bayesian Optimization NSVDD (BO-NSVDD) model was constructed to optimize hyperparameter combinations quickly and accurately. In the experiments, electric vehicle operation data with four common fault types are used to compare its performance against five other models. The results show that the BO-NSVDD model achieves superior detection performance for each type of fault data, with particularly clear advantages on imperceptible early and minor faults. Finally, the robustness of the proposed method is verified by adding noise of different intensities to the dataset.