Background: Combinations of coronary heart disease (CHD) and other chronic conditions complicate clinical management and increase healthcare costs. The aim of this study was to evaluate gender-specific relationships between CHD and other comorbidities. Methods: We analyzed data from the German Health Interview and Examination Survey (DEGS1), a national survey of 8152 adults aged 18-79 years. Female and male participants with self-reported CHD were compared for 23 chronic medical conditions. Regression models were applied to determine potential associations between CHD and these 23 conditions. Results: The prevalence of CHD was 9% (547 participants): 34% (185) of CHD participants were female and 66% (362) were male. In women, CHD was associated with hypertension (OR = 3.28 (1.81-5.9)), lipid disorders (OR = 2.40 (1.50-3.83)), diabetes mellitus (OR = 2.08 (1.24-3.50)), kidney disease (OR = 2.66 (1.101-6.99)), thyroid disease (OR = 1.81 (1.18-2.79)), gout/high uric acid levels (OR = 2.08 (1.22-3.56)) and osteoporosis (OR = 1.69 (1.01-2.84)). In men, CHD patients were more likely to have hypertension (OR = 2.80 (1.94-4.04)), diabetes mellitus (OR = 1.87 (1.29-2.71)), lipid disorders (OR = 1.82 (1.34-2.47)), and chronic kidney disease (OR = 3.28 (1.81-5.9)). Conclusion: Our analysis revealed two sets of chronic conditions associated with CHD. The first set occurred in both women and men and comprised known risk factors: hypertension, lipid disorders, kidney disease, and diabetes mellitus. The second set appeared unique to women: thyroid disease, osteoporosis, and gout/high uric acid. Identifying shared and unique gender-related associations between CHD and other conditions offers the potential to tailor screening, preventive, and therapeutic options.
The Large sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) general survey is a spectroscopic survey that will eventually cover approximately half of the celestial sphere and collect 10 million spectra of stars, galaxies and QSOs. Objects in both the pilot survey and the first-year regular survey are included in LAMOST DR1. The pilot survey started in October 2011 and ended in June 2012, and its data were released to the public as the LAMOST Pilot Data Release in August 2012. The regular survey started in September 2012 and completed its first year of operation in June 2013. LAMOST DR1 includes a total of 1202 plates containing 2 955 336 spectra, of which 1 790 879 spectra have an observed signal-to-noise ratio (SNR) ≥ 10. All data with SNR ≥ 2 are formally released as LAMOST DR1 under the LAMOST data policy. This data release contains a total of 2 204 696 spectra, of which 1 944 329 are stellar spectra, 12 082 are galaxy spectra and 5017 are quasars. DR1 includes not only spectra but also three stellar catalogs with measured parameters: late A- and FGK-type stars with high-quality spectra (1 061 918 entries), A-type stars (100 073 entries), and M-type stars (121 522 entries). This paper introduces the survey design, the observational and instrumental limitations, data reduction and analysis, and some caveats. A description of the FITS structure of the spectral files and parameter catalogs is also provided.
The objective of this paper is to improve the monitoring speed and precision of fractional vegetation cover (fc). It focuses on fc estimation when fcmax and fcmin are not approximately equal to 100% and 0%, respectively, because the remote sensing images used have medium or low spatial resolution. We present a new method of fc estimation based on a random set of fc maximum and minimum values from digital camera (DC) survey data and a dimidiate pixel model. The results show that this is a convenient, efficient and accurate method for fc monitoring, with a maximum error of -0.172 and a correlation coefficient of 0.974 between the DC survey data and the values estimated by the remote sensing model. The remaining DC survey data can be used as verification data for the precision of the fc estimation. In general, the estimation of fc based on DC survey data and a remote sensing model is a new development trend and deserves wider use.
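As a rough illustration of the dimidiate pixel model mentioned in this abstract, the sketch below assumes the usual linear mixing of a vegetation index between bare soil and full vegetation. NDVI is an assumption here, since the abstract does not name the index. Under that assumption, when the two reference pixels have known cover fc_min and fc_max (for example from digital camera plots) rather than 0% and 100%, the estimate reduces to a linear interpolation between those two points. The function and parameter names are hypothetical, and this is not necessarily the authors' exact formulation.

```python
def fractional_cover(ndvi, ndvi_min, ndvi_max, fc_min=0.0, fc_max=1.0):
    """Estimate fractional vegetation cover with a dimidiate pixel model.

    Assumes the index mixes linearly between bare soil and full vegetation,
    so cover is interpolated between two reference pixels whose cover
    (fc_min, fc_max) is known. All names here are illustrative.
    """
    if ndvi_max <= ndvi_min:
        raise ValueError("ndvi_max must be greater than ndvi_min")
    # Position of this pixel between the two reference pixels.
    scale = (ndvi - ndvi_min) / (ndvi_max - ndvi_min)
    fc = fc_min + scale * (fc_max - fc_min)
    # Clamp to the physically meaningful range [0, 1].
    return min(1.0, max(0.0, fc))


# Classic case: the reference pixels are pure soil (0%) and pure vegetation (100%).
print(fractional_cover(0.35, ndvi_min=0.2, ndvi_max=0.8))
# Generalized case: the reference pixels have measured cover of 10% and 90%.
print(fractional_cover(0.35, ndvi_min=0.2, ndvi_max=0.8, fc_min=0.1, fc_max=0.9))
```

With the default fc_min and fc_max the function falls back to the classic form, so the same code covers both high-resolution and medium- or low-resolution imagery.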
This paper describes the data release of the LAMOST pilot survey, which includes data reduction, calibration, spectral analysis, data products and data access. The accuracy of the released data and the information about the FITS headers of spectra are also introduced. The released data set includes 319 000 spectra and a catalog of these objects.
'Long tail' data is the difficult-to-get-at data that sits in libraries, institutes and on the computers of individual scientists. Informatics specialists like to contrast it with the smaller number of large, more accessible data sets (e.g. Sinha et al., 2013). The name 'long tail' derives from graphs of the size of data sets plotted against their number: there are relatively few large data sets and a lot of smaller ones.
Hyperspectral remote sensing is now a frontier of remote sensing technology. Airborne hyperspectral remote sensing data have hundreds of narrow bands, which yield complete and continuous ground-object spectra. They can therefore be used effectively to identify ground objects that are difficult to discriminate with wide-band data, and they show much promise in geological survey. At a flight height of 1500 m, the CASI hyperspectral data have 36 bands in the visible to near-infrared spectral range, with a spectral resolution of 19 nm and a spatial resolution of 0.9 m. The SASI data have 101 bands in the shortwave infrared spectral range, with a spectral resolution of 15 nm and a spatial resolution of 2.25 m. In 2010, the China Geological Survey deployed an airborne CASI/SASI hyperspectral measurement project and selected the Liuyuan and Fangshankou areas in the Beishan metallogenic belt of Gansu Province and the Nachitai area of the East Kunlun metallogenic belt in Qinghai Province for geological survey. The project ran for three years.
A novel outlier recognition method for surveying data is presented, based on Shannon information entropy. The method does not require the probability distribution of the surveying data to be known or hypothesized, and compared with statistical recognition methods it is both accurate and convenient to calculate.
We developed a GPU-based single-pulse search pipeline (GSP) with a candidate-archiving database. Largely based upon the infrastructure of the open-source PulsaR Exploration and Search Toolkit (PRESTO), GSP implements GPU acceleration of the de-dispersion and integrates a candidate-archiving database. We applied GSP to the data streams from the Commensal Radio Astronomy FAST Survey (CRAFTS), which resulted in quasi-real-time processing. The integrated candidate database facilitates synergistic usage of multiple machine-learning tools and thus improves the efficient identification of radio pulsars such as rotating radio transients (RRATs) and fast radio bursts (FRBs). We first tested GSP on pilot CRAFTS observations with the FAST Ultra-Wide Band (UWB) receiver. GSP detected all pulsars known from the Parkes multibeam pulsar survey in the corresponding sky area covered by the FAST-UWB. GSP also discovered 13 new pulsars. We measured GSP to be ~120 times faster than the original PRESTO and ~60 times faster than an MPI-parallelized version of PRESTO.
We have investigated the feasibility and accuracy of identifying RR Lyrae stars and quasars from simulated data of the Multi-channel Photometric Survey Telescope (Mephisto) W Survey. Based on the variable-source light curve libraries from the Sloan Digital Sky Survey (SDSS) Stripe 82 data and the observation history simulation from the Mephisto-W Survey Scheduler, we have simulated the uvgriz multi-band light curves of RR Lyrae stars, quasars and other variable sources for the first year of observation of the Mephisto W Survey. We have applied the ensemble machine learning algorithm Random Forest Classifier (RFC) to identify RR Lyrae stars and quasars. We build training and test samples, extract ~150 features from the simulated light curves, and train two RFCs, one for RR Lyrae star classification and one for quasar classification. We find that our RFCs are able to select RR Lyrae stars and quasars with remarkably high precision and completeness, with purity = 95.4% and completeness = 96.9% for the RR Lyrae RFC and purity = 91.4% and completeness = 90.2% for the quasar RFC. We have also derived the relative importances of the extracted features used to classify RR Lyrae stars and quasars.
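To make the two quality measures quoted in this abstract concrete, the sketch below computes purity (the fraction of selected objects that truly belong to the target class) and completeness (the fraction of true target objects that were selected) from paired label lists. These are the standard definitions, equivalent to precision and recall. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def purity_completeness(true_labels, predicted_labels, target):
    """Return (purity, completeness) for one target class.

    purity       = TP / (TP + FP), i.e. precision
    completeness = TP / (TP + FN), i.e. recall
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == target and truth == target:
            tp += 1  # correctly selected
        elif pred == target:
            fp += 1  # selected, but actually another class
        elif truth == target:
            fn += 1  # a true target object that was missed
    purity = tp / (tp + fp) if (tp + fp) else 0.0
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    return purity, completeness


# Toy labels for illustration only (not data from the survey simulation).
truth = ["rrl", "rrl", "rrl", "qso", "other", "rrl"]
pred = ["rrl", "rrl", "other", "rrl", "other", "rrl"]
print(purity_completeness(truth, pred, target="rrl"))
```

Evaluating each class separately in this way mirrors the abstract's use of one classifier per target class.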
The Large sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution spectral survey of Galactic Nebulae (MRS-N) has been conducted for more than three years since September 2018 and has observed more than 190 thousand nebular spectra and 20 thousand stellar spectra. However, there is not yet a data processing pipeline for nebular spectra. To significantly improve the accuracy of nebula classification and of the derived physical parameters, we developed the MRS-N Pipeline. This article presents in detail each data processing step of the MRS-N Pipeline: removing cosmic rays, merging single exposures, fitting skylight emission lines, wavelength recalibration, subtracting skylight, measuring nebular parameters, creating catalogs and packing spectra. Finally, a description of the data products, including nebular spectra files and parameter catalogs, is provided.
Complex survey designs often involve unequal selection probabilities of clusters or units within clusters. When estimating models for complex survey data, scaled weights are incorporated into the likelihood, producing a pseudo-likelihood. In a 3-level weighted analysis for a binary outcome, we implemented two methods for scaling the sampling weights in the National Health Survey of Pakistan (NHSP). For the NHSP, with health care utilization as a binary outcome, we found age, gender, household (HH) goods, urban/rural status, community development index, province and marital status to be significant predictors of health care utilization (p-value < 0.05). The variance of the random intercepts using scaling method 1 is estimated as 0.0961 (standard error 0.0339) at the PSU level and 0.2726 (standard error 0.0995) at the household level. Both estimates are significantly different from zero (p-value < 0.05) and indicate considerable heterogeneity in health care utilization across households and PSUs. The NHSP data analysis showed that all three analyses, weighted (two scaling methods) and unweighted, converged to almost identical results, with few exceptions. This may have occurred because of the large number of 3rd- and 2nd-level clusters and the relatively small ICC. We performed a simulation study to assess the effect of varying prevalence and intra-class correlation coefficients (ICCs) on the bias of fixed-effect parameters and variance components in a multilevel pseudo maximum likelihood (weighted) analysis. The simulation results showed that the performance of the scaled weighted estimators is satisfactory for both scaling methods. Incorporating simulation into the analysis of complex multilevel surveys allows the integrity of the results to be tested and is recommended as good practice.
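The "relatively small ICC" in this abstract can be illustrated from the two reported variance components. The sketch below uses the latent-variable definition of the ICC for a random-intercept logistic model, in which the level-1 residual variance is fixed at π²/3. It assumes a logit link and households nested within PSUs; the abstract states neither explicitly, so treat both as assumptions, and the function name is hypothetical.

```python
import math

# Level-1 residual variance on the latent logit scale (assumes a logit link).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def latent_icc(var_psu, var_household):
    """Latent-scale ICCs for a 3-level random-intercept logistic model.

    icc_psu:       correlation between two people in the same PSU
                   but in different households
    icc_household: correlation between two people in the same household
                   (and therefore also the same PSU)
    """
    total = var_psu + var_household + LOGISTIC_RESIDUAL_VARIANCE
    icc_psu = var_psu / total
    icc_household = (var_psu + var_household) / total
    return icc_psu, icc_household


# Variance components reported in the abstract for scaling method 1.
icc_psu, icc_hh = latent_icc(var_psu=0.0961, var_household=0.2726)
print(f"PSU-level ICC: {icc_psu:.3f}")
print(f"Household-level ICC: {icc_hh:.3f}")
```

Under these assumptions the PSU-level ICC is roughly 0.03 and the household-level ICC roughly 0.10, which is consistent with the abstract's remark that the ICC is relatively small.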
Data from the 2013 Canadian Tobacco, Alcohol and Drugs Survey and two other surveys are used to determine the effects of cannabis use on self-reported physical and mental health. Daily or almost daily marijuana use is shown to be detrimental to both measures of health for some age groups but not all. The age-group-specific effects depend on gender: males and females respond differently to cannabis use. The health costs of regularly using cannabis are significant, but they are much smaller than those associated with tobacco use. These costs are attributed both to the presence of delta9-tetrahydrocannabinol and to the fact that smoking cannabis is itself a health hazard because of the toxic properties of the smoke ingested. Cannabis use is costlier for regular smokers, and first use before the age of 15 or 20, or being a former user, leads to permanently reduced physical and mental capacities. These results strongly suggest that the legalization of marijuana be accompanied by educational programs, counseling services, and a delivery system that minimizes juvenile and young adult usage.
In studies of HIV, interval-censored data occur naturally. HIV infection time is not usually known exactly; it is known only that infection occurred before the survey, within some time interval, or had not occurred by the time of the survey. Infections are often clustered within geographical areas such as enumerator areas (EAs), which induces unobserved frailty. In this paper we consider an approach for estimating parameters when infection time is unknown and assumed to be correlated within an EA, where dependency is modeled through frailties, assuming a normal distribution for the frailties and a Weibull distribution for the baseline hazards. The data were from a household-based population survey that used a multi-stage stratified sample design to randomly select 23,275 interviewed individuals from 10,584 households, of whom 15,851 were further tested for HIV (crude prevalence = 9.1%). A further test conducted among those who tested HIV positive found 181 (12.5%) to be recently infected. Results show a high degree of heterogeneity in HIV distribution between EAs, translating to a modest correlation of 0.198. Intervention strategies should target geographical areas that contribute disproportionately to the HIV epidemic. Further research needs to identify such hot-spot areas and understand what factors make these areas prone to HIV.
In the formation of megacities, the urban population constantly gathers from other cities into large and medium-sized cities, and from the periphery of the city into its center, inducing traffic congestion, overload of the main center, reduced space efficiency, and other major urban problems. A multi-center urban spatial structure is an effective way to address these negative externalities. Clarifying the interactive relationship between the development of a city and changes in its employment centers helps to evaluate the efficiency of the existing industrial distribution structure and to make a reasonable estimate of future enterprise growth. Based on 2017 survey data on Shenzhen's "Four Up-scale Enterprises", a threshold method is used to identify the existing employment centers of Shenzhen and to describe the characteristics of each. Combined with a calculation of industry location entropy, the following conclusions are obtained: ① the double main centers (the Futian-Luohu center and the Nanshan center) in Shenzhen are obvious, and a weak multi-center structure has initially formed; ② industrial development is relatively balanced in the Futian-Luohu center, while the Nanshan center has an absolute advantage in the information technology industry; ③ the employment centers outside the original special economic zone are mainly manufacturing centers, showing a trend toward highly specialized industries, and the cultivation of service industries is insufficient.
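The sketch below illustrates the kind of calculation the abstract refers to, assuming that "location entropy" means the location quotient, which is the usual English name for this measure. It compares an industry's share of employment in one center with its share in the whole city; a value well above 1 suggests specialization. The function name and the employment figures are hypothetical and are not taken from the Shenzhen survey.

```python
def location_quotient(local_industry, local_total, city_industry, city_total):
    """Location quotient of one industry in one employment center.

    LQ = (industry share of local employment) / (industry share of city employment)
    """
    if local_total <= 0 or city_industry <= 0 or city_total <= 0:
        raise ValueError("totals and city-wide industry employment must be positive")
    local_share = local_industry / local_total
    city_share = city_industry / city_total
    return local_share / city_share


# Illustrative figures only: 400 of 1000 local jobs are in the industry,
# against 1000 of 10000 jobs city-wide.
lq = location_quotient(400, 1000, 1000, 10000)
print(f"Location quotient: {lq:.2f}")
```

In this toy case the industry's local share is four times its city-wide share, the sort of pattern the abstract describes as an "absolute advantage" of one center in one industry.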
Long-runout landslides involve a massive amount of energy and can be extremely hazardous owing to their long movement distance, high mobility and strong destructive power. Numerical methods have been widely used to predict landslide runout, but a fundamental remaining problem is how to determine reliable numerical parameters. This study proposes a framework to predict the runout of potential landslides through multi-source data collaboration and numerical analysis of historical landslide events. Specifically, for historical landslide cases, the landslide-induced seismic signal, geophysical surveys, and any available in-situ drone or phone videos (multi-source data collaboration) can validate the numerical results in terms of landslide dynamics and deposit features and help calibrate the numerical (rheological) parameters. The calibrated parameters can then be used to numerically predict the runout of potential landslides in regions with a geological setting similar to that of the recorded events. Application of the runout prediction approach to the 2020 Jiashanying landslide in Guizhou, China, gives reasonable results in comparison with field observations. The numerical parameters were determined from the multi-source data collaboration analysis of a historical case in the region (the 2019 Shuicheng landslide). The proposed framework for landslide runout prediction can be of great utility for landslide risk assessment and disaster reduction in mountainous regions worldwide.
Automated pavement condition survey is of critical importance to road network management. Pavement condition surveys involve three primary tasks: data collection, data processing and condition evaluation. Artificial intelligence (AI) has achieved many breakthroughs in almost every aspect of modern technology over the past decade and offers a more robust approach to automated pavement condition survey. This article aims to provide a comprehensive review of the data collection systems, data processing algorithms and condition evaluation methods proposed between 2010 and 2023 for intelligent pavement condition survey. In particular, data collection systems include AI-driven hardware devices and automated pavement data collection vehicles. The AI-driven hardware devices include right-of-way (ROW) cameras, ground penetrating radar (GPR) devices, light detection and ranging (LiDAR) devices, and advanced laser imaging systems. These hardware components can be selectively mounted on a vehicle to simultaneously collect multimedia information about the pavement. In addition, this article pays close attention to the application of AI methods in detecting pavement distresses, measuring pavement roughness, identifying pavement rutting, analyzing skid resistance and evaluating the structural strength of pavements. Based on an analysis of a variety of state-of-the-art AI methodologies, the remaining challenges and future needs of intelligent pavement condition survey are finally discussed.
Many high-quality studies have emerged from public databases such as Surveillance, Epidemiology, and End Results (SEER), the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, irregularity, and other characteristics, with the result that their value is not fully utilized. Data-mining technology has been a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making when building disease-prediction models. Data mining therefore has unique advantages in clinical big-data research, especially in large-scale public medical databases. This article introduces the main public medical databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to help clinical researchers gain a clear and intuitive understanding of the application of data-mining technology to clinical big data, in order to promote research results that are beneficial to doctors and patients.
Funding (LAMOST DR1 data release): National Basic Research Program of China (973 Program, 2014CB845700); National Natural Science Foundation of China (Grant No. 11390371). Funding for the project has been provided by the National Development and Reform Commission.
Funding (fractional vegetation cover study): Project NCET-04-0484, supported by the New-Century Outstanding Young Scientist Program of the Ministry of Education; Project D0605046040191-101, supported by the Beijing Science and Technology Program.
Funding (airborne CASI/SASI hyperspectral survey): China Geological Survey (grant no. 1212011120899); Department of Geology & Mining, China National Nuclear Corporation (grant no. 201498).
Funding (GPU-based single-pulse search pipeline): National Natural Science Foundation of China (NSFC, Grant Nos. 11988101, 11725313, 11690024, 12041303, U1731238, U2031117, U1831131 and U1831207); Science and Technology Foundation of Guizhou Province (No. LKS[2010]38); Youth Innovation Promotion Association CAS (id. 2021055); cultivation project for FAST scientific payoff and research achievement of CAMS-CAS.
Funding (Mephisto W Survey classification study): National Natural Science Foundation of China (NSFC, Nos. 11803029, 11833006 and 12173034); National Training Program of Innovation and Entrepreneurship for Undergraduates of China (No. 201910673001); Yunnan University (grant C176220100007); National Key R&D Program of China (No. 2019YFA0405500); science research grants from the China Manned Space Project (Nos. CMS-CSST-2021-A09, CMS-CSST-2021-A08 and CMS-CSST-2021-B03). Funding for SDSS-III has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the U.S. Department of Energy Office of Science. The national facility capability for SkyMapper has been funded through ARC LIEF grant LE130100104 from the Australian Research Council. Development and support of the SkyMapper node of the ASVO has been funded in part by Astronomy Australia Limited (AAL) and the Australian Government through the Commonwealth's Education Investment Fund (EIF) and National Collaborative Research Infrastructure Strategy (NCRIS), particularly the National eResearch Collaboration Tools and Resources (NeCTAR) and the Australian National Data Service Projects (ANDS).
Abstract: We have investigated the feasibility and accuracy of identifying RR Lyrae stars and quasars from simulated data of the Multi-channel Photometric Survey Telescope (Mephisto) W Survey. Based on the variable-source light curve libraries from the Sloan Digital Sky Survey (SDSS) Stripe 82 data and the observation history simulated by the Mephisto-W Survey Scheduler, we have simulated the uvgriz multi-band light curves of RR Lyrae stars, quasars and other variable sources for the first year of observation of the Mephisto W Survey. We have applied the ensemble machine-learning algorithm Random Forest Classifier (RFC) to identify RR Lyrae stars and quasars. We build training and test samples, extract ~150 features from the simulated light curves, and train two RFCs, one for RR Lyrae star classification and one for quasar classification. We find that our RFCs are able to select RR Lyrae stars and quasars with high purity and completeness: purity = 95.4% and completeness = 96.9% for the RR Lyrae RFC, and purity = 91.4% and completeness = 90.2% for the quasar RFC. We have also derived the relative importances of the extracted features used to classify RR Lyrae stars and quasars.
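The purity and completeness figures quoted above can be made concrete with a small sketch. It assumes the usual definitions, purity = TP / (TP + FP) and completeness = TP / (TP + FN) for one target class; the function name `purity_completeness` and the class labels are hypothetical, and the classifier itself is not reproduced here.

```python
def purity_completeness(y_true, y_pred, target):
    """Purity and completeness of the sample selected as `target`.

    purity       = fraction of selected objects that truly are `target`
    completeness = fraction of true `target` objects that were selected
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == target and t == target)
    fp = sum(1 for t, p in pairs if p == target and t != target)
    fn = sum(1 for t, p in pairs if p != target and t == target)
    purity = tp / (tp + fp) if (tp + fp) else 0.0
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    return purity, completeness


# Toy labels, invented purely for illustration.
truth = ["rrl", "rrl", "rrl", "qso", "other"]
predicted = ["rrl", "rrl", "other", "rrl", "other"]
print(purity_completeness(truth, predicted, "rrl"))
```

Evaluating each class separately mirrors the setup of two binary classifiers, one per target class.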
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12073051, 12090041, 12090040, 11733006, 11403061, 11903048, U1631131, 11973060, 12090044, 12073039, 11633009 and U1531118); the Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences; the Key Research Program of Frontier Sciences, CAS (Grant No. QYZDY-SSW-SLH007); the Science and Technology Development Fund, Macao SAR (file No. 0007/2019/A); and Faculty Research Grants of the Macao University of Science and Technology (No. FRG-19-004-SSI).
Abstract: The Large sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution spectral survey of Galactic Nebulae (MRS-N) has been conducted for more than three years since September 2018 and has observed more than 190 thousand nebular spectra and 20 thousand stellar spectra. However, there is not yet a data processing pipeline for nebular spectra. To significantly improve the accuracy of nebula classification and of the derived physical parameters, we developed the MRS-N Pipeline. This article presents in detail each data processing step of the MRS-N Pipeline: removing cosmic rays, merging single exposures, fitting skylight emission lines, wavelength recalibration, subtracting skylight, measuring nebular parameters, creating catalogs and packing spectra. Finally, a description of the data products, including nebular spectra files and parameter catalogs, is provided.
Abstract: Complex survey designs often involve unequal selection probabilities of clusters or units within clusters. When estimating models for complex survey data, scaled weights are incorporated into the likelihood, producing a pseudo likelihood. In a 3-level weighted analysis for a binary outcome, we implemented two methods for scaling the sampling weights in the National Health Survey of Pakistan (NHSP). For NHSP with health care utilization as a binary outcome we found age, gender, household (HH) goods, urban/rural status, community development index, province and marital status as significant predictors of health care utilization (p-value < 0.05). The variance of the random intercepts using scaling method 1 is estimated as 0.0961 (standard error 0.0339) for PSU level, and 0.2726 (standard error 0.0995) for household level respectively. Both estimates are significantly different from zero (p-value < 0.05) and indicate considerable heterogeneity in health care utilization with respect to households and PSUs. The results of the NHSP data analysis showed that all three analyses, weighted (two scaling methods) and un-weighted, converged to almost identical results with few exceptions. This may have occurred because of the large number of 3rd and 2nd level clusters and relatively small ICC. We performed a simulation study to assess the effect of varying prevalence and intra-class correlation coefficients (ICCs) on bias of fixed effect parameters and variance components of a multilevel pseudo maximum likelihood (weighted) analysis. The simulation results showed that the performance of the scaled weighted estimators is satisfactory for both scaling methods. Incorporating simulation into the analysis of complex multilevel surveys allows the integrity of the results to be tested and is recommended as good practice.
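The abstract does not state which two scalings were used, so the sketch below is an assumption rather than a reconstruction. It shows two within-cluster scalings that are widely used in the multilevel pseudo-likelihood literature: weights rescaled to sum to the cluster sample size, and weights rescaled to sum to the effective cluster sample size. It also converts the reported variance components (0.0961 at PSU level, 0.2726 at household level) into intra-class correlations under a further assumption, namely a logit link with latent residual variance π²/3. All function names are hypothetical.

```python
import math


def scale_to_cluster_size(weights):
    """Rescale so the weights in one cluster sum to the cluster sample size n."""
    factor = len(weights) / sum(weights)
    return [w * factor for w in weights]


def scale_to_effective_size(weights):
    """Rescale so the weights sum to the effective size (sum w)^2 / sum(w^2)."""
    factor = sum(weights) / sum(w * w for w in weights)
    return [w * factor for w in weights]


def latent_iccs(var_psu, var_hh):
    """ICCs for a 3-level random-intercept logit model (assumed link).

    Returns (correlation within the same PSU but different households,
             correlation within the same household).
    """
    total = var_psu + var_hh + math.pi ** 2 / 3
    return var_psu / total, (var_psu + var_hh) / total


icc_psu, icc_hh = latent_iccs(0.0961, 0.2726)
print(round(icc_psu, 3), round(icc_hh, 3))
```

Under these assumptions both correlations are small, which is consistent with the abstract's remark that a relatively small ICC may explain why weighted and un-weighted analyses gave almost identical results.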
Abstract: Data from the 2013 Canadian Tobacco, Alcohol and Drugs Survey and two other surveys are used to determine the effects of cannabis use on self-reported physical and mental health. Daily or almost daily marijuana use is shown to be detrimental to both measures of health for some age groups but not all. The age-group-specific effects depend on gender: males and females respond differently to cannabis use. The health costs of regularly using cannabis are significant, but they are much smaller than those associated with tobacco use. These costs are attributed both to the presence of delta9-tetrahydrocannabinol and to the fact that smoking cannabis is itself a health hazard because of the toxic properties of the smoke ingested. Cannabis use is costlier to regular smokers, and a first use before the age of 15 or 20, as well as being a former user, leads to permanently reduced physical and mental capacities. These results strongly suggest that the legalization of marijuana be accompanied by educational programs, counseling services, and a delivery system that minimizes juvenile and young adult usage.
Abstract: In studies of HIV, interval-censored data occur naturally. HIV infection time is not usually known exactly, only that it occurred before the survey, within some time interval, or has not occurred at the time of the survey. Infections are often clustered within geographical areas such as enumerator areas (EAs) and thus induce unobserved frailty. In this paper we consider an approach for estimating parameters when infection time is unknown and assumed correlated within an EA, where dependency is modeled through frailties, assuming a normal distribution for the frailties and a Weibull distribution for the baseline hazards. The data were from a household-based population survey that used a multi-stage stratified sample design to randomly select 23,275 interviewed individuals from 10,584 households, of whom 15,851 were further tested for HIV (crude prevalence = 9.1%). A further test conducted among those that tested HIV positive found 181 (12.5%) recently infected. Results show a high degree of heterogeneity in HIV distribution between EAs, translating to a modest correlation of 0.198. Intervention strategies should target geographical areas that contribute disproportionately to the epidemic of HIV. Further research needs to identify such hot spot areas and understand what factors make these areas prone to HIV.
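The three censoring patterns described above map onto one likelihood expression, which a short sketch can make explicit. The parameterization is an assumption, not taken from the paper: a Weibull survival function S(t | b) = exp(−λ t^ρ e^b), where b is the frailty of the enumerator area acting on the log-hazard scale. The function names are hypothetical, and the integration over the normal frailty distribution that a full marginal likelihood needs is deliberately left out.

```python
import math


def weibull_surv(t, lam, rho, frailty=0.0):
    """P(infection time > t) given the area frailty (assumed parameterization)."""
    if t <= 0:
        return 1.0
    if math.isinf(t):
        return 0.0
    return math.exp(-lam * t ** rho * math.exp(frailty))


def interval_contribution(left, right, lam, rho, frailty=0.0):
    """Conditional likelihood of an infection time known to lie in (left, right].

    left = 0            -> infected before the survey (left-censored)
    right = math.inf    -> not infected at the survey (right-censored)
    otherwise           -> infected within a known interval
    """
    return weibull_surv(left, lam, rho, frailty) - weibull_surv(right, lam, rho, frailty)


# Hypothetical values, chosen only to exercise the three cases.
lam, rho = 0.01, 1.2
print(interval_contribution(0.0, 30.0, lam, rho))        # infected before age 30
print(interval_contribution(25.0, 30.0, lam, rho))       # infected between 25 and 30
print(interval_contribution(30.0, math.inf, lam, rho))   # uninfected at age 30
```

A positive frailty raises the hazard for every individual in the same area, which is how the model captures clustering of infections within EAs.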
Abstract: In the formation of megacities, the urban population keeps gathering from other cities into large and medium-sized cities, and from the periphery of a city into its center, inducing traffic congestion, overload of the main center, reduced space efficiency, and other major urban diseases. A multi-center urban spatial structure is an effective way to address these negative externalities. Clarifying the interactive relationship between the development of a city and the change of its employment centers helps to evaluate the efficiency of the existing industrial distribution structure and to make a reasonable estimate of future enterprise growth. Based on survey data of Shenzhen "Four Up-scale Enterprises" in 2017, the threshold method is used to identify the existing employment centers of Shenzhen and to describe the characteristics of each employment center. Combined with the calculation of industry location entropy, the following conclusions are obtained: ① the double main centers (Futian-Luohu center and Nanshan center) in Shenzhen are obvious, and a weak multi-center structure has initially formed; ② the development of industries is relatively balanced in the Futian-Luohu center, and the Nanshan center has an absolute advantage in the information technology industry; ③ the employment centers outside the original special economic zone are mainly manufacturing industries, showing a highly specialized trend, and the cultivation of service industries is insufficient.
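"Location entropy" is commonly computed as a location quotient: the share of an industry in a center's employment divided by the share of that industry in citywide employment. The sketch below assumes that definition, since the abstract does not give the formula; the function name and the employment figures are invented for illustration only.

```python
def location_quotients(center_jobs, city_jobs):
    """Location quotient of each industry in one employment center.

    center_jobs : {industry: employment in the center}
    city_jobs   : {industry: employment in the whole city}
    LQ > 1 means the industry is more concentrated in the center
    than in the city as a whole.
    """
    center_total = sum(center_jobs.values())
    city_total = sum(city_jobs.values())
    result = {}
    for industry, jobs in center_jobs.items():
        center_share = jobs / center_total
        city_share = city_jobs[industry] / city_total
        result[industry] = center_share / city_share
    return result


# Invented numbers: a center specialized in information technology.
center = {"information_technology": 60, "manufacturing": 40}
city = {"information_technology": 200, "manufacturing": 800}
for industry, lq in location_quotients(center, city).items():
    print(industry, round(lq, 2))
```

A center with quotients close to 1 across industries would count as balanced, while a single large quotient indicates specialization of the kind the abstract reports for the Nanshan center in information technology.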
Funding: Supported by the National Natural Science Foundation of China (41977215).
Abstract: Long-runout landslides involve a massive amount of energy and can be extremely hazardous owing to their long movement distance, high mobility and strong destructive power. Numerical methods have been widely used to predict landslide runout, but a fundamental remaining problem is how to determine reliable numerical parameters. This study proposes a framework to predict the runout of potential landslides through multi-source data collaboration and numerical analysis of historical landslide events. Specifically, for historical landslide cases, the landslide-induced seismic signal, geophysical surveys, and any available in-situ drone/phone videos (multi-source data collaboration) can validate the numerical results in terms of landslide dynamics and deposit features and help calibrate the numerical (rheological) parameters. Subsequently, the calibrated parameters can be used to numerically predict the runout of potential landslides in a region with a geological setting similar to that of the recorded events. Application of the runout prediction approach to the 2020 Jiashanying landslide in Guizhou, China gives reasonable results in comparison with field observations. The numerical parameters are determined from the multi-source data collaboration analysis of a historical case in the region (the 2019 Shuicheng landslide). The proposed framework for landslide runout prediction can be of great utility for landslide risk assessment and disaster reduction in mountainous regions worldwide.
Funding: The National Natural Science Foundation of China (grant no. 51208419).
Abstract: Automated pavement condition survey is of critical importance to road network management. There are three primary tasks involved in pavement condition surveys, namely data collection, data processing and condition evaluation. Artificial intelligence (AI) has achieved many breakthroughs in almost every aspect of modern technology over the past decade, and undoubtedly offers a more robust approach to automated pavement condition survey. This article aims to provide a comprehensive review of data collection systems, data processing algorithms and condition evaluation methods proposed between 2010 and 2023 for intelligent pavement condition survey. In particular, the data collection system includes AI-driven hardware devices and automated pavement data collection vehicles. The AI-driven hardware devices include right-of-way (ROW) cameras, ground penetrating radar (GPR) devices, light detection and ranging (LiDAR) devices, and advanced laser imaging systems. These hardware components can be selectively mounted on a vehicle to simultaneously collect multimedia information about the pavement. In addition, this article pays close attention to the application of artificial intelligence methods in detecting pavement distresses, measuring pavement roughness, identifying pavement rutting, analyzing skid resistance and evaluating the structural strength of pavements. Based on the analysis of a variety of state-of-the-art artificial intelligence methodologies, the remaining challenges and future needs with respect to intelligent pavement condition survey are finally discussed.
Funding: The National Social Science Foundation of China (No. 16BGL183).
Abstract: Many high-quality studies have emerged from public databases, such as Surveillance, Epidemiology, and End Results (SEER), National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, irregularity, and other characteristics, with the result that their value is not fully utilized. Data-mining technology has been a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially in large-scale public medical databases. This article introduces the main public medical databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to help clinical researchers gain a clear and intuitive understanding of the application of data-mining technology to clinical big data, in order to promote the production of research results that are beneficial to doctors and patients.