The potential of the Random Forest Model for mapping different desertification processes was studied in the Muttuma watershed of the mid-Murrumbidgee River region of New South Wales, Australia. A desertification vulnerability index was developed using climate, terrain, vegetation, soil and land quality indices to identify environmentally sensitive areas for desertification. The Random Forest Model (RFM) was used to predict different desertification processes, such as soil erosion, salinization and waterlogging, in the watershed. The information needed to train the classification algorithms was obtained from satellite imagery interpretation and ground truth data. Climatic factors (evaporation, rainfall, temperature), terrain factors (aspect, slope, slope length, steepness and wetness index), soil properties (pH, organic carbon, clay and sand content) and vulnerability indices were used as explanatory variables. Classification accuracy and the kappa index were calculated for the training and testing datasets. We recorded overall accuracy rates of 87.7% and 72.1% for the training and testing sites, respectively. We found larger discrepancies between the overall accuracy rate and the kappa index for the testing dataset (72.2% and 27.5%, respectively), suggesting that not all classes are predicted well. Prediction was good for soil erosion and for areas with no desertification process, but poor for salinization and waterlogging. Overall, the results suggest that knowledge of desertification processes in training areas can be used to predict desertification processes at unvisited areas.
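Overall accuracy and Cohen's kappa can diverge sharply when one class dominates the reference data. This is consistent with the gap reported for the testing sites. Below is a minimal sketch of both metrics computed from a confusion matrix, with rows as reference classes and columns as predicted classes. The example matrix is invented for illustration and is not the study's data.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Chance agreement p_e is computed from the row (reference) and
    column (predicted) totals of the confusion matrix.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    row_tot = [sum(cm[i]) for i in range(n)]
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(row_tot[k] * col_tot[k] for k in range(n)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Invented 2-class example: a dominant class is predicted well, the minority class never.
cm = [[90, 5],
      [5, 0]]
print(overall_accuracy(cm), cohen_kappa(cm))
```

In this example 90% of samples are classified correctly, yet kappa is slightly negative. The classifier agrees with the reference no more often than the class totals alone would predict by chance.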
Objective Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied to the identification of human body fluids and have shown excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented a specific methylation profile across the 10 markers. Results Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman's correlation analysis revealed a significantly strong correlation between methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1 and 20:1). Two random forest classification models were trained on the methylation levels of the 10 markers: one to predict mixture types and one to predict the mixture proportion of 2 components. For mixture prediction, Model-1 presented outstanding prediction accuracy, reaching 99.3% in 427 training samples and 100% in 243 independent test samples. For mixture proportion prediction, Model-2 demonstrated an accuracy of 98.8% in 252 training samples and 98.2% in 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for mixture proportions. Conclusion These results indicate the excellent capability and value of the multiplex methylation system in the identification of forensic body fluid mixtures.
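The Spearman correlation between methylation level and mixture proportion works on ranks. It therefore captures any monotonic relationship, not only a linear one. Below is a minimal sketch that uses average ranks for ties. Two things are illustrative assumptions rather than the study's data: encoding a ratio such as 1:20 as the fraction 1/21 of the first component, and the methylation values themselves.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical: ratios 1:20 ... 20:1 expressed as the share of the first component.
proportions = [1/21, 1/11, 1/6, 1/2, 5/6, 10/11, 20/21]
methylation = [0.05, 0.09, 0.15, 0.42, 0.70, 0.78, 0.81]  # invented values
print(spearman_rho(proportions, methylation))
```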
COVID-19, a disease that has spread fear and anxiety, is one of the most recent emergent respiratory disorders. It is caused by a coronavirus similar to MERS-CoV and SARS-CoV, which affected large populations in different countries in 2012 and 2002, respectively. Various standard models have been used for COVID-19 epidemic prediction, but they suffered from low accuracy due to limited data availability and a high level of uncertainty. The proposed approach used a machine learning-based time-series model, Facebook NeuralProphet, to predict the numbers of deaths and confirmed cases. It was compared with a Poisson distribution and a Random Forest Model. The analysis was performed on data from 1 January 2020 to 16 July 2021, and the model was developed to obtain forecast values up to September 2021. This study aimed to predict the course of the second wave of COVID-19 in India using a recent time-series model to observe and forecast the pandemic situation across the country. In India, cases have been increasing rapidly day by day since mid-February 2021. The proposed model forecasts deaths in the COVID-19 dataset well, particularly during the second wave, and works effectively for future prediction and validation.
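When a Random Forest Model is used as a forecasting baseline, the series is usually recast as a supervised problem in which the previous values predict the next one. Multi-day forecasts are then produced by feeding each prediction back in. The abstract does not say how its Random Forest baseline was set up, so the following is only a sketch under that assumption. `predict_one` stands for any fitted one-step model and is hypothetical.

```python
def make_lagged_dataset(series, n_lags):
    """Turn a 1-D series into (X, y): each X row holds the previous n_lags values, y the next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


def recursive_forecast(predict_one, history, n_lags, horizon):
    """Roll a one-step model forward, feeding each prediction back in as the newest lag."""
    window = list(history[-n_lags:])
    out = []
    for _ in range(horizon):
        nxt = predict_one(window)
        out.append(nxt)
        window = window[1:] + [nxt]
    return out


# Usage with a stand-in "model" that simply extrapolates the last difference.
X, y = make_lagged_dataset([1, 2, 3, 4, 5], 2)
print(X, y)
print(recursive_forecast(lambda w: w[-1] + (w[-1] - w[-2]), [1, 2, 3], 2, 3))
```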
BACKGROUND Type 2 diabetes mellitus (T2DM) is associated with periodontitis. Currently, few studies have proposed predictive models for periodontitis in patients with T2DM. AIM To determine the factors influencing periodontitis in patients with T2DM by constructing logistic regression and random forest models. METHODS This retrospective study included 300 patients with T2DM who were hospitalized at the First People's Hospital of Wenling from January 2022 to June 2022; their data were collected from hospital records. We used logistic regression to analyze factors associated with periodontitis in patients with T2DM, and random forest and logistic regression prediction models were established. The predictive performance of the models was compared using the area under the receiver operating characteristic curve (AUC). RESULTS Of 300 patients with T2DM, 224 had periodontitis, an incidence of 74.67%. Logistic regression analysis showed that the following factors influenced the occurrence of periodontitis (P < 0.05):
- age [odds ratio (OR) = 1.047, 95% confidence interval (CI): 1.017-1.078];
- tooth-brushing frequency (OR = 4.303, 95% CI: 2.154-8.599);
- education level (OR = 0.528, 95% CI: 0.348-0.800);
- glycosylated hemoglobin (HbA1c) (OR = 2.545, 95% CI: 1.770-3.661);
- total cholesterol (TC) (OR = 2.872, 95% CI: 1.725-4.781);
- triglyceride (TG) (OR = 3.306, 95% CI: 1.019-10.723).

The random forest model showed that the most influential variable was HbA1c, followed by age, TC, TG, education level, brushing frequency, and sex. Comparison of the predictive performance of the two models showed that in the training dataset, the AUC of the random forest model was higher than that of the logistic regression model (AUC = 1.000 vs AUC = 0.851; P < 0.05).
In the validation dataset, there was no significant difference in AUC between the random forest and logistic regression models (AUC = 0.946 vs AUC = 0.915; P > 0.05). CONCLUSION Both random forest and logistic regression models have good predictive value and can accurately predict the risk of periodontitis in patients with T2DM.
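The AUC used to compare the two models has a direct interpretation. It is the probability that a randomly chosen patient with periodontitis receives a higher predicted risk than a randomly chosen patient without it. Below is a minimal sketch of that pairwise (Mann-Whitney) form, with invented scores.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented predicted risks; label 1 = periodontitis, 0 = none.
print(auc_mann_whitney([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # perfectly separated
print(auc_mann_whitney([0.9, 0.3, 0.4, 0.8], [1, 1, 0, 0]))  # half of the pairs ordered correctly
```

A training AUC of 1.000 is typical of random forests, which can fit their own training samples almost perfectly. The validation AUCs (0.946 vs 0.915) are therefore the fairer comparison.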
Traffic flow prediction, as the basis of signal coordination and travel time prediction, has become a research focus in the field of transportation. Researchers have proposed a variety of traffic flow prediction methods. However, most of them use only the time-domain information of traffic flow data and ignore the impact of spatial correlation on the flow of the target road segment, which leads to poor prediction accuracy. In this paper, a combined traffic flow prediction model, long short-term memory and random forest (LSTM-RF), was proposed. The long short-term memory (LSTM) model was used to extract the time-sequence features of the target road segment. The LSTM prediction and the collected information from adjacent upstream and downstream sections were then used together as input features of the random forest model. This analyzes the spatial-temporal correlation of traffic flow and yields the final prediction results. The method was tested and verified on traffic flow data from 132 urban road sections collected by the license plate recognition system in Guiyang City. The results show that the method achieves better prediction accuracy than the single models, with a clearly reduced prediction error.
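In the LSTM-RF combination, the random forest does not see the raw target series. Instead, each input row combines the LSTM's prediction for the target section with flows measured on neighbouring sections. Below is a minimal sketch of assembling those rows. It assumes that upstream and downstream flows enter at the same time step as the LSTM prediction (the abstract does not specify any lag), and the argument names are hypothetical.

```python
def build_rf_features(lstm_pred, upstream, downstream):
    """Stack the LSTM's temporal prediction with adjacent-section flows, one row per time step.

    lstm_pred:  list[float], LSTM-predicted flow of the target section at each step
    upstream:   list[list[float]], one flow series per upstream section
    downstream: list[list[float]], one flow series per downstream section
    """
    n = len(lstm_pred)
    for s in upstream + downstream:
        if len(s) != n:
            raise ValueError("all series must share the same time index")
    rows = []
    for t in range(n):
        row = [lstm_pred[t]]                  # temporal feature from the LSTM
        row += [s[t] for s in upstream]       # spatial features: upstream sections
        row += [s[t] for s in downstream]     # spatial features: downstream sections
        rows.append(row)
    return rows


print(build_rf_features([10, 12], [[5, 6]], [[7, 8], [1, 2]]))
```

The resulting rows, paired with the observed target flow, would be the training matrix for the random forest regressor.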
BACKGROUND Gestational diabetes mellitus (GDM) is a condition characterized by high blood sugar levels during pregnancy. The prevalence of GDM is rising globally, and this trend is particularly evident in China, where GDM has emerged as a significant issue affecting the well-being of expectant mothers and their fetuses. Identifying and addressing GDM in a timely manner is crucial for maintaining the health of both. Therefore, this study aims to establish a risk prediction model for GDM and explore the effects of serum ferritin, blood glucose, and body mass index (BMI) on the occurrence of GDM. AIM To develop a risk prediction model to analyze factors leading to GDM and evaluate its efficiency for early prevention. METHODS The clinical data of 406 pregnant women who underwent routine prenatal examination at Fujian Maternity and Child Health Hospital from April 2020 to December 2022 were retrospectively analyzed. The women were divided into two groups according to whether GDM occurred, and factors related to GDM were analyzed. Then, according to the weight of the relevant risk factors, the data were divided into a training set and a verification set at a ratio of 7:3. Subsequently, risk prediction models were established using logistic regression and random forest, and the models were evaluated and verified. RESULTS Pre-pregnancy BMI, previous history of GDM or macrosomia, hypertension, hemoglobin (Hb) level, triglyceride level, family history of diabetes, serum ferritin, and fasting blood glucose level in early pregnancy were found to have a significant impact on the development of GDM (P < 0.05). For the nomogram model's prediction of GDM in pregnancy, the area under the curve (AUC) was 0.883 [95% confidence interval (CI): 0.846-0.921], and the sensitivity and specificity were 74.1% and 87.6%, respectively. The top five variables in the random forest model for predicting the occurrence of GDM were serum ferritin, fasting blood glucose in early pregnancy, pre-pregnancy BMI, Hb level and triglyceride level. The random forest model achieved an AUC of 0.950 (95% CI: 0.927-0.973), with a sensitivity of 84.8% and a specificity of 91.4%. The DeLong test showed that the AUC of the random forest model was higher than that of the decision tree model (P < 0.05). CONCLUSION The random forest model is superior to the nomogram model in predicting the risk of GDM. This method is helpful for the early diagnosis of GDM and appropriate intervention.
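Sensitivity and specificity depend on the cut-off applied to a model's predicted risk. One common way to pick that cut-off is to maximise Youden's J. The abstract does not state how its cut-offs were chosen, so the following is a sketch of that one option, using invented scores.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive (label 1 = GDM)."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    tn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 0)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores, labels):
    """Return (J, threshold, sensitivity, specificity) maximising J = Se + Sp - 1."""
    best = None
    for thr in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, thr)
        j = se + sp - 1
        if best is None or j > best[0]:  # keep the first (lowest) threshold on ties
            best = (j, thr, se, sp)
    return best


print(youden_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```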
The dead fuel moisture content (DFMC) is a key driver of fire occurrence. Accurately estimating the DFMC could help identify locations facing fire risks, prioritise areas for fire monitoring, and facilitate timely deployment of fire-suppression resources. In this study, the DFMC and environmental variables were measured simultaneously in a grassland of Ergun City, Inner Mongolia Autonomous Region of China, in 2021. The environmental variables were air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure, soil temperature, and soil humidity. We chose three regression models to model the seasonal DFMC from the collected data: the random forest (RF) model, the extreme gradient boosting (XGB) model, and the boosted regression tree (BRT) model. To ensure accuracy, we added time-lag variables of 3 d to the models. The results showed that, among the three models, the RF model had the best fitting effect, with an R² value of 0.847, and the best prediction accuracy, with a mean absolute error of 4.764%. The accuracies of the models in spring and autumn were higher than those in the other two seasons. In addition, different seasons had different key influencing factors, and the degree of influence of these factors on the DFMC changed with the time lag. Moreover, time-lag variables within 44 h clearly improved the fitting effect and prediction accuracy. This indicates that environmental conditions within approximately 48 h greatly influence the DFMC. This study highlights the importance of considering 48 h time-lagged variables when predicting the DFMC of grassland fuels. It also supports mapping grassland fire risks based on the DFMC to help locate high-priority areas for grassland fire monitoring and prevention.
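The R² and mean absolute error used to rank the RF, XGB and BRT models can be computed directly from observed and predicted DFMC values. Because DFMC is itself a percentage, the MAE is expressed in percentage points of moisture content. Below is a minimal sketch with invented numbers. Some studies instead report the squared Pearson correlation as R², which differs when predictions are biased.

```python
def mae(obs, pred):
    """Mean absolute error, in the same units as the observations."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


observed_dfmc = [10.0, 20.0, 30.0]   # invented DFMC values (%)
predicted_dfmc = [12.0, 18.0, 30.0]  # invented model output (%)
print(mae(observed_dfmc, predicted_dfmc), r_squared(observed_dfmc, predicted_dfmc))
```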
Random forest models are a mainstream method for accurately describing the distribution laws and impact mechanisms of regional population. We took Shijiazhuang as the research area, used comprehensive zoning based on endowments as the modeling unit, and conducted stratified sampling on a hectare grid. We also systematically carried out incremental selection experiments on population density impact factors, optimizing the population density random forest model throughout the process (zonal modeling, stratified sampling, factor selection, weighted output). The results are as follows: (1) Zonal modeling addresses the confusion in population distribution laws caused by a single model. Sampling on grid cells not only ensures the quality of training data by avoiding the modifiable areal unit problem (MAUP) but also attempts to mitigate the adverse effects of the ecological fallacy. Stratified sampling ensures the stability of population density label values (the target variable) in the training sample. (2) Zonal selection experiments on population density impact factors help identify suitable combinations of factors, leading to a significant improvement in the goodness of fit (R²) of the zonal models. (3) Weighted combination output of the population density prediction dataset substantially enhances the model's robustness. (4) The population density dataset exhibits multi-scale superposition characteristics: on a large scale, population density is higher in plains than in mountainous areas, while on a small scale, urban areas have higher density than rural areas. The proposed optimization scheme for the population density random forest model offers a unified technical framework for uncovering local population distribution laws and impact mechanisms.
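Stratified sampling on grid cells draws the same fraction from every population density class. This prevents the sparsely represented high-density cells from being swamped by the far more numerous low-density ones. Below is a minimal sketch. The density thresholds, cell identifiers and sampling fraction are hypothetical.

```python
import random


def stratified_sample(cells, stratum_of, fraction, seed=0):
    """Draw the same fraction of grid cells from every stratum (at least one cell per stratum)."""
    rng = random.Random(seed)
    strata = {}
    for c in cells:
        strata.setdefault(stratum_of(c), []).append(c)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))  # sampling without replacement within the stratum
    return sample


# Hypothetical hectare cells: (cell_id, population density in persons per hectare).
cells = ([("a%d" % i, 50) for i in range(100)]
         + [("b%d" % i, 500) for i in range(50)]
         + [("c%d" % i, 5000) for i in range(10)])
density_class = lambda c: 0 if c[1] < 100 else (1 if c[1] < 1000 else 2)
picked = stratified_sample(cells, density_class, 0.1)
print(len(picked))
```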
The critical zone (CZ) plays a vital role in sustaining biodiversity and humanity. However, flux quantification within the CZ, particularly of subsurface hydrological partitioning, remains a significant challenge. This study focused on quantifying subsurface hydrological partitioning in an alpine mountainous area and highlighted the important role of lateral flow in this process. Precipitation entering the soil was most commonly partitioned into two parts: an increase in soil water content (SWC) and lateral flow out of the soil pit. It was found that 65%–88% of precipitation contributed to lateral flow. The second most common partitioning class showed an increase in SWC caused by both precipitation and lateral flow into the soil pit. In this case, lateral flow accounted for 43% to 74% of the SWC increase, notably more than the increase caused by precipitation. On alpine meadows, lateral flow out of the soil pit occurred when the shallow soil was wetter than field capacity. This result highlights the need for three-dimensional simulation between soil layers in Earth system models (ESMs). During evapotranspiration, significant differences were observed in the classification of subsurface hydrological partitioning among vegetation types. On alpine meadows, fine roots in the surface soil are tangled and aggregated. As a result, most subsurface responses involved lateral flow, which provided 98%–100% of evapotranspiration (ET). On grassland, there was a high probability (0.87) that ET was supplied entirely by lateral flow. The main reason previous research underestimated transpiration from soil water dynamics was the neglect of lateral root water uptake. Furthermore, on grassland there was a probability of 0.12 that ET was supplied entirely by a decrease in SWC. In this case, there was a high probability (0.98) that soil water responses occurred only at layer 2 (10–20 cm). This is because grass roots are mainly distributed in this layer, and grasses often use their deep roots for water uptake during ET. To improve the estimation of soil water dynamics and ET, we established a random forest (RF) model to simulate lateral flow and then used it to correct the Community Land Model (CLM). The RF model performed well and led to significant improvements in the CLM simulation. These findings enhance our understanding of subsurface hydrological partitioning and emphasize the importance of considering lateral flow in ESMs and hydrological research.
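The partitioning classes described above can be read as a closure of the soil-pit water balance, with the residual attributed to net lateral flow. Below is a minimal sketch. It assumes a simple bucket balance in which deep drainage and surface runoff are neglected; the abstract does not give the study's equations. All numbers are invented.

```python
def partition(precip_mm, delta_swc_mm, et_mm=0.0):
    """Attribute the residual of a soil-pit water balance to net lateral flow.

    Assumed balance (deep drainage and surface runoff neglected):
        delta_swc = precip + lateral_net - et  ->  lateral_net = delta_swc - precip + et
    lateral_net > 0 means lateral inflow into the pit, < 0 means lateral outflow.
    """
    lateral = delta_swc_mm - precip_mm + et_mm
    shares = {"lateral_net_mm": lateral}
    if precip_mm > 0 and lateral < 0:
        # rain event: part of the precipitation left the pit laterally
        shares["precip_to_lateral_out"] = -lateral / precip_mm
    if delta_swc_mm > 0 and lateral > 0:
        # SWC rose by more than precipitation alone could supply
        shares["swc_gain_from_lateral_in"] = lateral / delta_swc_mm
    if et_mm > 0 and lateral > 0:
        # ET supplied partly or wholly by lateral inflow (capped at 100%)
        shares["et_from_lateral_in"] = min(lateral, et_mm) / et_mm
    return shares


print(partition(10.0, 2.0))        # class 1: most rain leaves laterally
print(partition(4.0, 10.0))        # class 2: lateral inflow adds to the SWC gain
print(partition(0.0, 0.0, 3.0))    # dry spell: ET met entirely by lateral inflow
```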
In a recent paper, Hong et al developed an artificial intelligence (AI)-driven predictive scoring system for potential complications following laparoscopic radical gastrectomy in patients with gastric cancer. They demonstrated that integrating AI with random forest models significantly improved the accuracy of preoperative prediction and patient outcome management. By incorporating data from multiple centers, their model ensures standardization, reliability, and broad applicability, distinguishing it from prior models. The present study highlights AI's potential in clinical decision support, aiding in the preoperative and postoperative management of gastric cancer patients. Our findings may pave the way for future prospective studies to further enhance AI-supported diagnoses in clinical practice.
Climate change influences both ecosystems and ecosystem services. Its impacts on ecosystems and on ecosystem services have been documented separately. However, less is known about how climate-driven ecosystem changes will influence ecosystem services, especially in climate-sensitive regions. Here, we analyzed future climate trends between 2040 and 2100 under four Shared Socioeconomic Pathway (SSP) scenarios (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5) from the Coupled Model Intercomparison Project Phase 6 (CMIP6). Using a Random Forest model (RF) and the Revised Wind Erosion Equation (RWEQ), we quantified their impacts on ecosystem patterns and on the ecosystem service of sandstorm prevention on the Qinghai-Tibet Plateau (QTP), one of the most climate-sensitive regions in the world. Strong warming (0.04 ℃/yr) and wetting (0.65 mm/yr) trends were projected from 2015 to 2100. Under these trends, grassland and sparse vegetation will become increasingly interspersed with meadow and swamp vegetation, although their overall areas will remain similar. Meanwhile, the areas of shrub and needleleaved forest classes will increase and move toward higher altitudes. Driven indirectly by these climate-induced changes in ecosystem patterns, grassland will play an irreplaceable role in providing sandstorm prevention services. These services will increase gradually from 2040 to 2100 (1.059-1.070 billion tons) on the QTP. However, some areas show a risk of future deterioration and should be the focus of ecological rehabilitation. Our research helps to understand the cascading relationship among climate change, ecosystem patterns and ecosystem services, and provides important spatio-temporal information for future ecosystem service management.
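In RWEQ-based assessments, the sandstorm prevention service is commonly taken as the difference between two quantities: potential wind erosion (bare soil, vegetation factor C = 1) and actual wind erosion under the current vegetation cover. Below is a minimal sketch using the coefficients often cited for the RWEQ transport-capacity form. The abstract does not give its parameterisation, and the factor values in the example are invented. The factors are WF (weather), EF (erodible fraction), SCF (soil crust), K′ (surface roughness) and C (vegetation). z is the downwind distance in metres, often set to 50 m in applied studies.

```python
import math


def rweq_soil_loss(wf, ef, scf, k_prime, c, z=50.0):
    """Wind erosion from the RWEQ transport-capacity form (commonly cited coefficients).

    Q_max = 109.8 * (WF*EF*SCF*K'*C)
    S     = 150.71 * (WF*EF*SCF*K'*C) ** -0.3711
    SL    = 2*z / S**2 * Q_max * exp(-(z/S)**2)
    """
    x = wf * ef * scf * k_prime * c
    q_max = 109.8 * x
    s = 150.71 * x ** -0.3711
    return 2 * z / s ** 2 * q_max * math.exp(-(z / s) ** 2)


def sandstorm_prevention(wf, ef, scf, k_prime, c, z=50.0):
    """Service = potential loss (bare soil, C = 1) minus actual loss (vegetation factor C)."""
    potential = rweq_soil_loss(wf, ef, scf, k_prime, 1.0, z)
    actual = rweq_soil_loss(wf, ef, scf, k_prime, c, z)
    return potential - actual


# Invented factor values for one grid cell with fairly dense vegetation (C = 0.2).
print(sandstorm_prevention(1.0, 0.5, 0.5, 0.5, 0.2))
```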
The COVID-19 lockdowns led to abrupt reductions in human-related emissions worldwide and had the unintended effect of improving air quality. However, quantifying this impact is difficult because meteorological conditions may mask the real effect of emission changes on observed pollutant concentrations. Based on air quality and meteorological data from 35 sites in Beijing from 2015 to 2020, a machine learning technique was applied to decouple the impacts of meteorology and emissions on air pollutant concentrations. The results showed that the real ("deweathered") concentrations of air pollutants (except for O3) dropped significantly due to lockdown measures. Compared with the scenario without lockdowns (predicted concentrations), the observed values of PM2.5, PM10, SO2, NO2, and CO during lockdowns decreased by 39.4%, 50.1%, 51.8%, 43.1%, and 35.1%, respectively. In addition, contrary to common belief, the decline in NO2 and CO was larger at background sites (51% and 37.8%) than at traffic sites (37.1% and 35.5%). Although primary emissions were reduced during the lockdown period, episodic haze events still occurred due to unfavorable meteorological conditions. Thus, developing an optimized strategy to tackle air pollution in Beijing will be essential in the future.
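A common way to "deweather" air quality data works in two steps:
1. Train a model on time and meteorological variables.
2. Repeatedly replace each observation's meteorology with values drawn from other dates, and average the predictions.

The abstract names only "a machine learning technique", so this is a sketch of that general resampling idea rather than the study's exact procedure. `predict` stands for any fitted model (for instance a random forest), and the variable names are hypothetical. Practical implementations usually draw donors from a window around the same date and hour, which this sketch omits.

```python
import random


def deweather(predict, rows, met_keys, n_draws=200, seed=0):
    """Weather-normalised prediction for each row.

    predict:  fitted model, callable(dict) -> float
    rows:     list of dicts holding time/emission-related and meteorological features
    met_keys: meteorological variables to resample
    Each draw copies all met_keys from one randomly chosen donor row, so the
    resampled meteorology stays internally consistent; predictions are averaged.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        total = 0.0
        for _ in range(n_draws):
            donor = rng.choice(rows)
            sample = dict(row)
            for k in met_keys:
                sample[k] = donor[k]
            total += predict(sample)
        out.append(total / n_draws)
    return out


def percent_change(observed, counterfactual):
    """Relative change of the observed vs. the counterfactual (no-lockdown) concentration, in %."""
    return 100.0 * (observed - counterfactual) / counterfactual


# Toy model and data, purely illustrative.
toy_model = lambda r: 2 * r["emis"] + r["wind"]
rows = [{"emis": 1.0, "wind": 3.0}, {"emis": 2.0, "wind": 3.0}]
print(deweather(toy_model, rows, ["wind"]))
print(percent_change(60.0, 100.0))  # a 40% decrease
```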
Dew is an essential water resource for the survival and reproduction of organisms in arid and semi-arid regions. Yet estimating the dew amount and quantifying its long-term variation are challenging. In this study, we elucidate the dew amount and its long-term variation in the Kunes River Valley, Northwest China. The dew amount was based on measured daily values, and its long-term variation on values reconstructed from meteorological data for 1980 to 2021. Four key results were found:
(1) The daily mean dew amount was 0.05 mm during the observation period (4 July-12 August and 13 September-7 October 2021). On 35 d of the observation period (i.e., 73% of the observation period), the daily dew amount exceeded the threshold (>0.03 mm/d) for microorganisms.
(2) Based on the relationships between the measured dew amount and meteorological variables, air temperature, relative humidity, and wind speed had significant impacts on the daily dew amount.
(3) For estimating the daily dew amount, the random forest (RF) model outperformed the multiple linear regression (MLR) model, given its larger R² and lower MAE and RMSE.
(4) The dew amount during June-October and in each month did not vary significantly from 1980 to the beginning of the 21st century. It then decreased significantly for about a decade, after which it increased slightly from 2013 to 2021. Over the whole meteorological period of 1980-2021, the dew amount decreased significantly during June-October and in July and September, with no significant variation in June, August, and October.

Variation in the dew amount in the Kunes River Valley was mainly driven by relative humidity. This study illustrates that the RF model can be used to reconstruct long-term variation in the dew amount, which provides valuable information for better understanding the dew amount and its relationship with climate change.
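Statements such as "did not vary significantly" or "decreased significantly" require a trend test on the reconstructed series. The abstract does not name its test. The Mann-Kendall test is a common non-parametric choice for hydro-climatic series, and a minimal version without tie correction looks like this:

```python
import math


def mann_kendall(x):
    """Mann-Kendall trend test without tie correction; returns (S, Z)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Invented annual dew totals with a steady decline.
print(mann_kendall([9, 8, 7, 6, 5, 4, 3, 2, 1, 0]))
```

A |Z| greater than 1.96 indicates a significant trend at the 0.05 level (two-sided).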
The main aim of this paper was to calculate soil organic carbon stock (SOCS) with consideration of pedogenetic horizons, using expert knowledge and GIS-based methods, in northeastern China. A novel prediction process was presented, referred to as model-then-calculate with variable thicknesses of soil horizons (MCV). The following methods were carried out for comparison: model-then-calculate with fixed thickness (MCF), soil profile statistics (SPS), pedological professional knowledge-based (PKB) and vegetation type-based (Veg). Based on similar pedological information, nine common layers from topsoil to bedrock were grouped in the MCV. Validation results suggested that the MCV method performed better than the other methods considered. Among the polygon-based approaches, the Veg method was more accurate than both SPS and PKB, as limited soil data were incorporated. Additional prediction of the pedogenetic horizons within MCV benefitted the regional SOCS estimation and provided information for future soil classification and understanding of soil functions. The intermediate products, i.e., horizon thickness maps, showed considerable variation and reflected many spatial details. The linear mixed model indicated that mean annual air temperature (MAAT) was the most important predictor for the SOCS simulation. The minimal residual of the linear mixed models was achieved by the vegetation type-based model, whereas the maximal residual was found in the soil type-based model. About 95% of SOCS was found in Argosols, Cambosols and Isohumosols. The largest SOCS was found in croplands with Triticum aestivum L., Sorghum bicolor (L.) Moench, Glycine max (L.) Merr., Zea mays L. and Setaria italica (L.) P.Beauv.
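The "calculate" step of a model-then-calculate approach sums carbon over horizons whose thickness varies from place to place. Below is a minimal sketch of the widely used horizon-sum formula. The abstract does not give its own equation, and the two horizons are invented.

```python
def soc_stock(horizons):
    """Soil organic carbon stock (kg C m^-2) summed over horizons of variable thickness.

    Each horizon: (soc_g_per_kg, bulk_density_g_cm3, thickness_cm, coarse_fraction)
    SOCS = sum(SOC * BD * H * (1 - coarse)) / 100
    The factor 1/100 converts g/kg * g/cm^3 * cm into kg per m^2.
    """
    return sum(soc * bd * h * (1 - coarse) for soc, bd, h, coarse in horizons) / 100


# Invented profile: a 25 cm A horizon over a 40 cm B horizon with 10% coarse fragments.
profile = [(20.0, 1.2, 25.0, 0.0), (8.0, 1.4, 40.0, 0.1)]
print(soc_stock(profile))  # about 10.03 kg C m^-2
```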
Surface albedo is a quantitative indicator for land surface processes and climate modeling, and plays an important role in the surface radiation balance and climate change. This study used the MCD43A3 surface albedo product, derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), for the north of Xinjiang Uygur Autonomous Region (northern Xinjiang), Northwest China, from 2010 to 2020. We analyzed the spatiotemporal variation, persistence, land cover type differences, and annual and seasonal differences of surface albedo. We also analyzed the relationship between surface albedo and various influencing factors. These factors were the Normalized Difference Snow Index (NDSI), precipitation, Normalized Difference Vegetation Index (NDVI), land surface temperature, soil moisture, air temperature, and digital elevation model (DEM). The analyses were based on unary linear regression, the Hurst index, and Pearson's correlation coefficient. Combining the random forest (RF) model and the geographical detector (Geodetector), we quantitatively evaluated the importance of these influencing factors and of their interactions for surface albedo. The results showed that the seasonal average surface albedo in northern Xinjiang was highest in winter and lowest in summer. The annual average surface albedo from 2010 to 2020 was high in the west and north and low in the east and south, showing a weak decreasing trend and a small, stable overall variation. Land cover types had a significant impact on the variation of surface albedo. The annual average surface albedo in most regions of northern Xinjiang was positively correlated with NDSI and precipitation. It was negatively correlated with NDVI, land surface temperature, soil moisture, and air temperature. In addition, the correlations between surface albedo and the influencing factors differed significantly among land cover types and seasons. Specifically, NDSI had the largest influence on surface albedo, followed by precipitation, land surface temperature, and soil moisture, whereas NDVI, air temperature, and DEM showed relatively weak influences. However, the interaction of any two influencing factors enhanced their influence on surface albedo, especially the interaction of air temperature and DEM. NDVI showed a nonlinear enhancement of influence on surface albedo when interacting with land surface temperature or precipitation, with an explanatory power greater than 92.00%. This study provides guidance for correctly understanding land-atmosphere interactions in northern Xinjiang and for improving regional land-surface process simulation and climate prediction.
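The Hurst index used to describe the persistence of albedo change is typically estimated by rescaled-range (R/S) analysis. H > 0.5 suggests the past trend is likely to persist, H < 0.5 suggests it is likely to reverse, and H ≈ 0.5 indicates no persistence. Below is a minimal sketch of one common R/S variant; the abstract does not specify which variant was used. With only eleven annual values (2010-2020), the estimate is rough.

```python
import math
import statistics


def hurst_rs(series):
    """Hurst exponent by rescaled-range (R/S) analysis.

    For each tau = 3..n, take the first tau values, accumulate their deviations
    from the tau-mean, divide the range of that cumulative sum by the population
    standard deviation, then fit log(R/S) against log(tau) by least squares.
    """
    log_tau, log_rs = [], []
    for tau in range(3, len(series) + 1):
        window = series[:tau]
        mean = statistics.fmean(window)
        cum, running = [], 0.0
        for v in window:
            running += v - mean
            cum.append(running)
        r = max(cum) - min(cum)
        s = statistics.pstdev(window)
        if r > 0 and s > 0:
            log_tau.append(math.log(tau))
            log_rs.append(math.log(r / s))
    # least-squares slope of log(R/S) on log(tau) is the Hurst exponent
    mx, my = statistics.fmean(log_tau), statistics.fmean(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_tau, log_rs))
    den = sum((a - mx) ** 2 for a in log_tau)
    return num / den


# A steadily trending (invented) 11-year series is strongly persistent, so H > 0.5.
print(hurst_rs([float(i) for i in range(1, 12)]))
```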
Dear Editor, Alterations in the human microbiome are closely related to various hepatobiliary diseases. Gut microbial dysbiosis has been found in patients with cholangiocarcinoma (CCA) [1]. However, the characteristics of the oral microbiome in patients with CCA have not been studied.
Survival rates following radical surgery for gastric neuroendocrine neoplasms (g-NENs) are low, with high recurrence rates. This affects patient prognosis and complicates postoperative management. Traditional prognostic models, including the Cox proportional hazards (CoxPH) model, have shown limited predictive power for postoperative survival in gastrointestinal neuroectodermal tumor patients. Machine learning methods offer a unique opportunity to analyze complex relationships within datasets. They provide tools to assess the large volumes of high-dimensional, multimodal data generated by the biological sciences. These methods show promise in predicting outcomes across various medical disciplines. In the context of g-NENs, using machine learning to predict survival outcomes holds potential for personalized postoperative management strategies. This editorial reviews a study exploring the advantages and effectiveness of the random survival forest (RSF) model, using the lymph node ratio (LNR), in predicting disease-specific survival (DSS). Postoperative g-NEN patients in that study were stratified into low-risk and high-risk groups. The findings demonstrate that the RSF model incorporating LNR outperformed the CoxPH model in predicting DSS, an important step towards precision medicine.
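The LNR is the number of metastatic lymph nodes divided by the number examined. The editorial does not state how the RSF and CoxPH models were compared. Harrell's concordance index (C-index) is a standard metric for survival models, and a minimal version is sketched below with invented data. Higher risk scores should correspond to shorter disease-specific survival.

```python
def lymph_node_ratio(positive, examined):
    """LNR = metastatic lymph nodes / lymph nodes examined."""
    return positive / examined


def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter time and an observed
    event (events[i] == 1). It is concordant when risk[i] > risk[j]; risk ties count 1/2.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Invented follow-up (months), event indicator (1 = disease-specific death) and risk scores.
print(lymph_node_ratio(3, 15))
print(harrell_c([2, 4, 6], [1, 1, 0], [0.9, 0.5, 0.1]))
```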
Methane is the second largest anthropogenic greenhouse gas, and changes in atmospheric methane concentrations reflect the dynamic balance between its emissions and sinks. Therefore, monitoring changes in CH₄ concentration and assessing the underlying driving factors can provide a scientific basis for government policymaking and evaluation. China is the world's largest emitter of anthropogenic methane. However, owing to the lack of ground-based observation sites, little work has been done on the spatiotemporal variations over the past decades and their influencing factors in China, especially in areas with high anthropogenic emissions such as Central and Eastern China. To quantify trends in atmospheric CH₄ enhancement and their driving factors in Central and Eastern China, we combined the most up-to-date TROPOMI satellite-based column CH₄ (xCH₄) concentrations from 2018 to 2022, anthropogenic and natural emissions, and a random forest-based machine learning approach to simulate atmospheric xCH₄ enhancements from 2001 to 2018. The results showed that (1) the random forest model accurately established the relationship between emission sources and xCH₄ enhancement, with an R² of 0.89 and a root mean square error (RMSE) of 11.98 ppb; (2) the xCH₄ enhancement increased only slightly, from 48.21 ± 2.02 ppb in 2001 to 49.79 ± 1.87 ppb in 2018, a relative change of 3.27% ± 0.13%; and (3) energy activities and waste treatment were the main contributors to the increase in xCH₄ enhancement, contributing 68.00% and 31.21%, respectively, whereas the decrease in emissions from ruminant animals contributed -6.70% of the enhancement trend.
Research on the spatiotemporal evolution of urban air pollution and its driving forces is of great theoretical and practical importance, as it deepens our understanding of the mutual feedback mechanisms between the urban environment and socio-economic systems. Understanding these mechanisms will contribute to the design and implementation of efficient environmental policies that ultimately improve the quality of urbanization. This paper describes the spatiotemporal evolution of the concentrations of six urban ambient air pollutants, namely CO, NO₂, O₃, PM10, PM2.5, and SO₂, in 286 sample cities at or above the prefecture level in China from 2014 to 2019. The interactions between the pollutant concentrations are analyzed using panel regression models. A random forest model is then employed to explore the correlations between the concentrations of these six pollutants and 13 natural and socio-economic impact factors in order to isolate the most crucial ones. The results reveal three aspects. First, during the research period, the average annual concentration of O₃ increased, while those of the other pollutants decreased year by year. Second, there were significant interactions between the concentrations of the six pollutants, leading to obvious compound air pollution in urban areas. Third, the impact of natural and socio-economic factors on urban air quality varied greatly among air pollutants: air temperature, vegetation coverage, urbanization level, and traffic factors ranked high, and the response thresholds to the dominant influencing factors differed among pollutants. Given the limited ability of humans to control the natural environment and meteorological conditions, it is recommended that urban air quality be further improved by optimizing urban density, controlling anthropogenic emission sources, and implementing strict air pollution prevention and control measures.
Digital soil mapping (DSM) aims to produce detailed maps of soil properties or soil classes to improve agricultural management and soil quality assessment. Optimized sampling design can reduce the substantial costs and effort associated with sampling, profile description, and laboratory analysis. The purpose of this study was to compare common sampling designs for DSM, including grid sampling (GS), grid random sampling (GRS), stratified random sampling (StRS), and conditioned Latin hypercube sampling (cLHS). In an agricultural field (11 ha) in Quebec, Canada, a total of 118 unique locations were selected using the four sampling designs (45 locations each), and an additional 30 sample locations were selected as an independent testing dataset (evaluation dataset). Soil visible near-infrared (Vis-NIR) spectra were collected in situ to 1 m depth at the 148 locations, and soil cores were collected from a subset of 32 locations and subdivided into 10-cm depth intervals, totaling 251 samples. The Cubist model was used to elucidate the relationship between Vis-NIR spectra and soil properties (soil organic matter (SOM) and clay), and the relationship was then used to predict the soil properties at all 148 sample locations. Digital maps of soil properties at multiple depths for the entire field (148 sample locations) were prepared using a quantile random forest model to obtain complete model maps (CM-maps). Soil properties were also mapped using the 45 locations of each sampling design to obtain sampling design maps (SD-maps). The SD-maps were evaluated using the independent testing dataset (30 sample locations), and the spatial distribution and model uncertainty of each SD-map were compared with those of the corresponding CM-map.
The spatial and feature-space coverage of the four sampling designs were also compared. The results showed that GS gave the most even spatial coverage and cLHS gave the best coverage of the feature space, and that GS and cLHS produced similar prediction accuracies and spatial distributions of soil properties. SOM content was underestimated using GRS, with large errors at 0–50 cm depth, because this design failed to capture some values, whereas StRS produced larger errors for the deeper soil layers. Predictions of SOM and clay contents were more accurate for topsoil (0–30 cm) than for deep subsoil (60–100 cm). It was concluded that soil sampling designs with either good spatial coverage or good feature-space coverage can provide good accuracy in 3D DSM, but their performance may differ among soil properties.
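The two grid-based designs can be sketched as follows, assuming a rectangular field. GS places one point at each cell centre, and GRS places one uniformly random point inside each cell. The field dimensions, the 9 × 5 grid, and the function names are hypothetical. StRS and cLHS depend on covariate layers and are not shown.

```python
import random


def grid_sampling(width, height, n_cols, n_rows):
    """Grid sampling (GS): one point at the centre of each grid cell."""
    dx, dy = width / n_cols, height / n_rows
    return [((i + 0.5) * dx, (j + 0.5) * dy)
            for j in range(n_rows) for i in range(n_cols)]


def grid_random_sampling(width, height, n_cols, n_rows, seed=None):
    """Grid random sampling (GRS): one uniformly random point inside each cell."""
    rng = random.Random(seed)
    dx, dy = width / n_cols, height / n_rows
    return [((i + rng.random()) * dx, (j + rng.random()) * dy)
            for j in range(n_rows) for i in range(n_cols)]


# Hypothetical 400 m x 275 m field (11 ha) split into 9 x 5 = 45 cells,
# i.e. the same number of locations as each design in the study.
gs = grid_sampling(400, 275, 9, 5)
grs = grid_random_sampling(400, 275, 9, 5, seed=1)
print(len(gs), len(grs))  # 45 45
```

GS fixes every location, which gives the most even spatial spread. GRS keeps one point per cell but randomizes its position within the cell, so it can miss extreme values that happen to fall between points.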
Abstract: The potential of the Random Forest Model for mapping different desertification processes was studied in the Muttuma watershed of the mid-Murrumbidgee River region of New South Wales, Australia. A desertification vulnerability index was developed using climate, terrain, vegetation, soil, and land quality indices to identify environmentally sensitive areas for desertification. A Random Forest Model (RFM) was used to predict different desertification processes, such as soil erosion, salinization, and waterlogging, in the watershed, and the information needed to train the classification algorithms was obtained from satellite imagery interpretation and ground truth data. Climatic factors (evaporation, rainfall, temperature), terrain factors (aspect, slope, slope length, steepness, and wetness index), soil properties (pH, organic carbon, clay and sand content), and vulnerability indices were used as explanatory variables. Classification accuracy and the kappa index were calculated for the training and testing datasets. We recorded overall accuracy rates of 87.7% and 72.1% for the training and testing sites, respectively. We found larger discrepancies between the overall accuracy rate and the kappa index for the testing datasets (72.2% and 27.5%, respectively), suggesting that not all classes are predicted well. Prediction was good for soil erosion and for areas with no desertification process, but poor for salinization and waterlogging. Overall, the results suggest that knowledge of desertification processes in training areas can be used to predict desertification processes in unvisited areas.
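Cohen's kappa can coexist with a high overall accuracy because it discounts the agreement expected by chance, which is computed from the class totals. The small sketch below illustrates this. The 3-class confusion matrix is hypothetical and is not the study's data. Because one class dominates, overall accuracy is 0.74 while kappa is only about 0.15.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohens_kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]                          # reference totals
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical matrix: rows = reference class, columns = predicted class.
# Class 0 (e.g. "no desertification") dominates the samples.
cm = [
    [70, 5, 5],
    [8, 2, 0],
    [7, 1, 2],
]
print(round(overall_accuracy(cm), 3))  # 0.74
print(round(cohens_kappa(cm), 3))      # 0.148
```

A classifier that mostly predicts the majority class scores well on overall accuracy but poorly on kappa. This pattern is consistent with a large gap between the two measures when minority classes such as salinization and waterlogging are poorly predicted.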
Funding: Supported by grants from the Natural Science Foundation of Hubei Province (No. 2020CFB780) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ020).
Abstract: Objective Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied to the identification of human body fluids and have performed excellently in predicting single-source body fluids. The present study aimed to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in predicting body fluid mixtures was further explored. Methods In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid presented a specific methylation profile across the 10 markers. Results Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman's correlation analysis revealed a significant, strong correlation between methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1, and 20:1). Two random forest classification models were trained on the methylation levels of the 10 markers, one to predict mixture types and one to predict the proportion of the two components. For mixture prediction, Model-1 achieved an accuracy of 99.3% on 427 training samples and 100% on 243 independent test samples. For mixture proportion prediction, Model-2 achieved an accuracy of 98.8% on 252 training samples and 98.2% on 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for mixture proportions. Conclusion These results indicate the strong capability and value of the multiplex methylation system for identifying forensic body fluid mixtures.
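Spearman's correlation is the Pearson correlation computed on ranks, with tied values given their average rank. A minimal sketch follows. The component-A fractions correspond to the 1:20 to 20:1 ratios listed above, but the methylation values are hypothetical. The function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Fraction of component A for the ratios 1:20 ... 20:1 (A:B), paired with
# hypothetical methylation levels of one marker (not data from the study).
fraction_a = [1/21, 1/11, 1/6, 1/2, 5/6, 10/11, 20/21]
methylation = [0.08, 0.12, 0.20, 0.45, 0.71, 0.80, 0.86]
print(spearman_rho(fraction_a, methylation))  # 1.0
```

Because Spearman's rho only uses the ordering, it captures a monotonic but non-linear dose-response between component proportion and methylation level.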
Funding: This work was supported by the Taif University Researchers Supporting Project Number (TURSP-2020/254).
Abstract: COVID-19, a disease that has caused widespread fear and anxiety, is one of the most recently emerged respiratory disorders. It is similar to MERS-CoV and SARS-CoV, the viruses that affected large populations in different countries in 2012 and 2002, respectively. Various standard models have been used to predict the COVID-19 epidemic, but they suffered from low accuracy because of limited data availability and a high level of uncertainty. The proposed approach used the machine learning-based time-series Facebook NeuralProphet model to predict the numbers of deaths and confirmed cases and compared it with the Poisson distribution and a Random Forest Model. The analysis was performed on a dataset covering 1 January 2020 to 16 July 2021, and the model was developed to forecast values until September 2021. This study aimed to predict the course of the COVID-19 pandemic in India during the second wave, using the latest time-series model to observe and forecast the pandemic situation across the country. In India, cases had been increasing rapidly day by day since mid-February 2021. The proposed model forecast deaths in the COVID-19 dataset well, particularly during the second wave, and worked effectively for future validation.
Funding: The First People's Hospital of Wenling (approval No. KY-2023-2035-01).
Abstract: BACKGROUND Type 2 diabetes mellitus (T2DM) is associated with periodontitis. Currently, few studies have proposed predictive models for periodontitis in patients with T2DM. AIM To determine the factors influencing periodontitis in patients with T2DM by constructing logistic regression and random forest models. METHODS In this retrospective study, 300 patients with T2DM who were hospitalized at the First People's Hospital of Wenling from January 2022 to June 2022 were included, and their data were collected from hospital records. We used logistic regression to analyze factors associated with periodontitis in patients with T2DM, and random forest and logistic regression prediction models were established. The predictive performance of the models was compared using the area under the receiver operating characteristic curve (AUC). RESULTS Of the 300 patients with T2DM, 224 had periodontitis, an incidence of 74.67%. Logistic regression analysis showed that age [odds ratio (OR) = 1.047, 95% confidence interval (CI): 1.017–1.078], tooth-brushing frequency (OR = 4.303, 95% CI: 2.154–8.599), education level (OR = 0.528, 95% CI: 0.348–0.800), glycosylated hemoglobin (HbA1c) (OR = 2.545, 95% CI: 1.770–3.661), total cholesterol (TC) (OR = 2.872, 95% CI: 1.725–4.781), and triglyceride (TG) (OR = 3.306, 95% CI: 1.019–10.723) influenced the occurrence of periodontitis (P < 0.05). In the random forest model, the most influential variable was HbA1c, followed by age, TC, TG, education level, brushing frequency, and sex. A comparison of the two models showed that in the training dataset, the AUC of the random forest model was higher than that of the logistic regression model (AUC = 1.000 vs AUC = 0.851; P < 0.05).
In the validation dataset, there was no significant difference in AUC between the random forest and logistic regression models (AUC = 0.946 vs AUC = 0.915; P > 0.05). CONCLUSION Both the random forest and logistic regression models have good predictive value and can accurately predict the risk of periodontitis in patients with T2DM.
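The AUC used to compare the two models equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. This is the normalized Mann-Whitney U statistic. The sketch below computes it directly from labels and scores. The example values are hypothetical, and the significance test used to compare the two AUCs is not reproduced.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = periodontitis) and predicted risk scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.833
```

The pairwise loop is O(n_pos × n_neg), which is fine for a few hundred patients. A rank-based formula gives the same value faster on large datasets.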
Abstract: Traffic flow prediction, as the basis of signal coordination and travel time prediction, has become a research focus in transportation. Researchers have proposed a variety of traffic flow prediction methods, but most of them use only the time-domain information of traffic flow data and ignore the impact of spatial correlation on the flow of the target road segment, which leads to poor prediction accuracy. In this paper, a combined traffic flow prediction model, long short-term memory and random forest (LSTM-RF), was proposed. In the prediction process, the long short-term memory (LSTM) model was used to extract the time-series features of the target road segment. The LSTM predictions and the information collected from the adjacent upstream and downstream sections were then used together as input features of the random forest model, which analyzed the spatiotemporal correlation of traffic flow to obtain the final predictions. The method was tested and verified on traffic flow data from 132 urban road sections collected by the license plate recognition system in Guiyang City. The results show that the method is more accurate than the single models, with clearly reduced prediction error.
Abstract: BACKGROUND Gestational diabetes mellitus (GDM) is a condition characterized by high blood glucose levels during pregnancy. The prevalence of GDM is rising globally, and this trend is particularly evident in China, where GDM has become a significant issue affecting the well-being of expectant mothers and their fetuses. Timely identification and management of GDM are crucial for maintaining the health of both expectant mothers and their developing fetuses. Therefore, this study aimed to establish a risk prediction model for GDM and to explore the effects of serum ferritin, blood glucose, and body mass index (BMI) on the occurrence of GDM. AIM To develop a risk prediction model to analyze factors leading to GDM and to evaluate its efficiency for early prevention. METHODS The clinical data of 406 pregnant women who underwent routine prenatal examination at Fujian Maternity and Child Health Hospital from April 2020 to December 2022 were retrospectively analyzed. The women were divided into two groups according to whether GDM occurred, and the factors affecting GDM were analyzed. Then, according to the weights of the relevant risk factors, the data were divided into a training set and a validation set at a ratio of 7:3. Risk prediction models were subsequently established using logistic regression and random forest, and the models were evaluated and validated. RESULTS Pre-pregnancy BMI, previous history of GDM or macrosomia, hypertension, hemoglobin (Hb) level, triglyceride level, family history of diabetes, serum ferritin, and fasting blood glucose in early pregnancy were found to have a significant impact on the development of GDM (P < 0.05). For the nomogram model's prediction of GDM in pregnancy, the area under the curve (AUC) was 0.883 [95% confidence interval (CI): 0.846–0.921], with a sensitivity of 74.1% and a specificity of 87.6%. The top five variables in the random forest model for predicting the occurrence of GDM were serum ferritin, fasting blood glucose in early pregnancy, pre-pregnancy BMI, Hb level, and triglyceride level. The random forest model achieved an AUC of 0.950 (95% CI: 0.927–0.973), with a sensitivity of 84.8% and a specificity of 91.4%. The DeLong test showed that the AUC of the random forest model was higher than that of the decision tree model (P < 0.05). CONCLUSION The random forest model is superior to the nomogram model in predicting the risk of GDM. This method is helpful for the early diagnosis of GDM and appropriate intervention.
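Sensitivity and specificity, as reported for both GDM models, are simple ratios taken from the validation-set confusion counts. The sketch below shows them alongside a 7:3 train/validation split. The split is a plain random shuffle, which is an assumption. The abstract states that the study split according to risk-factor weights, which is not reproduced here. The confusion counts and function names are hypothetical.

```python
import random


def sensitivity(tp, fn):
    """True-positive rate: share of actual GDM cases the model flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: share of non-GDM cases the model correctly clears."""
    return tn / (tn + fp)


def train_validation_split(records, train_fraction=0.7, seed=0):
    """Plain random split; the study's own splitting rule may differ."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, validation = train_validation_split(range(406))
print(len(train), len(validation))  # 284 122

# Hypothetical validation-set counts, not the study's results.
print(sensitivity(tp=43, fn=7))   # 0.86
print(specificity(tn=88, fp=12))  # 0.88
```

In practice, a stratified split that keeps the GDM/non-GDM ratio equal in both sets is often preferred when one outcome is much rarer than the other.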
Funding: Funded by the National Key Research and Development Program of China, Strategic International Cooperation in Science and Technology Innovation Program (2018YFE0207800), and the National Natural Science Foundation of China (31971483).
Abstract: The dead fuel moisture content (DFMC) is a key driver of fire occurrence. Accurately estimating the DFMC could help identify locations facing fire risks, prioritise areas for fire monitoring, and facilitate the timely deployment of fire-suppression resources. In this study, the DFMC and environmental variables, including air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure, soil temperature, and soil humidity, were measured simultaneously in a grassland in Ergun City, Inner Mongolia Autonomous Region, China, in 2021. We chose three regression models, i.e., the random forest (RF), extreme gradient boosting (XGB), and boosted regression tree (BRT) models, to model the seasonal DFMC from the collected data. To ensure accuracy, we added time-lag variables of 3 d to the models. The results showed that, among the three models, the RF model had the best fit, with an R² value of 0.847, and the best prediction accuracy, with a mean absolute error of 4.764%. The accuracies of the models in spring and autumn were higher than those in the other two seasons. In addition, the key influencing factors differed among seasons, and the degree of influence of these factors on the DFMC changed with time lag. Moreover, time-lag variables within 44 h clearly improved the fitting effect and prediction accuracy, indicating that environmental conditions within approximately 48 h greatly influence the DFMC. This study highlights the importance of considering 48 h time-lagged variables when predicting the DFMC of grassland fuels, and of mapping grassland fire risk based on the DFMC to help locate high-priority areas for grassland fire monitoring and prevention.
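Time-lag variables are usually built by pairing each observation time with the values of each predictor at earlier times. The sketch below assumes hourly records, since the abstract discusses lags in hours. The variable names and readings are hypothetical. The resulting rows could be fed to any of the three regressors.

```python
def lagged_design_matrix(variables, max_lag):
    """Build rows of current and lagged values for each hourly variable.

    variables: dict mapping a variable name to an hourly series (equal lengths).
    Returns (feature_names, rows). The first `max_lag` hours are dropped
    because their lagged values would fall before the start of the record.
    """
    names = [f"{name}_lag{lag}" for name in variables for lag in range(max_lag + 1)]
    n_hours = min(len(series) for series in variables.values())
    rows = [
        [variables[name][t - lag] for name in variables for lag in range(max_lag + 1)]
        for t in range(max_lag, n_hours)
    ]
    return names, rows


# Hypothetical hourly readings; a 48 h window would use max_lag=48.
hourly = {
    "rh": [62, 65, 70, 74, 71, 68],
    "air_temp": [10, 11, 13, 15, 14, 12],
}
names, rows = lagged_design_matrix(hourly, max_lag=2)
print(names)    # ['rh_lag0', 'rh_lag1', 'rh_lag2', 'air_temp_lag0', 'air_temp_lag1', 'air_temp_lag2']
print(rows[0])  # [70, 65, 62, 13, 11, 10]
```

Each extra lag adds one column per variable, so a 48 h window over eight environmental variables yields 392 features. Tree ensembles such as RF tolerate this width reasonably well.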
Funding: National Natural Science Foundation of China, No. 42071167, No. 42201197, No. 40871073; the Second Tibetan Plateau Scientific Expedition and Research Program, No. 2019QZKK0406; Natural Science Foundation of Hebei Province, No. D2007000272.
Abstract: The random forest model is a mainstream method for accurately describing the distribution patterns and driving mechanisms of regional population. Taking Shijiazhuang as the study area and comprehensive zones based on endowments as the modeling units, we conducted stratified sampling on a hectare grid and systematically carried out incremental selection experiments on population density impact factors, optimizing the population density random forest model throughout the process (zonal modeling, stratified sampling, factor selection, weighted output). The results are as follows: (1) Zonal modeling resolves the confusion in population distribution patterns caused by using a single model. Sampling on grid cells not only ensures the quality of the training data by avoiding the modifiable areal unit problem (MAUP) but also helps mitigate the adverse effects of the ecological fallacy. Stratified sampling ensures the stability of the population density label values (target variable) in the training sample. (2) Zonal selection experiments on population density impact factors help identify suitable combinations of factors, significantly improving the goodness of fit (R²) of the zonal models. (3) Weighted combination output of the population density prediction dataset substantially enhances the model's robustness. (4) The population density dataset exhibits multi-scale superposition characteristics: at a large scale, population density is higher in plains than in mountainous areas, while at a small scale, urban areas have higher density than rural areas. The proposed optimization scheme for the population density random forest model offers a unified technical framework for uncovering local population distribution patterns and their driving mechanisms.
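Stratified sampling of grid cells can be sketched as follows. Each cell is assigned to a density stratum by threshold, and the same number of cells is drawn from every stratum, so sparse and dense cells are both represented in the training labels. The density breakpoints, the per-stratum count, and the synthetic cells are hypothetical. The study's actual strata are not given in the abstract.

```python
import random
from bisect import bisect_right


def stratified_sample(cells, breaks, per_stratum, seed=0):
    """Draw up to `per_stratum` cells from each population-density stratum.

    cells: list of (cell_id, density) pairs.
    breaks: ascending density thresholds; k thresholds define k + 1 strata,
            and a density equal to a threshold falls into the upper stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for cell_id, density in cells:
        strata.setdefault(bisect_right(breaks, density), []).append((cell_id, density))
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# 1,000 hypothetical hectare cells with densities of 0-499 persons/ha.
cells = [(i, (i * 37) % 500) for i in range(1000)]
sample = stratified_sample(cells, breaks=[10, 100, 300], per_stratum=15, seed=42)
print(len(sample))  # 60
```

Equal allocation over-represents rare high-density or near-empty cells relative to simple random sampling, which keeps the label distribution stable across training draws.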
Funding: Funded by the National Natural Science Foundation of China (42371022, 42030501, 41877148).
Abstract: The critical zone (CZ) plays a vital role in sustaining biodiversity and humanity. However, flux quantification within the CZ, particularly of subsurface hydrological partitioning, remains a significant challenge. This study quantified subsurface hydrological partitioning in an alpine mountainous area and highlighted the important role of lateral flow in this process. Precipitation entering the soil was usually partitioned into two parts: an increase in soil water content (SWC) and lateral flow out of the soil pit. It was found that 65%–88% of precipitation contributed to lateral flow. The second most common partitioning class showed an increase in SWC caused by both precipitation and lateral flow into the soil pit. In this case, lateral flow contributed 43% to 74% of the SWC increase, notably more than precipitation did. On alpine meadows, lateral flow out of the soil pit occurred when the shallow soil was wetter than field capacity. This result highlights the need for three-dimensional simulation between soil layers in Earth system models (ESMs). During evapotranspiration, the classification of subsurface hydrological partitioning differed significantly among vegetation types. Because of tangled and aggregated fine roots in the surface soil of alpine meadows, most subsurface responses involved lateral flow, which provided 98%–100% of evapotranspiration (ET). On grassland, there was a high probability (0.87) that ET was entirely provided by lateral flow. The main reason previous research underestimated transpiration from soil water dynamics was the neglect of lateral root water uptake. Furthermore, there was a probability of 0.12 that ET on grassland was entirely provided by a decrease in SWC. In this case, there was a high probability (0.98) that soil water responses occurred only in layer 2 (10–20 cm), because grass roots are mainly distributed in this layer and grasses often use their deep roots for water uptake during ET. To improve the estimation of soil water dynamics and ET, we established a random forest (RF) model to simulate lateral flow and then used it to correct the Community Land Model (CLM). The RF model performed well and led to significant improvements in the CLM simulation. These findings enhance our understanding of subsurface hydrological partitioning and emphasize the importance of considering lateral flow in ESMs and hydrological research.
Abstract: In a recent paper, Hong et al. developed an artificial intelligence (AI)-driven predictive scoring system for potential complications following laparoscopic radical gastrectomy in patients with gastric cancer. They demonstrated that integrating AI with random forest models significantly improved the accuracy of preoperative prediction and patient outcome management. By incorporating data from multiple centers, their model ensures standardization, reliability, and broad applicability, distinguishing it from prior models. The present study highlights AI's potential in clinical decision support, aiding the preoperative and postoperative management of gastric cancer patients. Our findings may pave the way for future prospective studies to further enhance AI-supported diagnosis in clinical practice.
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (Grant No. 2019QZKK0307).
Abstract: Climate change influences both ecosystems and ecosystem services. Its impacts on ecosystems and on ecosystem services have been documented separately. However, it is less well known how climate-driven ecosystem changes will influence ecosystem services, especially in climate-sensitive regions. Here, we analyzed future climate trends between 2040 and 2100 under four Shared Socioeconomic Pathway (SSP) scenarios (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5) from the Coupled Model Intercomparison Project Phase 6 (CMIP6). We quantified their impacts on ecosystem patterns and on the ecosystem service of sandstorm prevention on the Qinghai-Tibet Plateau (QTP), one of the most climate-sensitive regions in the world, using a Random Forest model (RF) and the Revised Wind Erosion Equation (RWEQ). Strong warming (0.04 °C/yr) and wetting (0.65 mm/yr) trends were projected from 2015 to 2100. Under these trends, grassland and sparse vegetation will become more interspersed with meadow and swamp vegetation, although their overall areas will remain similar, while the areas of shrub and needle-leaved forest will increase and shift toward higher altitudes. Driven indirectly by climate-induced changes in ecosystem patterns, grassland will play an irreplaceable role in providing sandstorm prevention services, and these services will increase gradually on the QTP from 2040 to 2100 (1.059–1.070 billion tons). However, some areas show a risk of future deterioration, and these should be the focus of ecological rehabilitation. Our research helps clarify the cascading relationships among climate change, ecosystem patterns, and ecosystem services, providing important spatiotemporal information for future ecosystem service management.
Funding: This work was supported by the National Natural Science Foundation of China (Grant number 42077204) and the National Key Research and Development Project (Grant number 2017YFC0210103), with data support provided by the National Earth System Science Data Center, National Science & Technology Infrastructure of China (http://www.geodata.cn).
Abstract: The COVID-19 lockdowns led to abrupt reductions in human-related emissions worldwide and unintentionally improved air quality. However, quantifying this impact is difficult because meteorological conditions may mask the real effect of emission changes on observed pollutant concentrations. Using air quality and meteorological data from 35 sites in Beijing from 2015 to 2020, a machine learning technique was applied to decouple the impacts of meteorology and emissions on air pollutant concentrations. The results showed that the real ("deweathered") concentrations of air pollutants (except for O₃) dropped significantly because of the lockdown measures. Compared with the scenario without lockdowns (predicted concentrations), the observed values of PM2.5, PM10, SO₂, NO₂, and CO during the lockdowns decreased by 39.4%, 50.1%, 51.8%, 43.1%, and 35.1%, respectively. In addition, NO₂ and CO declined more at the background sites (51% and 37.8%) than at the traffic sites (37.1% and 35.5%), contrary to common belief. Although primary emissions decreased during the lockdown period, haze episodes still occurred because of unfavorable meteorological conditions. Thus, developing an optimized strategy to tackle air pollution in Beijing is essential.
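The percentage decreases above compare observed concentrations with the model's predicted no-lockdown concentrations for the same meteorological conditions. A minimal sketch of that arithmetic follows. The function name and the concentration values are hypothetical, and the machine learning model that produces the predictions is not shown.

```python
def reduction_vs_counterfactual(predicted_no_lockdown, observed):
    """Percent by which the observed concentration falls below the
    model-predicted (no-lockdown) concentration for the same weather."""
    if predicted_no_lockdown <= 0:
        raise ValueError("predicted_no_lockdown must be positive")
    return (predicted_no_lockdown - observed) / predicted_no_lockdown * 100


# Hypothetical lockdown-period means (ug/m3), not the study's data.
predicted = {"PM2.5": 80.0, "NO2": 40.0}
observed = {"PM2.5": 50.0, "NO2": 26.0}
for pollutant in predicted:
    pct = reduction_vs_counterfactual(predicted[pollutant], observed[pollutant])
    print(f"{pollutant}: {pct:.1f}% below the no-lockdown prediction")
# PM2.5: 37.5% below the no-lockdown prediction
# NO2: 35.0% below the no-lockdown prediction
```

Using the weather-conditioned prediction as the baseline, rather than the same period of a previous year, is what keeps an unusually stagnant or windy spell from being credited to (or blamed on) emission changes.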
Funding: Supported by the National Natural Science Foundation of China (41901048), the Project of State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences (E151030101), the Project of National Cryosphere Desert Data Center of China (2021kf02), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021438).
Abstract: Dew is an essential water resource for the survival and reproduction of organisms in arid and semi-arid regions. Yet estimating the dew amount and quantifying its long-term variation are challenging. In this study, we elucidate the dew amount and its long-term variation in the Kunes River Valley, Northwest China, based on measured daily dew amounts and reconstructed values (using meteorological data from 1980 to 2021), respectively. Four key results were found: (1) the daily mean dew amount was 0.05 mm during the observation period (4 July–12 August and 13 September–7 October 2021), and on 35 d (i.e., 73% of the observation period) the daily dew amount exceeded the threshold for microorganisms (>0.03 mm/d); (2) air temperature, relative humidity, and wind speed had significant impacts on the daily dew amount, based on the relationships between the measured dew amount and meteorological variables; (3) for estimating the daily dew amount, the random forest (RF) model outperformed the multiple linear regression (MLR) model, with a larger R² and lower MAE and RMSE; and (4) the dew amount during June–October and in each month did not vary significantly from 1980 to the beginning of the 21st century, then decreased significantly for about a decade, and afterwards increased slightly from 2013 to 2021. Over the whole period of 1980–2021, the dew amount decreased significantly during June–October and in July and September, with no significant variation in June, August, or October. Variation in the dew amount in the Kunes River Valley was mainly driven by relative humidity. This study illustrates that the RF model can be used to reconstruct long-term variation in the dew amount, which provides valuable information for better understanding the dew amount and its relationship with climate change.
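The RF and MLR models are compared with three standard metrics: R², MAE, and RMSE. The sketch below computes R² as the coefficient of determination, 1 − SS_res/SS_tot. Some studies instead report the squared Pearson correlation between observed and predicted values, and which variant the study used is an assumption. The toy values are illustrative only.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2 (coefficient of determination), MAE and RMSE."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {"R2": 1 - ss_res / ss_tot, "MAE": mae, "RMSE": rmse}


# Toy values for illustration (not dew measurements from the study).
metrics = regression_metrics([1, 2, 3, 4], [2, 2, 3, 3])
print({k: round(v, 3) for k, v in metrics.items()})
# {'R2': 0.6, 'MAE': 0.5, 'RMSE': 0.707}
```

RMSE penalizes large errors more heavily than MAE, so a model can win on one metric and lose on the other. The abstract reports RF ahead on all three.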
Funding: Under the auspices of the Basic Project of State Commission of Science Technology of China (No. 2008FY110600), the National Natural Science Foundation of China (No. 91325301, 41401237, 41571212, 41371224), and the Field Frontier Program of Institute of Soil Science, Chinese Academy of Sciences (No. ISSASIP1624).
Abstract: The main aim of this paper was to calculate soil organic carbon stock (SOCS) in northeastern China with consideration of pedogenetic horizons, using expert knowledge and GIS-based methods. A novel prediction process was presented, referred to as model-then-calculate with variable thicknesses of soil horizons (MCV). The model-then-calculate with fixed thickness (MCF), soil profile statistics (SPS), pedological professional knowledge-based (PKB), and vegetation type-based (Veg) methods were applied for comparison. Based on similar pedological information, nine common layers from topsoil to bedrock were grouped in the MCV. Validation results suggested that the MCV method performed better than the other methods considered. Among the polygon-based approaches, the Veg method achieved better accuracy than both SPS and PKB, as limited soil data were incorporated. The additional prediction of pedogenetic horizons within MCV benefited regional SOCS estimation and provided information for future soil classification and the understanding of soil functions. The intermediate products, i.e., the horizon thickness maps, varied considerably and reflected many spatial details. The linear mixed model indicated that mean annual air temperature (MAAT) was the most important predictor for SOCS simulation. Among the linear mixed models, the smallest residual was obtained by the vegetation type-based model and the largest by the soil type-based model. About 95% of SOCS was found in Argosols, Cambosols, and Isohumosols. The largest SOCS was found in croplands with Triticum aestivum L., Sorghum bicolor (L.) Moench, Glycine max (L.) Merr., Zea mays L., and Setaria italica (L.) P. Beauv.
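Summing over horizons of variable thickness, as in MCV, is often done with a per-horizon stock formula. A widely used form is SOCS (kg C/m²) = Σ SOC_i × BD_i × T_i × (1 − G_i) / 100, with SOC in g/kg, bulk density BD in g/cm³, thickness T in cm, and G the volumetric coarse-fragment fraction. The abstract does not give the paper's exact formula, so this is an assumption. The profile values and class/function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Horizon:
    soc_g_per_kg: float           # soil organic carbon content (g C / kg soil)
    bulk_density_g_cm3: float     # bulk density (g / cm^3)
    thickness_cm: float           # horizon thickness (cm); varies by horizon
    gravel_fraction: float = 0.0  # volumetric coarse-fragment fraction (0-1)


def soc_stock_kg_m2(horizons):
    """Sum SOC * BD * T * (1 - gravel) / 100 over horizons, giving kg C / m^2."""
    return sum(h.soc_g_per_kg * h.bulk_density_g_cm3 * h.thickness_cm
               * (1 - h.gravel_fraction) / 100 for h in horizons)


# Hypothetical three-horizon profile with unequal thicknesses.
profile = [
    Horizon(soc_g_per_kg=25.0, bulk_density_g_cm3=1.2, thickness_cm=20),
    Horizon(soc_g_per_kg=10.0, bulk_density_g_cm3=1.4, thickness_cm=30),
    Horizon(soc_g_per_kg=4.0, bulk_density_g_cm3=1.5, thickness_cm=50,
            gravel_fraction=0.1),
]
print(round(soc_stock_kg_m2(profile), 2))  # 12.9
```

The division by 100 converts units. g/kg × g/cm³ × cm gives 10⁻³ g C/cm², and 1 g C/cm² equals 10 kg C/m². With predicted horizon thicknesses, each pixel's stock is the sum over its own horizons rather than over fixed-depth slices.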
Funding: This research was supported by the National Key Research and Development Program of China (2019YFC1510505), the Xinjiang University PhD Start-up Fund (BS210226), and the National College Student Research Training Plan of China (202210755004).
Abstract: Surface albedo is a quantitative indicator for land surface processes and climate modeling, and it plays an important role in the surface radiation balance and climate change. The study area was the north of Xinjiang Uygur Autonomous Region (northern Xinjiang), Northwest China, over 2010 to 2020. Using the MCD43A3 surface albedo product derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), we analyzed the spatiotemporal variation, persistence, land cover type differences, and annual and seasonal differences of surface albedo. We also analyzed the relationship between surface albedo and various influencing factors: the Normalized Difference Snow Index (NDSI), precipitation, the Normalized Difference Vegetation Index (NDVI), land surface temperature, soil moisture, air temperature, and a digital elevation model (DEM). These analyses used unary linear regression, the Hurst index, and Pearson's correlation analysis. By combining the random forest (RF) model with the geographical detector (Geodetector), we quantitatively evaluated the importance of these factors and of their interactions for surface albedo. The results showed that seasonal average surface albedo in northern Xinjiang was highest in winter and lowest in summer. The annual average surface albedo from 2010 to 2020 was high in the west and north and low in the east and south. It showed a weak decreasing trend and a small, stable overall variation. Land cover types had a significant impact on the variation of surface albedo. In most regions of northern Xinjiang, annual average surface albedo was positively correlated with NDSI and precipitation. It was negatively correlated with NDVI, land surface temperature, soil moisture, and air temperature. In addition, the correlations between surface albedo and the influencing factors differed significantly among land cover types and among seasons. Specifically, NDSI had the largest influence on surface albedo, followed by precipitation, land surface temperature, and soil moisture. NDVI, air temperature, and DEM showed relatively weak influences. However, the interaction of any two factors enhanced their combined influence on surface albedo, especially the interaction of air temperature and DEM. NDVI showed a nonlinear enhancement of its influence on surface albedo when interacting with land surface temperature or precipitation, with an explanatory power greater than 92.00%. This study provides guidance for correctly understanding land-atmosphere interactions in northern Xinjiang. It can also help improve regional land-surface process simulation and climate prediction.
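Geodetector's factor detector measures how much of the variance of a response is explained by a stratification of the study area: q = 1 − Σ N_h σ_h² / (N σ²). Its interaction detector compares q for the overlay of two factor layers with the q of each layer alone. The sketch below follows that standard definition. The toy data are hypothetical. Continuous factors such as NDVI, land surface temperature, or DEM would first have to be discretized into classes before use.

```python
import math
from collections import defaultdict
from statistics import pvariance


def q_statistic(y, strata):
    """Geodetector factor detector: q = 1 - sum_h(N_h * var_h) / (N * var).

    y: response values (e.g. albedo) at N cells; must not be constant.
    strata: class label of each cell under one discretized factor layer.
    """
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(y) * pvariance(y))


def interaction_type(y, layer1, layer2, tol=1e-9):
    """Interaction detector: compare q of the overlaid layers with q of each layer."""
    q1 = q_statistic(y, layer1)
    q2 = q_statistic(y, layer2)
    # Overlay: every (class1, class2) combination becomes its own stratum.
    q12 = q_statistic(y, list(zip(layer1, layer2)))
    if math.isclose(q12, q1 + q2, abs_tol=tol):
        kind = "independent"
    elif q12 > q1 + q2:
        kind = "nonlinear enhancement"
    elif q12 > max(q1, q2):
        kind = "bi-enhancement"
    elif q12 >= min(q1, q2):
        kind = "uni-weakening"
    else:
        kind = "nonlinear weakening"
    return q1, q2, q12, kind


if __name__ == "__main__":
    # Hypothetical XOR-like pattern: the response depends only on the class combination.
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    b = [0, 0, 1, 1, 0, 0, 1, 1]
    y = [0, 0, 1, 1, 1, 1, 0, 0]
    print(interaction_type(y, a, b))
```

In this toy case neither layer explains anything on its own (q = 0), but their overlay explains all of the variance (q = 1). That is the "nonlinear enhancement" pattern, in which the joint q exceeds the sum of the individual q values.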
Funding: Supported by the National Natural Science Foundation of China (U2004121, 82070643, and U1904164), the Research Project of Jinan Microecological Biomedicine Shandong Laboratory (JNL-2022015B and JNL-2022001A), and the National Key Research and Development Program of China (2018YFC2000500).
Abstract: Dear Editor, Alterations in the human microbiome are closely related to various hepatobiliary diseases. Gut microbial dysbiosis has been found in patients with cholangiocarcinoma (CCA) [1]. However, the characteristics of the oral microbiome in patients with CCA have not been studied.
Abstract: Survival rates following radical surgery for gastric neuroendocrine neoplasms (g-NENs) are low, and recurrence rates are high. This affects patient prognosis and complicates postoperative management. Traditional prognostic models, including the Cox proportional hazards (CoxPH) model, have shown limited power to predict postoperative survival in patients with g-NENs. Machine learning methods can analyze complex relationships within datasets. They provide tools for assessing the large volumes of high-dimensional, multimodal data generated in the biological sciences. These methods show promise for predicting outcomes across various medical disciplines. For g-NENs, using machine learning to predict survival outcomes could support personalized postoperative management strategies. This editorial reviews a study of the advantages and effectiveness of the random survival forest (RSF) model, incorporating the lymph node ratio (LNR), in predicting disease-specific survival (DSS). The study population was postoperative g-NEN patients stratified into low-risk and high-risk groups. The findings show that the RSF model incorporating LNR outperformed the CoxPH model in predicting DSS, an important step towards precision medicine.
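The abstract does not state which metric was used to compare the RSF and CoxPH models. The sketch below shows two building blocks often used in such comparisons, assuming that kind of evaluation:

- The lymph node ratio: positive nodes divided by examined nodes.
- Harrell's concordance index: the fraction of comparable patient pairs in which the model assigns higher risk to the patient who had the event earlier.

Function names and data are hypothetical. Fitting an RSF itself is not shown.

```python
def lymph_node_ratio(positive_nodes: int, examined_nodes: int) -> float:
    """LNR = number of metastatic (positive) lymph nodes / number of examined nodes."""
    if examined_nodes <= 0:
        raise ValueError("at least one lymph node must be examined")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive nodes must be between 0 and the number examined")
    return positive_nodes / examined_nodes


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event and was observed
    for a shorter time than patient j. It is concordant when the model gives i
    the higher risk score; tied scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical follow-up (months), event flags (True = disease-specific death),
    # and model risk scores for four patients.
    print(harrell_c_index([5, 12, 20, 30], [True, True, False, True], [0.9, 0.6, 0.7, 0.2]))  # 0.8
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. A model with better discrimination, as the editorial reports for RSF with LNR, would be expected to score higher on the same patients.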
Funding: Supported by the National Natural Science Foundation of China (No. 42105117), the Natural Science Foundation of Jiangsu Province (No. BK20200802), and the National Key R&D Program of China (Nos. 2020YFA0607501 and 2019YFA0607202).
Abstract: Methane is the second largest anthropogenic greenhouse gas, and changes in atmospheric methane concentrations reflect the dynamic balance between its emissions and sinks. Monitoring changes in CH4 concentration and assessing the underlying driving factors can therefore provide a scientific basis for government policy making and evaluation. China is the world's largest emitter of anthropogenic methane. However, owing to the lack of ground-based observation sites, little work has been done in China on spatiotemporal variations over the past decades and their influencing factors. This gap is especially marked in areas with high anthropogenic emissions, such as Central and Eastern China. To quantify trends in atmospheric CH4 enhancements and their driving factors in Central and Eastern China, we combined three sources: the most up-to-date TROPOMI satellite-based column-averaged CH4 (xCH4) concentrations from 2018 to 2022, anthropogenic and natural emissions, and a random forest-based machine learning approach. With these, we simulated atmospheric xCH4 enhancements from 2001 to 2018. The results showed the following. (1) The random forest model accurately captured the relationship between emission sources and xCH4 enhancement, with a coefficient of determination (R²) of 0.89 and a root-mean-square error (RMSE) of 11.98 ppb. (2) The xCH4 enhancement increased only slightly, from 48.21±2.02 ppb in 2001 to 49.79±1.87 ppb in 2018, a relative change of 3.27%±0.13%. (3) Energy activities and waste treatment were the main contributors to the increase in xCH4 enhancement, contributing 68.00% and 31.21%, respectively. The decrease in emissions from ruminant animals contributed −6.70% of the enhancement trend.
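The goodness-of-fit figures quoted above (R² = 0.89, RMSE = 11.98 ppb) are standard regression metrics. Below is a minimal sketch of both using their usual definitions; the data are hypothetical. The abstract does not say whether R² was computed as 1 − SS_res/SS_tot or as the squared Pearson correlation between observed and simulated xCH4. The two coincide only for an ordinary least-squares fit with an intercept, not in general for random forest predictions.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the data (ppb for xCH4)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


if __name__ == "__main__":
    # Hypothetical observed vs. simulated xCH4 enhancements (ppb).
    obs = [40.0, 45.0, 50.0, 55.0]
    sim = [42.0, 44.0, 51.0, 53.0]
    print(r_squared(obs, sim), rmse(obs, sim))
```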
Funding: National Natural Science Foundation of China, Nos. 41771133 and 41822104; the Strategic Priority Research Program of the Chinese Academy of Sciences, No. XDA19040403.
Abstract: Research on the spatio-temporal evolution of urban air pollution and its driving forces is of great theoretical and practical importance. It deepens understanding of the mutual feedback mechanisms between the urban environment and socio-economic systems. Understanding these mechanisms will contribute to the design and implementation of efficient environmental policies that ultimately improve the quality of urbanization. This paper describes the spatio-temporal evolution of the concentrations of six urban ambient air pollutants, namely CO, NO2, O3, PM10, PM2.5, and SO2, in 286 cities at or above the prefecture level in China from 2014 to 2019. The interactions between pollutant concentrations are analyzed with panel regression models. A random forest model is then employed to explore the correlations between the concentrations of these six pollutants and 13 natural and socio-economic factors, in order to identify the most important ones. The results reveal three findings. First, during the study period the average annual concentration of O3 increased, while those of the other pollutants decreased year by year. Second, there were significant interactions among the concentrations of the six pollutants, resulting in evident compound air pollution in urban areas. Third, the influence of natural and socio-economic factors on urban air quality varied greatly among pollutants. Air temperature, vegetation coverage, urbanization level, and traffic factors ranked high, and the response thresholds to the dominant influencing factors differed among pollutants. Humans have limited ability to control the natural environment and meteorological conditions. We therefore recommend further improving urban air quality by optimizing urban density, controlling anthropogenic emission sources, and implementing strict air pollution prevention and control measures.
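The abstract does not specify how the random forest ranked the 13 factors. One common approach is permutation importance: shuffle one predictor at a time and measure how much the prediction error grows. The sketch below assumes a generic fitted model exposed as a `predict` function taking one row of factor values. The toy model and data are hypothetical stand-ins for a trained random forest.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is shuffled.

    predict: callable mapping one row (a list of factor values) to a prediction.
    X: list of rows; y: observed pollutant concentrations.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between factor j and the response
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted forest: only the first factor matters."""
    return 2.0 * row[0]


if __name__ == "__main__":
    X = [[float(t), float(t % 3)] for t in range(8)]
    y = [2.0 * row[0] for row in X]
    print(permutation_importance(toy_model, X, y))
```

Shuffling an unused factor leaves the error unchanged, so its importance is zero. Shuffling the factor the model relies on raises the error, and factors are ranked by that increase.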
Funding: The Natural Sciences and Engineering Research Council of Canada (No. RGPIN-2014-04100) funded this project.
文摘Digital soil mapping (DSM) aims to produce detailed maps of soil properties or soil classes to improve agricultural management and soil quality assessment. Optimized sampling design can reduce the substantial costs and efforts associated with sampling, profile description, and laboratory analysis. The purpose of this study was to compare common sampling designs for DSM, including grid sampling (GS), grid random sampling (GRS), stratified random sampling (StRS), and conditioned Latin hypercube sampling (cLHS). In an agricultural field (11 ha) in Quebec, Canada, a total of unique 118 locations were selected using each of the four sampling designs (45 locations each), and additional 30 sample locations were selected as an independent testing dataset (evaluation dataset). Soil visible near-infrared (Vis-NIR) spectra were collected in situ at the 148 locations (1 m depth), and soil cores were collected from a subset of 32 locations and subdivided at 10-cm depth intervals, totaling 251 samples. The Cubist model was used to elucidate the relationship between Vis-NIR spectra and soil properties (soil organic matter (SOM) and clay), which was then used to predict the soil properties at all 148 sample locations. Digital maps of soil properties at multiple depths for the entire field (148 sample locations) were prepared using a quantile random forest model to obtain complete model maps (CM-maps). Soil properties were also mapped using the samples from each of the 45 locations for each sampling design to obtain sampling design maps (SD-maps). The SD-maps were evaluated using the independent testing dataset (30 sample locations), and the spatial distribution and model uncertainty of each SD-map were compared with those of the corresponding CM-map. The spatial and feature space coverage were compared across the four sampling designs. 
The results showed that GS provided the most even spatial coverage, and cLHS provided the best coverage of the feature space. GS and cLHS yielded similar prediction accuracies and spatial distributions of soil properties. GRS underestimated SOM content, with large errors at 0–50 cm depth, because this design failed to capture some SOM values. StRS produced larger errors for the deeper soil layers. Predictions of SOM and clay contents were more accurate for topsoil (0–30 cm) than for deep subsoil (60–100 cm). We conclude that soil sampling designs with good coverage of either geographic space or feature space can provide good accuracy in 3D DSM, although their performance may differ among soil properties.
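The abstract does not state how spatial coverage was measured. One common criterion is the mean squared shortest distance (MSSD) between the nodes of a fine prediction grid and their nearest sampling location. Lower values indicate more even coverage. The sketch below assumes that criterion; the grid and sample sets are hypothetical.

```python
def mssd(samples, nodes):
    """Mean squared shortest distance from prediction nodes to their nearest sample."""
    total = 0.0
    for node_x, node_y in nodes:
        # Squared Euclidean distance to the closest sampling location.
        total += min((node_x - sx) ** 2 + (node_y - sy) ** 2 for sx, sy in samples)
    return total / len(nodes)


if __name__ == "__main__":
    # Hypothetical 10 x 10 prediction grid and two 4-point designs.
    nodes = [(x, y) for x in range(10) for y in range(10)]
    spread = [(2, 2), (2, 7), (7, 2), (7, 7)]
    clustered = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(mssd(spread, nodes) < mssd(clustered, nodes))  # True
```

Evenly spread designs such as GS tend to minimize this criterion. Feature-space coverage, where cLHS performed best, would need a separate measure computed on the covariates rather than on coordinates.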