This paper addresses the problem of predicting population density leveraging cellular station data. As wireless communication devices are commonly used, cellular station data has become integral for estimating population figures and studying their movement, thereby implying significant contributions to urban planning. However, existing research grapples with issues pertinent to preprocessing base station data and the modeling of population prediction. To address this, we propose methodologies for preprocessing cellular station data to eliminate any irregular or redundant data. The preprocessing reveals a distinct cyclical characteristic and high-frequency variation in population shift. Further, we devise a multi-view enhancement model grounded on the Transformer (MVformer), targeting the improvement of the accuracy of extended time-series population predictions. Comparative experiments, conducted on the above-mentioned population dataset using four alternate Transformer-based models, indicate that our proposed MVformer model enhances prediction accuracy by approximately 30% for both univariate and multivariate time-series prediction assignments. The performance of this model in tasks pertaining to population prediction exhibits commendable results.
BACKGROUND Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease, affecting over 30% of the United States population. Early patient identification using a simple method is highly desirable. AIM To create machine learning models for predicting NAFLD in the general United States population. METHODS Data from NHANES 1988-1994 were used. Thirty NAFLD-related factors were included. The dataset was divided into training (70%) and testing (30%) datasets. Twenty-four machine learning algorithms were applied to the training dataset. The best-performing models and another interpretable model (i.e., coarse trees) were tested using the testing dataset. RESULTS There were 3235 participants (n = 3235) who met the inclusion criteria. In the training phase, the ensemble of random undersampling (RUS) boosted trees had the highest F1 (0.53). In the testing phase, we compared selected machine learning models and NAFLD indices. Based on F1, the ensemble of RUS boosted trees remained the top performer (accuracy 71.1% and F1 0.56), followed by the fatty liver index (accuracy 68.8% and F1 0.52). A simple model (coarse trees) had an accuracy of 74.9% and an F1 of 0.33. CONCLUSION Not every machine learning model is complex. Using a simpler model such as coarse trees, we can create an interpretable model for predicting NAFLD with only two predictors: fasting C-peptide and waist circumference. Although the simpler model does not have the best performance, its simplicity is useful in clinical practice.
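The abstract above reports a model with higher accuracy but lower F1 than its competitor. A minimal sketch of how that can happen under class imbalance follows. The confusion-matrix counts are hypothetical, chosen only for illustration, and are not taken from the study; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical cohort: 1000 cases, 300 of them positive (not the study's data).
# A conservative model that rarely predicts the positive class.
conservative = classification_metrics(tp=60, fp=20, fn=240, tn=680)
# A model that finds most positives at the cost of more false alarms.
balanced = classification_metrics(tp=210, fp=200, fn=90, tn=500)

for name, m in (("conservative", conservative), ("balanced", balanced)):
    print(f"{name}: accuracy={m['accuracy']:.3f} f1={m['f1']:.3f}")
```

With these illustrative counts the conservative model is more accurate overall yet has a much lower F1, because F1 ignores true negatives and penalizes missed positives.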
Due to the development of digital transformation, intelligent algorithms are getting more and more attention. The whale optimization algorithm (WOA) is one of the swarm intelligence optimization algorithms and is widely used to solve practical engineering optimization problems. However, with the increased dimensions, higher requirements are put forward for algorithm performance. The double population whale optimization algorithm with distributed collaboration and reverse learning ability (DCRWOA) is proposed to solve the slow convergence speed and unstable search accuracy of the WOA algorithm in optimization problems. In the DCRWOA algorithm, a novel double population search strategy is constructed. Meanwhile, the reverse learning strategy is adopted in the population search process to help individuals quickly jump out of the non-ideal search area. Numerical experiments are carried out using standard test functions with different dimensions (10, 50, 100, 200). The optimization case of shield construction parameters is also used to test the practical application performance of the proposed algorithm. The results show that the DCRWOA algorithm has higher optimization accuracy and stability, and the convergence speed is significantly improved. Therefore, the proposed DCRWOA algorithm provides a better method for solving practical optimization problems.
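The abstract does not define its reverse learning strategy. Assuming it resembles standard opposition-based learning, where a candidate x in [lower, upper] is compared against its opposite point lower + upper - x, one step could be sketched as below. This is not the DCRWOA implementation; the function names, bounds, population size, and the sphere test function are all illustrative assumptions.

```python
import random


def opposite(x, lower, upper):
    """Opposite point of x within per-dimension bounds [lower, upper]."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def sphere(x):
    """Simple test objective (minimization); stands in for a benchmark function."""
    return sum(v * v for v in x)


def reverse_learning_step(population, lower, upper, objective):
    """For each individual, keep the better of itself and its opposite point."""
    improved = []
    for x in population:
        x_opp = opposite(x, lower, upper)
        improved.append(x_opp if objective(x_opp) < objective(x) else x)
    return improved


random.seed(0)
dim = 10
# Asymmetric bounds (assumed) so that an opposite point differs in objective value.
lower = [-10.0] * dim
upper = [30.0] * dim
population = [
    [random.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(20)
]
new_population = reverse_learning_step(population, lower, upper, sphere)
```

Because each individual is replaced only when its opposite is strictly better, the step never worsens any member of the population.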
Objective Asymptomatic carotid stenosis (ACS) is closely associated with the incidence of severe cerebrovascular diseases. Early identification of individuals with ACS and its associated risk factors could be beneficial for primary prevention of stroke. This study aimed to investigate a machine-learning algorithm for the detection of ACS among a population at high risk of stroke based on the associated risk factors. Methods A novel machine learning model was utilized to screen the associated predictors of ACS based on 30 potential risk factors. The algorithm of this model adopted a random forest pattern based on the training data and then was verified using the testing data. All of the original data were retrieved from the China National Stroke Screening and Prevention Project (CNSSPP), including demographic, clinical and laboratory characteristics. Individuals at high risk of stroke were enrolled and randomly divided into a training group and a testing group at a ratio of 4:1. The identification of carotid stenosis by carotid artery duplex scans was set as the gold standard. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate the efficacy of the model in detecting ACS. Results Of 2841 individuals at high risk of stroke enrolled, 326 (11.6%) were diagnosed with ACS by ultrasonography. The top five risk factors contributing to ACS in this model were identified as family history of dyslipidemia, high level of low-density lipoprotein cholesterol (LDL-c), low level of high-density lipoprotein cholesterol (HDL-c), aging, and low body.
Background: Nonalcoholic fatty liver disease (NAFLD) is a public health challenge and significant cause of morbidity and mortality worldwide. Early identification is crucial for disease intervention. We recently proposed a nomogram-based NAFLD prediction model from a large population cohort. We aimed to explore machine learning tools in predicting NAFLD. Methods: A retrospective cross-sectional study was performed on 15315 Chinese subjects (10373 in the training set and 4942 in the testing set). Selected clinical and biochemical factors were evaluated by different types of machine learning algorithms to develop and validate seven predictive models. Nine evaluation indicators including area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, positive predictive value, sensitivity, F1 score, Matthews correlation coefficient (MCC), specificity and negative prognostic value were applied to compare the performance among the models. The selected clinical and biochemical factors were ranked according to their importance in prediction ability. Results: In total, 4018/10373 (38.74%) and 1860/4942 (37.64%) subjects had ultrasound-proven NAFLD in the training and testing sets, respectively. Seven machine learning based models were developed and demonstrated good performance in predicting NAFLD. Among these models, the XGBoost model revealed the highest AUROC (0.873), AUPRC (0.810), accuracy (0.795), positive predictive value (0.806), F1 score (0.695), MCC (0.557), and specificity (0.909), demonstrating the best prediction ability among the built models. Body mass index was the most valuable indicator to predict NAFLD according to the feature ranking scores. Conclusions: The XGBoost model has the best overall prediction ability for diagnosing NAFLD. The novel machine learning tools provide considerable beneficial potential in NAFLD screening.
The only known predictable aggregation of dwarf minke whales (Balaenoptera acutorostrata subsp.) occurs in the Australian offshore waters of the northern Great Barrier Reef in May-August each year. The identification of individual whales is required for research on the whales' population characteristics and for monitoring the potential impacts of tourism activities, including commercial swims with the whales. At present, it is not cost-effective for researchers to manually process and analyze the tens of thousands of underwater images collated after each observation/tourist season, and a large database of historical non-identified imagery exists. This study reports the first proof of concept for recognizing individual dwarf minke whales using Deep Learning Convolutional Neural Networks (CNN). The "off-the-shelf" ImageNet-trained VGG16 CNN was used as the feature encoder of the per-pixel semantic segmentation Automatic Minke Whale Recognizer (AMWR). The most frequently photographed whale (MW1020) in a sample of 76 individual whales was identified in 179 images out of the total 1320 images provided. Training and image augmentation procedures were developed to compensate for the small number of available images. The trained AMWR achieved 93% prediction accuracy on the testing subset of 36 positive/MW1020 and 228 negative/not-MW1020 images, where each negative image contained at least one of the other 75 whales. Furthermore, on the test subset, AMWR achieved 74% precision, 80% recall, and a 4% false-positive rate, making the presented approach comparable to or better than other state-of-the-art individual animal recognition results.
In this study, the future landslide population amount risk (LPAR) is assessed based on integrated machine learning models (MLMs) and scenario simulation techniques in Shuicheng County, China. Firstly, multiple MLMs were selected and their hyperparameters were optimized, and the 11 generated models were cross-integrated to select the best model for calculating landslide susceptibility. Precipitation was then calculated for different extreme precipitation recurrence periods and combined with the susceptibility results to assess the landslide hazard. Using the town as the basic unit, the exposure and vulnerability of the future landslide population under different Shared Socioeconomic Pathways (SSPs) scenarios in each town were assessed, and then combined with the hazard to estimate the LPAR in 2050. The results showed that the integrated model with the optimized random forest model as the combination strategy had the best comprehensive performance in susceptibility assessment. The distribution of hazard classes is similar to that of susceptibility, and with an increase in precipitation, the low-hazard and high-hazard areas decrease and shift to the medium-hazard and very high-hazard classes. The high-risk areas for future landslide populations in Shuicheng County are mainly concentrated in the three southwestern towns with high vulnerability, whereas the northern towns of Baohua and Qinglin are in the lowest risk class. The LPAR increased with the intensity of extreme precipitation. The LPAR differs significantly among the SSP scenarios, with the lowest in the "fossil-fueled development (SSP5)" scenario and the highest in the "regional rivalry (SSP3)" scenario. In summary, the landslide susceptibility model based on integrated machine learning proposed in this study has a high predictive capability. The results of the future LPAR assessment can provide theoretical guidance for relevant departments to cope with future socioeconomic development challenges and make corresponding disaster prevention and mitigation plans to prevent landslide risks from a developmental perspective.
Funding (cellular station population prediction study): Guangdong Basic and Applied Basic Research Foundation under Grant No. 2024A1515012485, and in part the Shenzhen Fundamental Research Program under Grant JCYJ20220810112354002.
Funding (DCRWOA study): Supported by the Anhui Polytechnic University Introduced Talents Research Fund (No. 2021YQQ064) and the Anhui Polytechnic University Scientific Research Project (No. Xjky2022168).
Funding (asymptomatic carotid stenosis study): Supported by the Medical Science and Technology Development Foundation (YKK18114) and the General Social Development Medical and Health Project of Nanjing Science and Technology Commission (201803029).
Funding (NAFLD study in Chinese subjects): Supported by grants from the National Natural Science Foundation of China (81970543 and 81570591), the Zhejiang Provincial Medical & Hygienic Science and Technology Project of China (2018KY385), and the Zhejiang Provincial Natural Science Foundation of China (LY20H160023).
Funding (landslide population risk study): Supported by the National Key Research and Development Program of China (2018YFC1508804), the Key Scientific and Technology Program of Jilin Province (20170204035SF), the Key Scientific and Technology Research and Development Program of Jilin Province (20200403074SF and 20180201035SF), and the National Natural Science Fund for Young Scholars of China (41907238).