Abstract: This study introduces and evaluates a novel artificial hummingbird algorithm-optimised boosted tree (AHA-boosted) model for predicting the dynamic modulus (E*) of hot mix asphalt concrete. The model was trained and rigorously tested on a substantial dataset from NCHRP Report-547. Three performance metrics, RMSE, MAE, and R², were used to assess the model's predictive accuracy, robustness, and generalisability. When benchmarked against well-established models such as support vector machines (SVM) and Gaussian process regression (GPR), the AHA-boosted model performed better. Using the traditional Witczak NCHRP 1-40D model inputs, it achieved R² values of 0.997 in training and 0.974 in testing. Incorporating features such as test temperature, frequency, and asphalt content led to a 1.23% increase in the test R², indicating improved accuracy. The study also explored feature importance and sensitivity through SHAP and permutation importance plots, which highlighted the binder complex modulus |G*| as a key predictor. Although the AHA-boosted model shows promise, the slight decrease in R² from training to testing indicates a need for further validation. Overall, this study confirms the AHA-boosted model as a highly accurate and robust tool for predicting the dynamic modulus of hot mix asphalt concrete, making it a valuable asset for pavement engineering.
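The three metrics named in the abstract above can be computed as in the following minimal sketch. It uses their standard definitions; the function name `regression_metrics` and the sample values are hypothetical and are not taken from NCHRP Report-547 or from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    rmse = math.sqrt(ss_res / n)                  # root mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    return rmse, mae, r2

# Hypothetical measured and predicted values, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
rmse, mae, r2 = regression_metrics(measured, predicted)
```

Computing the same metrics separately on the training and test splits exposes the kind of train-to-test gap in R² that the abstract reports.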
Funding: The National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A5A8033165); the "Human Resources Program in Energy Technology" of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), granted financial resources from the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20214000000200).
Abstract: Recently, machine learning-based technologies have been developed to automate the classification of wafer map defect patterns during semiconductor manufacturing. Existing approaches to wafer map pattern classification include learning directly from the image with a convolutional neural network and applying an ensemble method after extracting image features. This study aims to classify wafer map defects more effectively and to derive algorithms that remain robust even for datasets with insufficient defect patterns. First, the number of defects observed during the actual process may be limited. Therefore, the insufficient data are supplemented with data generated by a convolutional auto-encoder (CAE), and the expanded data are verified using the structural similarity index measure (SSIM). After handcrafted features are extracted, a boosted stacking ensemble model, which integrates four base-level classifiers with an extreme gradient boosting classifier as the meta-level classifier, is designed and trained on the expanded data for final prediction. Since the proposed algorithm outperforms existing ensemble classifiers even for insufficient defect patterns, the results of this study will contribute to improving product quality and yield in the actual semiconductor manufacturing process.
Funding: The Deanship of Scientific Research at Umm Al-Qura University funds this research (Grant Code: 22UQU4281768DSR08).
Abstract: A Mobile Ad Hoc Network (MANET) is an infrastructure-less network comprising a set of nodes that move randomly. In a MANET, overall performance is improved through multipath multicast routing to achieve quality of service (QoS). Here, different nodes are involved in collecting data and transmitting it to the destination nodes in the network. The nodes are combined and classified to achieve energy-efficient data transmission. Route identification and routing are established based on the data broadcast by the network nodes. When transmitting data packets, the data delivery ratio must be evaluated to achieve optimal data transmission in the network. Furthermore, energy consumption and overhead are essential factors for an effective data transmission rate and a better data delivery rate. In this paper, a Gradient-Based Energy Optimization model (GBEOM) for routing in MANET is proposed to achieve an improved data delivery rate. Initially, the Weighted Multi-objective Cluster-based Spider Monkey Load Balancing (WMC-SMLB) technique is used to obtain energy-efficient, load-balanced routing. The WMC algorithm performs efficient clustering of the mobile nodes in the MANET. Load balancing efficiency is improved, with a higher data delivery ratio and minimum routing overhead, based on residual energy and bandwidth estimation. Next, the Gradient Boosted Multinomial ID3 Classification algorithm is applied to improve the performance of multipath multicast routing in MANET with minimal energy consumption and higher load balancing efficiency. The proposed GBEOM exhibits approximately 4% improved performance in MANET routing.
Abstract: Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely the Boosted Regression Tree (BRT), can address Big Data challenges to drive decision making. The challenge in this study is a lack of interoperability, since the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. Merging the data sources produced a structured but noisy input file with inconsistencies and redundancies. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT deals with missing data by default, allowing a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results, and variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms them in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single or ensemble BRT approach could be tested alongside existing models to improve results for a wide range of data-driven decisions and applications.
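The missing-value behaviour described in the abstract above (splitting on whether a value is missing as well as on what the value is) can be illustrated with a single-feature split search. This is a minimal sketch under stated assumptions, not the paper's implementation: the helper names `sse` and `best_split_with_missing`, the use of `None` to mark missing values, and the sample data are all hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_with_missing(xs, ys):
    """Find the least-squares split on one feature; None marks a missing x.

    Returns (loss, threshold, missing_goes_left). A threshold of None with
    missing_goes_left=True means the split is "missing versus present".
    (loss, None, None) means no split improved on the unsplit node.
    """
    missing_y = [y for x, y in zip(xs, ys) if x is None]
    present = [(x, y) for x, y in zip(xs, ys) if x is not None]
    best = (sse(ys), None, None)
    # Candidate A: split purely on whether the value is missing.
    if missing_y and present:
        loss = sse(missing_y) + sse([y for _, y in present])
        if loss < best[0]:
            best = (loss, None, True)
    # Candidate B: threshold on the value, trying both sides for missing rows.
    distinct = sorted({x for x, _ in present})
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left_y = [y for x, y in present if x <= thr]
        right_y = [y for x, y in present if x > thr]
        for goes_left in (True, False):
            loss = (sse(left_y + missing_y) + sse(right_y) if goes_left
                    else sse(left_y) + sse(right_y + missing_y))
            if loss < best[0]:
                best = (loss, thr, goes_left)
    return best

# Hypothetical data: rows with a missing x behave like the high-x rows.
xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [1.0, 1.2, 5.0, 5.1, 4.9, 5.2]
loss, threshold, missing_goes_left = best_split_with_missing(xs, ys)
```

On this toy data the search picks the threshold midway between 2.0 and 8.0 and routes the missing rows to the right-hand branch, because that grouping gives the smallest squared error.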
Abstract: This paper presents new trading models for the stock market and tests whether they can consistently generate excess returns on the Singapore Exchange (SGX). Instead of modeling stock prices in the conventional way, we construct models that relate market indicators directly to a trading decision. Furthermore, unlike a reversal trading system or a binary buy/sell system, we allow three modes of trade, namely buy, sell or stand by. The stand-by case is important because it caters to market conditions in which a model does not produce a strong buy or sell signal. Linear trading models are first developed with a scoring technique, which gives higher weight to successful indicators, and with the Least Squares technique, which fits the weights to match the past perfect trades. The linear models are then made adaptive by using a forgetting factor to address market changes. Because stock markets can at times be highly nonlinear, the Random Forest is adopted as a nonlinear trading model and improved with Gradient Boosting to form a new technique: the Gradient Boosted Random Forest. All the models are trained and evaluated on nine stocks and one index. Before a model is trained, statistical tests of randomness and of linear and nonlinear correlation are conducted on the data to check the statistical significance of the inputs and their relation to the output. Our empirical results show that the proposed trading methods are able to generate excess returns compared with the buy-and-hold strategy.
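The three-mode decision and the forgetting factor mentioned in the abstract above can be sketched as follows. The abstract does not give the scoring rule or the update formula, so everything here is an assumption made for illustration: the signal encoding (+1, -1, 0), the symmetric threshold, the exponential-forgetting update, and the names `decide` and `update_weights`.

```python
def decide(signals, weights, threshold):
    """Map indicator signals (+1 buy, -1 sell, 0 neutral) to a trade mode.

    Assumed rule: a weighted score must clear a symmetric threshold to
    trigger a trade; anything weaker is treated as "stand by".
    """
    score = sum(w * s for w, s in zip(weights, signals))
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    return "stand by"

def update_weights(weights, hits, forgetting=0.9):
    """Assumed exponential forgetting of past indicator performance.

    hits[i] is 1 if indicator i was right on the latest trade, else 0.
    Older performance decays by the factor `forgetting` at every update.
    """
    return [forgetting * w + (1.0 - forgetting) * h
            for w, h in zip(weights, hits)]

# Hypothetical weights for three indicators.
weights = [0.5, 0.3, 0.2]
strong_buy = decide([1, 1, -1], weights, threshold=0.4)    # score about 0.6
weak_signal = decide([1, -1, 0], weights, threshold=0.4)   # score about 0.2
strong_sell = decide([-1, -1, 1], weights, threshold=0.4)  # score about -0.6
new_weights = update_weights(weights, hits=[1, 0, 1])
```

A forgetting factor closer to 1 makes the weights change slowly; a smaller value lets recent indicator performance dominate, which is one way an adaptive model could track market changes.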
Abstract: Dry streams filled with sand, sun-baked soil and drought-resistant mopane trees characterize a vast expanse of land in the rural Chiredzi District, more than 600 km southeast of Zimbabwe's capital, Harare. Topless and barefooted children make a beeline, waving at the modern non-governmental organization vehicles that frequent the district.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12071173 and 12171192) and the Huaian Key Laboratory for Infectious Diseases Control and Prevention (HAP201704).
Abstract: Plant epidemics are often associated with weather-related variables, but it is difficult to identify weather-related predictors for models that predict plant epidemics. To predict Fusarium head blight (FHB) epidemics of wheat, Shah et al. explored a functional approach using scalar-on-function regression to model a binary outcome (FHB epidemic or non-epidemic) with respect to weather time series spanning 140 days relative to anthesis. The scalar-on-function models fit the data better than previously described logistic regression models. In this work, given the same dataset and models, we attempt to reproduce the article by Shah et al. using a different approach, boosted regression trees. After fitting, the classification accuracy and model statistics are surprisingly good.
Funding: Supported by the Health and Medical Research Fund (HMRF, 17161272 and 19180392) of the Food and Health Bureau of the HKSAR government to J.Z.; the General Research Fund (GRF, 17105420), Collaborative Research Fund (CRF, C7042-21G) and Theme-based Research Scheme (TbRS, T11-709/21-N) of the Research Grants Council of the HKSAR government to J.Z.; and Health@InnoHK, Innovation and Technology Commission, HKSAR Government to K.Y.Y.
Abstract: Horseshoe bats host numerous SARS-related coronaviruses without overt disease signs. Bat intestinal organoids, a unique model of bat intestinal epithelium, allow direct comparison with human intestinal organoids. We sought to unravel the cellular mechanism(s) underlying bat tolerance of coronaviruses by comparing innate immunity in bat and human organoids. We optimized the culture medium, which enabled consecutive passage of bat intestinal organoids for over one year. Basal expression levels of IFNs and IFN-stimulated genes were higher in bat organoids than in their human counterparts. Notably, bat organoids mounted a more rapid, robust and prolonged antiviral defense than human organoids upon Poly(I:C) stimulation. TLR3 and RLR might be the conserved pathways mediating the antiviral response in bat and human intestinal organoids. The susceptibility of bat organoids to the bat coronavirus CoV-HKU4, together with their resistance to EV-71, an enterovirus of exclusive human origin, indicated that bat organoids adequately recapitulate the authentic susceptibility of bats to certain viruses. Importantly, TLR3/RLR inhibition in bat organoids significantly boosted viral growth in the early phase after SARS-CoV-2 or CoV-HKU4 infection. Collectively, the higher basal expression of antiviral genes, and especially the more rapid and robust induction of the innate immune response, empowered bat cells to curtail virus propagation in the early phase of infection.
Abstract: Hydraulic fracturing is an effective technology for hydrocarbon extraction from unconventional shale and tight gas reservoirs. A potential risk of hydraulic fracturing is the upward migration of stray gas from the deep subsurface to shallow aquifers. The stray gas can dissolve in groundwater, leading to chemical and biological reactions that could negatively affect groundwater quality and contribute to atmospheric emissions. Knowledge of light hydrocarbon solubility in the aqueous environment is essential for the numerical modelling of flow and transport in the subsurface. Herein, we compiled a database containing 2129 experimental data points for methane, ethane, and propane solubility in pure water and various electrolyte solutions over wide ranges of operating temperature and pressure. Two machine learning algorithms, namely the regression tree (RT) and the boosted regression tree (BRT), tuned with a Bayesian optimization algorithm (BO), were employed to determine the solubility of the gases. The predictions were compared with the experimental data as well as with four well-established thermodynamic models. Our analysis shows that the BRT-BO is sufficiently accurate, and the predicted values agree well with those obtained from the thermodynamic models. The coefficient of determination (R²) between experimental and predicted values is 0.99 and the mean squared error (MSE) is 9.97 × 10⁻⁸. The leverage statistical approach further confirmed the validity of the model developed.
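To make the difference between a single regression tree and a boosted regression tree concrete, the sketch below fits gradient-boosted depth-one trees (stumps) with squared-error loss on one feature. It is a toy illustration, not the paper's BRT-BO model: there is no Bayesian optimization, the data are hypothetical, and `rounds` and `learning_rate` merely stand in for hyperparameters of the kind a tuner would search. The function names are assumptions.

```python
def fit_stump(xs, residuals):
    """Least-squares stump on one feature: (threshold, left_mean, right_mean)."""
    distinct = sorted(set(xs))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct x values to split")
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        loss = (sum((r - left_mean) ** 2 for r in left)
                + sum((r - right_mean) ** 2 for r in right))
        if best is None or loss < best[0]:
            best = (loss, thr, left_mean, right_mean)
    return best[1:]

def fit_boosted_stumps(xs, ys, rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(ys) / len(ys)          # round-zero prediction: the mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((thr, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= thr else right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lm if x <= thr else rm)
                      for thr, lm, rm in stumps)

def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical one-feature data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.5, 1.7, 3.9, 4.2, 4.1]
mse_mean_only = mse(fit_boosted_stumps(xs, ys, rounds=0), xs, ys)
mse_one_round = mse(fit_boosted_stumps(xs, ys, rounds=1), xs, ys)
mse_fifty_rounds = mse(fit_boosted_stumps(xs, ys, rounds=50), xs, ys)
```

The training error falls as rounds are added, which is why held-out data is needed when choosing the number of rounds and the learning rate.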