Abstract: An algorithm named InterOpt for optimizing operational parameters is proposed based on interpretable machine learning and is demonstrated via optimization of shale gas development. InterOpt consists of three parts: a neural network is used to construct an emulator of the actual drilling and hydraulic fracturing process in the vector space (i.e., virtual environment); the Shapley value method in interpretable machine learning is applied to analyze the impact of geological and operational parameters in each well (i.e., single-well feature impact analysis); and ensemble randomized maximum likelihood (EnRML) is conducted to optimize the operational parameters to comprehensively improve the efficiency of shale gas development and reduce the average cost. In the experiment, InterOpt provides different drilling and fracturing plans for each well according to its specific geological conditions, and finally achieves an average cost reduction of 9.7% for a case study with 104 wells.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12104356 and 52250191), the China Postdoctoral Science Foundation (Grant No. 2022M712552), the Opening Project of Shanghai Key Laboratory of Special Artificial Microstructure Materials and Technology (Grant No. Ammt2022B-1), and the Fundamental Research Funds for the Central Universities, with support from the HPC Platform, Xi'an Jiaotong University.
Abstract: Thermoelectric and thermal materials are essential in achieving carbon neutrality. However, the high cost of lattice thermal conductivity calculations and the limited applicability of classical physical models have led to the inefficient development of thermoelectric materials. In this study, we propose a two-stage machine learning framework with physical interpretability, incorporating domain knowledge, to calculate high/low thermal conductivity rapidly. Specifically, a crystal graph convolutional neural network (CGCNN) is constructed to predict the fundamental physical parameters related to lattice thermal conductivity. Based on these physical parameters, an interpretable machine learning model, the sure independence screening and sparsifying operator (SISSO), is trained to predict the lattice thermal conductivity. We have predicted the lattice thermal conductivity of all available materials in the Open Quantum Materials Database (OQMD) (https://www.oqmd.org/). The proposed approach guides the next step of searching for materials with ultrahigh or ultralow lattice thermal conductivity and promotes the development of new thermal insulation materials and thermoelectric materials.
Funding: Supported by the National Science Foundation under Grant Nos. 2303578, 2303579, 0527699, 0838654, and 1212790, and by an Early-Career Research Fellowship from the Gulf Research Program of the National Academies of Sciences, Engineering, and Medicine.
Abstract: Facing the escalating effects of climate change, it is critical to improve the prediction and understanding of the hurricane evacuation decisions made by households in order to enhance emergency management. Studies in this area have often relied on psychology-driven linear models, which have frequently exhibited limitations in practice. The present study proposes a novel interpretable machine learning approach to predict household-level evacuation decisions by leveraging easily accessible demographic and resource-related predictors, in contrast to existing models that mainly rely on psychological factors. An enhanced logistic regression model (that is, an interpretable machine learning approach) was developed for accurate predictions by automatically accounting for nonlinearities and interactions (that is, univariate and bivariate threshold effects). Specifically, nonlinearity and interaction detection were enabled by low-depth decision trees, which offer transparent model structure and robustness. A survey dataset collected in the aftermath of Hurricanes Katrina and Rita, two of the most intense tropical storms of the last two decades, was employed to test the new methodology. The findings show that, when predicting the households’ evacuation decisions, the enhanced logistic regression model outperformed previous linear models in terms of both model fit and predictive capability. This outcome suggests that the proposed methodology could provide a new tool and framework for emergency management authorities to improve the prediction of evacuation traffic demands in a timely and accurate manner.
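The univariate threshold effect mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a depth-1 decision tree (a stump) scored by Gini impurity, and the function names `gini`, `best_threshold`, and `threshold_feature` as well as the toy data are hypothetical. The resulting 0/1 indicator is the kind of feature that could then be passed to an ordinary logistic regression.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return the split point of a depth-1 tree that minimizes weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split is possible between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0  # midpoint candidate
        left = [lab for val, lab in pairs if val <= t]
        right = [lab for val, lab in pairs if val > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def threshold_feature(values, t):
    """Encode a detected threshold as a 0/1 indicator feature."""
    return [1 if v > t else 0 for v in values]


# Hypothetical toy data: one predictor and a binary evacuation decision.
predictor = [1.0, 2.0, 3.0, 4.0]
evacuated = [0, 0, 1, 1]
t = best_threshold(predictor, evacuated)
indicator = threshold_feature(predictor, t)
```

A bivariate threshold effect could be encoded in the same spirit by multiplying two such indicators, although the abstract does not specify how the authors construct interaction terms.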
Abstract: Major issues currently restricting the use of learning analytics are the lack of interpretability and adaptability of the machine learning models used in this domain. Interpretability makes it easy for stakeholders to understand how these models work, and adaptability makes it easy to use the same model for multiple cohorts and courses in educational institutions. Recently, some models in learning analytics have been constructed with interpretability in mind, but their interpretability is not quantified, and adaptability is not specifically considered in this domain. This paper presents a new framework based on hybrid statistical fuzzy theory to overcome these limitations. It also provides explainability in the form of rules describing the reasoning behind a particular output. The paper also discusses the system evaluation on a benchmark dataset, showing promising results. The measure of explainability, the fuzzy index, shows that the model is highly interpretable. The system achieves more than 82% recall in both the classification and the context adaptation stages.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. U1862201, 91834303, and 22208208), the China Postdoctoral Science Foundation (No. 2022M712056), and the China National Postdoctoral Program for Innovative Talents (No. BX20220205).
Abstract: The present study extracts human-understandable insights from machine learning (ML)-based mesoscale closure in fluid-particle flows via several novel data-driven analysis approaches, i.e., maximal information coefficient (MIC), interpretable ML, and automated ML. It was previously shown that the solid volume fraction has the greatest effect on the drag force. The present study aims to quantitatively investigate the influence of flow properties on the mesoscale drag correction (H_d). The MIC results show strong correlations between the features (i.e., slip velocity (u*_sy) and particle volume fraction (ε_s)) and the label H_d. The interpretable ML analysis confirms this conclusion and quantifies the contributions of u*_sy, ε_s, and the gas pressure gradient to the model as 71.9%, 27.2%, and 0.9%, respectively. Automated ML, which does not require selecting the model structure and hyperparameters, is used for modeling, improving the prediction accuracy over our previous model (Zhu et al., 2020; Ouyang, Zhu, Su, & Luo, 2021).
Funding: This research was funded by the National Natural Science Foundation of China (Nos. 71761147001 and 42030707), the International Partnership Program of the Chinese Academy of Sciences (No. 121311KYSB20190029), the Fundamental Research Fund for the Central Universities (No. 20720210083), and the National Science Foundation (Nos. EF-1638679, EF-1638554, EF-1638539, and EF-1638550). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government.
Abstract: The identification of factors that may be forcing ecological observations to approach the upper boundary provides insight into potential mechanisms affecting driver-response relationships and can help inform ecosystem management, but it has rarely been explored. In this study, we propose a novel framework integrating quantile regression with interpretable machine learning. In the first stage of the framework, we estimate the upper boundary of a driver-response relationship using quantile regression. Next, we calculate “potentials” of the response variable depending on the driver, which are defined as vertical distances from the estimated upper boundary of the relationship to observations in the driver-response variable scatter plot. Finally, we identify key factors impacting the potential using a machine learning model. We illustrate the necessary steps to implement the framework using the total phosphorus (TP)-chlorophyll a (CHL) relationship in lakes across the continental US. We found that the nitrogen to phosphorus ratio (N:P), annual average precipitation, total nitrogen (TN), and summer average air temperature were key factors impacting the potential of CHL depending on TP. We further revealed important implications of our findings for lake eutrophication management. The important role of N:P and TN in the potential highlights the co-limitation of phosphorus and nitrogen and indicates the need for dual nutrient criteria. Future wetter and/or warmer climate scenarios can decrease the potential, which may reduce the efficacy of lake eutrophication management. The novel framework advances the application of quantile regression to identify factors driving observations to approach the upper boundary of driver-response relationships.
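The first two stages described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the upper boundary is taken to be a straight line whose slope and intercept have already been fitted elsewhere, the pinball loss shown is the standard objective of quantile regression rather than anything specific to the paper, and the names `pinball_loss` and `potentials` and all numbers are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss; minimizing it over a model yields the tau-th quantile."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def potentials(driver, response, slope, intercept):
    """Vertical distance from an assumed linear upper boundary down to each observation."""
    return [slope * x + intercept - y for x, y in zip(driver, response)]


# Hypothetical values: an already-fitted boundary line and three observations.
driver = [1.0, 2.0, 3.0]
response = [1.5, 1.0, 3.5]
gaps = potentials(driver, response, slope=1.0, intercept=1.0)
```

With a high quantile such as tau = 0.9, under-predictions are penalized nine times more heavily than over-predictions, which is why the fitted line sits near the top of the scatter. The potentials would then serve as the target variable for the machine learning model in the final stage.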
Funding: Financial support was provided by the National Natural Science Foundation of China (52230004 and 52293445), the Key Research and Development Project of Shandong Province (2020CXGC011202-005), and the Shenzhen Science and Technology Program (KCXFZ20211020163404007 and KQTD20190929172630447).
Abstract: The potential for reducing greenhouse gas (GHG) emissions and energy consumption in wastewater treatment can be realized through intelligent control, with machine learning (ML) and multimodality emerging as a promising solution. Here, we introduce an ML technique based on multimodal strategies, focusing specifically on intelligent aeration control in wastewater treatment plants (WWTPs). The generalization of the multimodal strategy is demonstrated on eight ML models. The results demonstrate that this multimodal strategy significantly enhances model indicators for ML in environmental science and the efficiency of aeration control, exhibiting exceptional performance and interpretability. Integrating random forest with visual models achieves the highest accuracy in forecasting aeration quantity among the multimodal models, with a mean absolute percentage error of 4.4% and a coefficient of determination of 0.948. Practical testing in a full-scale plant reveals that the multimodal model can reduce operation costs by 19.8% compared to traditional fuzzy control methods. The potential application of these strategies in critical water science domains is discussed. To foster accessibility and promote widespread adoption, the multimodal ML models are freely available on GitHub, thereby eliminating technical barriers and encouraging the application of artificial intelligence in urban wastewater treatment.
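For readers unfamiliar with the two accuracy measures quoted in this abstract, the following sketch shows their standard textbook definitions. The function names and the sample numbers are hypothetical and are not taken from the paper; the sketch assumes no true value is zero (for the percentage error) and that the true values are not all identical (for the coefficient of determination).

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical aeration quantities (arbitrary units) and model forecasts.
observed = [100.0, 200.0]
forecast = [110.0, 190.0]
error_percent = mape(observed, forecast)
fit = r_squared(observed, forecast)
```

A forecast equal to the mean of the observations gives a coefficient of determination of 0, and a perfect forecast gives 1.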
Funding: Supported by the ZALF Integrated Priority Project (IPP2022) “Co-designing smart, resilient, sustainable agricultural landscapes with cross-scale diversification”; the Bundesministerium für Bildung und Forschung (BMBF) Land-Innovation-Lausitz project “Landschaftsinnovationen in der Lausitz für eine klimaangepasste Bioökonomie und naturnahen Bioökonomie-Tourismus” (03WIR3017A); the BMBF project “Multi-modale Datenintegration, domänenspezifische Methoden und KI zur Stärkung der Datenkompetenz in der Agrarforschung” (16DKWN089); and the Brandenburgische Technische Universität Cottbus-Senftenberg GRS cluster project “Integrated analysis of Multifunctional Fruit production landscapes to promote ecosystem services and sustainable land-use under climate change” (GRS2018/19).
Abstract: Artificial intelligence and machine learning have been increasingly applied for prediction in agricultural science. However, many models are black boxes, meaning we cannot explain what the models learned from the data or the reasons behind their predictions. To address this issue, I introduce an emerging subdomain of artificial intelligence, explainable artificial intelligence (XAI), and its associated toolkit, interpretable machine learning. This study demonstrates the usefulness of several methods by applying them to an openly available dataset. The dataset includes the no-tillage effect on crop yield relative to conventional tillage, together with soil, climate, and management variables. Data analysis revealed that no-tillage management can increase maize crop yield where yield in conventional tillage is below 5000 kg/ha and the maximum temperature is higher than 32°. These methods are useful to answer (i) which variables are important for prediction in regression/classification, (ii) which variable interactions are important for prediction, (iii) how important variables and their interactions are associated with the response variable, (iv) what the reasons underlying a predicted value for a certain instance are, and (v) whether different machine learning algorithms offer the same answer to these questions. I argue that, in current practice, goodness of model fit is overly emphasized through model performance measures, while these questions remain unanswered. XAI and interpretable machine learning can enhance trust and explainability in AI.
Funding: Supported by the National Key Research and Development Program of China (No. 2020AAA0140002), the Natural Science Foundation of China (Nos. U1836217, 62076240, 62006225, 61906199, 62071468, 62176025, and U21B200389), and the CAAI-Huawei MindSpore Open Fund.
Abstract: Deep learning-based models are vulnerable to adversarial attacks. Defense against adversarial attacks is essential for sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms against adversarial attacks. Most existing methods are just stopgaps for specific adversarial samples. The main obstacle is that it remains unclear how adversarial samples fool deep learning models. The underlying working mechanism of adversarial samples has not been well explored, and this is the bottleneck of adversarial attack defense. In this paper, we build a causal model to interpret the generation and performance of adversarial samples. The self-attention/transformer is adopted as a powerful tool in this causal model. Compared to existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed, and instructive analysis is provided. Then, we propose simple and effective adversarial sample detection and recognition methods according to the revealed working mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods. Our methods outperform the state-of-the-art defense methods under various adversarial attacks.
Funding: Supported by the National Science and Technology Major Project, China (No. 2017-II-0004-0016).
Abstract: Geometric and working condition uncertainties are inevitable in a compressor, causing the compressor performance to deviate from the design value. It is necessary to explore the influence of geometric uncertainty on performance deviation under different working conditions. In this paper, the influences of geometric uncertainty at near stall, peak efficiency, and near choke conditions under design speed and low speed are investigated. First, manufacturing geometric uncertainties are analyzed. Next, correlation models between geometry and performance under different working conditions are constructed based on a neural network. Then the Shapley additive explanations (SHAP) method is introduced to explain the output of the neural network. Results show that under real manufacturing uncertainty, the efficiency deviation range is small under the near stall and peak efficiency conditions. However, under the near choke conditions, efficiency is highly sensitive to flow capacity changes caused by geometric uncertainty, leading to a significant increase in the efficiency deviation amplitude, up to a magnitude of -3.6%. Moreover, the tip leading-edge radius and tip thickness are the two main factors affecting efficiency deviation. Therefore, to reduce efficiency uncertainty, a compressor should avoid operating near the choke condition, and the tolerances of the tip leading-edge radius and tip thickness should be strictly controlled.
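The Shapley attribution idea that SHAP builds on can be illustrated with a small exact computation. This sketch is not the SHAP library and not the authors' workflow: it enumerates every feature subset, so it is practical only for a handful of features, and it assumes that an absent feature is represented by a fixed baseline value. The function `shapley_values`, the toy model, and the feature values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x, relative to a baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value, the rest the baseline value.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical stand-in for a trained model: two geometric deviations as inputs.
def toy_model(z):
    return 2.0 * z[0] + 3.0 * z[1]


contributions = shapley_values(toy_model, [1.0, 2.0], [0.0, 0.0])
```

For a linear model with a zero baseline, each contribution equals the coefficient times the feature value, and the contributions always sum to the difference between the model output at the point and at the baseline.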