Dear Editor, Scene understanding is an essential task in computer vision. The ultimate objective of scene understanding is to instruct computers to understand and reason about scenes as humans do. Parallel vision is a research framework that unifies the explanation and perception of dynamic and complex scenes.
Funding: Natural Science Foundation for Young Scientists in Shaanxi Province of China (2023-JC-QN-0729); Fundamental Research Funds for the Central Universities (GK202207008).
Boosting algorithms have been widely utilized in the development of landslide susceptibility mapping (LSM) studies. However, these algorithms possess distinct computational strategies and hyperparameters, making it challenging to propose an ideal LSM model. To investigate the impact of different boosting algorithms and hyperparameter optimization algorithms on LSM, this study constructed a geospatial database comprising 12 conditioning factors, such as elevation, stratum, and annual average rainfall. The XGBoost (XGB), LightGBM (LGBM), and CatBoost (CB) algorithms were employed to construct the LSM model, and the Bayesian optimization (BO), particle swarm optimization (PSO), and Hyperband optimization (HO) algorithms were applied to optimize it. The boosting algorithms exhibited varying performance, with CB demonstrating the highest precision, followed by LGBM, while XGB showed the lowest. The hyperparameter optimization algorithms likewise differed, with HO outperforming PSO and BO performing worst. The HO-CB model achieved the highest precision, with an accuracy of 0.764, an F1-score of 0.777, an area under the curve (AUC) of 0.837 for the training set, and an AUC of 0.863 for the test set. The model was interpreted using SHapley Additive exPlanations (SHAP), revealing that slope, curvature, topographic wetness index (TWI), degree of relief, and elevation significantly influenced landslides in the study area. This study of boosting and hyperparameter optimization algorithms in Wanzhou District proposes the HO-CB-SHAP framework as an effective approach for accurately forecasting landslide disasters and interpreting LSM models, offering a scientific reference for LSM and disaster prevention research. However, limitations remain concerning the generalizability of the model and the data processing, which require further exploration in subsequent studies.
Funding: Natural Science Foundation of Chongqing (Grant No. CSTB2022NSCQ-MSX0594); Humanities and Social Sciences Research Project of the Ministry of Education (Grant No. 16YJCZH061).
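As a concrete illustration of the HO-CB combination, the sketch below pairs CatBoost with Optuna's Hyperband pruner as a stand-in for the paper's Hyperband optimizer; the synthetic data, search ranges, and trial budget are all hypothetical, not those of the study.

```python
# Minimal sketch: Hyperband-style tuning of a CatBoost LSM classifier.
# X, y are synthetic stand-ins for the 12-factor geospatial database.
import optuna
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=12, random_state=0)

def objective(trial):
    # Search a few core CatBoost hyperparameters (ranges are illustrative).
    model = CatBoostClassifier(
        depth=trial.suggest_int("depth", 4, 10),
        learning_rate=trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        iterations=trial.suggest_int("iterations", 100, 500),
        verbose=0,
    )
    # Mean AUC over 5-fold cross-validation is the objective to maximize.
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(
    direction="maximize",
    # Hyperband resource allocation; pruning would need intermediate reports,
    # it is kept here only to mirror the HO setup.
    pruner=optuna.pruners.HyperbandPruner(),
)
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```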
Colletotrichum kahawae (coffee berry disease, CBD) spreads through spores carried by wind, rain, and insects, affecting coffee plantations and causing up to 80% yield losses and poor-quality coffee beans; the same spore-borne transmission makes the disease hard to control. Colombian researchers utilized a deep learning system to identify CBD in coffee cherries at three growth stages and classified photographs of infected and uninfected cherries with 93% accuracy using a random forest method. If the dataset is too small and noisy, however, an algorithm may fail to learn the data patterns and generate accurate predictions. To overcome this challenge, early detection of C. kahawae disease in coffee cherries requires automated processes, prompt recognition, and accurate classification. The proposed methodology selects CBD image datasets through four different stages for training and testing. XGBoost is used to train a model on datasets of coffee berries, with each image labeled as healthy or diseased. Once the model is trained, the SHAP algorithm is applied to determine which features were essential to the model's predictions; these included the cherry's colour, whether it had spots or other damage, and how large the lesions were. Visualization is important for classification, making apparent how the colour of the berry correlates with the presence of disease. To evaluate the model's performance and mitigate overfitting, a 10-fold cross-validation approach is employed: the dataset is partitioned into ten subsets, and the model is trained on nine and evaluated on the held-out subset in turn. In comparison with other contemporary methodologies, the proposed model achieved an accuracy of 98.56%.
Funding: Deanship for Research & Innovation, Ministry of Education in Saudi Arabia (Project Number IFP22UQU4281768DSR122).
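The 10-fold cross-validation protocol described above is straightforward to reproduce; a minimal sketch with XGBoost follows, assuming placeholder features (in the study these would be colour, spot, and lesion descriptors extracted from cherry images).

```python
# Minimal sketch: 10-fold cross-validation of an XGBoost classifier on
# hypothetical image-derived features (healthy vs diseased labels).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# Placeholder features; in practice these come from cherry-image extraction.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each fold trains on nine subsets and scores on the held-out tenth.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```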
Today, urban traffic, growing populations, and dense transportation networks are contributing to an increase in traffic incidents. These incidents include traffic accidents, vehicle breakdowns, fires, and traffic disputes, resulting in long waiting times, high carbon emissions, and other undesirable situations. Estimating incident response times quickly and accurately after traffic incidents occur is vital to the success of incident-related planning and response activities. This study presents a model for forecasting the duration of traffic incidents with high precision. The proposed model goes through a 4-stage process using various features to predict the duration of four different traffic events and presents a feature reduction approach to enable real-time data collection and prediction. In the first stage, the dataset consisting of 24,431 data points and 75 variables is prepared by data collection, merging, missing data processing and data cleaning. In the second stage, models such as Decision Trees (DT), K-Nearest Neighbour (KNN), Random Forest (RF) and Support Vector Machines (SVM) are used, and hyperparameter optimisation is performed with GridSearchCV. In the third stage, feature selection and reduction are performed and real-time data are used. In the last stage, model performance with 14 variables is evaluated with metrics such as accuracy, precision, recall, F1-score, MCC, the confusion matrix and SHAP. The RF model outperforms the other models with an accuracy of 98.5%. The study's prediction results demonstrate that the proposed dynamic prediction model can achieve a high level of success.
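The second-stage model search can be sketched with scikit-learn's GridSearchCV; the grid values and synthetic data below are illustrative, not the study's.

```python
# Minimal sketch: GridSearchCV hyperparameter optimisation of a Random Forest
# for a four-class incident-duration problem (grid and data are hypothetical).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=14, n_classes=4,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_tr, y_tr)

print(grid.best_params_)
# Precision/recall/F1 per class, mirroring the fourth-stage evaluation.
print(classification_report(y_te, grid.predict(X_te)))
```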
Accurate prediction of molten steel temperature in the ladle furnace (LF) refining process has an important influence on the quality of molten steel and the control of steelmaking cost. Extensive research on establishing models to predict molten steel temperature has been conducted. However, most researchers focus solely on improving the accuracy of the model, neglecting its explainability. The present study aims to develop a high-precision and explainable model with improved reliability and transparency. The eXtreme gradient boosting (XGBoost) and light gradient boosting machine (LGBM) were utilized, along with Bayesian optimization and grey wolf optimization (GWO), to establish the prediction model. Different performance evaluation metrics and graphical representations were applied to compare the optimal XGBoost and LGBM models obtained through the various hyperparameter optimization methods against the other models. The findings indicated that the GWO-LGBM model outperformed the other methods in predicting molten steel temperature, with a high prediction accuracy of 89.35% within the error range of ±5°C. The model's learning/decision process was revealed, and the influence degree of different variables on the molten steel temperature was clarified, using tree structure visualization and SHapley Additive exPlanations (SHAP) analysis. Consequently, the explainability of the optimal GWO-LGBM model was enhanced, providing reliable support for the prediction results.
Funding: National Natural Science Foundation of China (Nos. 51974023 and 52374321); State Key Laboratory of Advanced Metallurgy, University of Science and Technology Beijing (No. 41621005); Youth Science and Technology Innovation Fund of Jianlong Group-University of Science and Technology Beijing (No. 20231235).
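The paper's accuracy measure is the fraction of predictions falling within ±5°C of the measured temperature. A minimal sketch follows, assuming a plain LGBM regressor on synthetic data; the GWO tuning step is omitted.

```python
# Minimal sketch: an LGBM temperature regressor and the "hit ratio" within
# +/-5 degC used as the accuracy measure. Data are synthetic stand-ins.
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=5.0, random_state=0)
y = y / 50 + 1600  # rescale to a plausible molten-steel range (degC)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
hit_ratio = np.mean(np.abs(pred - y_te) <= 5.0)  # fraction within +/-5 degC
print(f"hit ratio within +/-5 degC: {hit_ratio:.2%}")
```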
People learn causal relations from childhood using counterfactual reasoning. Counterfactual reasoning uses counterfactual examples, which take the form of "what if this had happened differently". Counterfactual examples are also the basis of counterfactual explanation in explainable artificial intelligence (XAI). However, a framework that relies solely on optimization algorithms to find and present counterfactual samples cannot help users gain a deeper understanding of the system. Without a way to verify their understanding, users can even be misled by such explanations. These limitations can be overcome through an interactive and iterative framework that allows users to explore their desired "what-if" scenarios. The purpose of our research is to develop such a framework. In this paper, we present our "what-if" XAI framework (WiXAI), which visualizes the artificial intelligence (AI) classification model from the perspective of the user's sample and guides their "what-if" exploration. We also formulate how to use the WiXAI framework to generate counterfactuals and understand the feature-feature and feature-output relations in depth for a local sample. These relations help move users toward causal understanding.
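A naive version of the "what-if" exploration loop can be written in a few lines: sweep one feature of a local sample across its observed range and report where the prediction flips. The model, data, and chosen feature below are hypothetical stand-ins, not the WiXAI implementation.

```python
# Minimal sketch of a "what-if" sweep over a single feature of a local sample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

sample = X[0].copy()
base_pred = model.predict(sample.reshape(1, -1))[0]

feature = 2  # hypothetical: the feature the user chooses to explore
for value in np.linspace(X[:, feature].min(), X[:, feature].max(), num=11):
    what_if = sample.copy()
    what_if[feature] = value  # the user's "what if this were different"
    pred = model.predict(what_if.reshape(1, -1))[0]
    if pred != base_pred:
        print(f"counterfactual: feature {feature} = {value:.2f} "
              f"flips the prediction {base_pred} -> {pred}")
```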
A lightweight malware detection and family classification system for the Internet of Things (IoT) was designed to solve the difficulty of deploying defense models caused by the limited computing and storage resources of IoT devices. By training complex models with IoT software gray-scale images and utilizing the gradient-weighted class activation mapping (Grad-CAM) technique, the system can identify the key code regions that influence model decisions. This allows gray-scale images to be reconstructed to train a lightweight model called LMDNet for malware detection. Additionally, a multi-teacher knowledge distillation method is employed to train KD-LMDNet, which focuses on classifying malware families. The results indicate that the model's identification speed surpasses that of traditional methods by 23.68%, while the accuracy achieved on the Malimg dataset for family classification is an impressive 99.07%. Furthermore, with a model size of only 0.45M, it appears well-suited to the IoT environment. Thus, the presented approach can address the challenges associated with malware detection and family classification on IoT devices.
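The multi-teacher distillation step can be sketched as a loss that blends the teachers' averaged softened outputs with the hard labels. The sketch below assumes standard temperature-scaled knowledge distillation and placeholder logits, not the paper's LMDNet/KD-LMDNet architectures.

```python
# Minimal sketch of a multi-teacher knowledge-distillation loss in PyTorch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels,
                      T=4.0, alpha=0.7):
    # Average the teachers' temperature-softened distributions.
    soft_targets = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * (T * T)  # standard temperature scaling of the KD gradient
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy usage: random logits for a batch of 8 samples over 25 malware families.
student = torch.randn(8, 25, requires_grad=True)
teachers = [torch.randn(8, 25), torch.randn(8, 25)]
labels = torch.randint(0, 25, (8,))
loss = distillation_loss(student, teachers, labels)
loss.backward()
print(float(loss))
```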
This paper reviews the research on land use change and its corresponding ecological responses. Spatio-temporal patterns of land use change are produced by the interaction of biophysical and socio-economic processes. Studies conducted under different socioeconomic conditions and at different scales show that, at the short-term scale, human activities rather than natural forces have become the major force shaping the environment, while biophysical factors control the trends and processes of land use change against the macro environmental background. To provide a scientific understanding of the process of land use change, the impacts of different land use decisions, and the ways that decisions are affected by a changing environment and increasing ecological variability, the priority areas for research are: (1) explanation of the scale dependency of drivers of land use change; (2) quantification of the driving factors of land use change; (3) incorporation of biophysical feedbacks in land use change models; and (4) the underlying processes and mechanisms of the ecological impacts of land use change.
Funding: National Natural Science Foundation of China (No. 49771073); Key Project of the Chinese Academy of Sciences (No. K2952-J1-203).
Hade 4 oilfield is located on the Hadexun tectonic belt north of the Manjiaer depression in the Tarim basin. Its main target layer is the Donghe sandstone reservoir at the bottom of the Carboniferous, with a burial depth over 5,000 m and an amplitude below 34 m. The Donghe sandstone reservoir consists of littoral-facies quartz sandstones of the transgressive system tract, overlapping northward and pinching out. Exploration and development confirm that the water-oil contact tilts from the southeast to the northwest with a drop of nearly 80 m. The reservoir, controlled by both the stratigraphic overlap pinch-out and tectonism, is a typical subtle reservoir. The Donghe sandstone reservoir in Hade 4 oilfield also features a large oil-bearing area (over 130 km² proved), a small thickness (average effective thickness below 6 m) and a low abundance (below 50 × 10⁴ t/km²). Moreover, a set of igneous rocks of uneven thickness developed above the target layer in the Permian formation, causing great difficulty in studying the velocity field. Considering these features, a combined mode of exploration and development was adopted, namely whole deployment, step-by-step enforcement and rolling development with key problems to be tackled, in order to deepen understanding and enlarge the fruits of exploration and development. The paper focuses its study on four technical aspects. First, strengthening the acquisition, processing and interpretation of seismic data to improve resolution and, by combining the drilling data, accurately recognize the pinch-out line of the Donghe sandstone reservoir so as to determine its distribution pattern. Second, strengthening research on the velocity field to improve the accuracy of variable-velocity mapping, with corrections from newly drilled key wells, thereby greatly improving the precision of structural description. Third, strengthening research on sequence stratigraphy to determine the distribution pattern of the Donghe sandstone. Fourth, using a step-by-step extrapolation method to deepen understanding of the tilted water-oil contact and, by combining structural description with drilling results, progressively establish how the water-oil contact varies. The exploration and development of the Donghe sandstone subtle reservoir in Hade 4 oilfield has been a gradually perfected process. Since its discovery in 1998, the reservoir has achieved a virtuous cycle of exploration and development in which its reserves have gradually been enlarged and its production scale increased; in short, it has proven the techniques necessary for this kind of subtle reservoir in the Tarim basin.
Causal inference is a powerful modeling tool for explanatory analysis, which might enable current machine learning to become explainable. How to marry causal inference with machine learning to develop explainable artificial intelligence (XAI) algorithms is one of the key steps toward artificial intelligence 2.0. With the aim of bringing knowledge of causal inference to scholars of machine learning and artificial intelligence, we invited researchers working on causal inference to write this survey from different aspects of causal inference. This survey includes the following sections: "Estimating average treatment effect: A brief review and beyond" from Dr. Kun Kuang, "Attribution problems in counterfactual inference" from Prof. Lian Li, "The Yule–Simpson paradox and the surrogate paradox" from Prof. Zhi Geng, "Causal potential theory" from Prof. Lei Xu, "Discovering causal information from observational data" from Prof. Kun Zhang, "Formal argumentation in causal reasoning and explanation" from Profs. Beishui Liao and Huaxin Huang, "Causal inference with complex experiments" from Prof. Peng Ding, "Instrumental variables and negative controls for observational studies" from Prof. Wang Miao, and "Causal inference with interference" from Dr. Zhichao Jiang.
Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable, and irrelevant input features decrease both the performance and the interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is usually ignored in most studies focused on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e. S_max). The ML model trained on the features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low face pressure at the top of the shield increases the model's output.
Funding: The Science and Technology Development Fund, Macao SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023); Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).
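A minimal sketch of the Boruta-then-SHAP pipeline follows, assuming the BorutaPy package and synthetic stand-ins for the 14 tunnelling features and S_max.

```python
# Minimal sketch: Boruta feature selection, then SHAP on the retained features.
import numpy as np
import shap
from boruta import BorutaPy
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for 14 tunnelling features and the S_max target.
X, y = make_regression(n_samples=300, n_features=14, n_informative=6,
                       random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
boruta = BorutaPy(rf, n_estimators="auto", random_state=0)
boruta.fit(X, y)  # BorutaPy expects numpy arrays
X_sel = X[:, boruta.support_]

# Train on the selected features only, then explain with TreeExplainer.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_sel, y)
shap_values = shap.TreeExplainer(model).shap_values(X_sel)

print("selected feature indices:", np.where(boruta.support_)[0])
print("mean |SHAP| per selected feature:", np.abs(shap_values).mean(axis=0))
```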
Collaborative Filtering (CF) is a leading approach for building recommender systems and has gained considerable development and popularity. A predominant approach to CF is the rating prediction recommender algorithm, which aims to predict a user's rating for items the user has not yet rated. However, with the increasing number of items and users, the data become sparse, and it is difficult to detect latent close relations among items or users for predicting user behaviors. In this paper, we enhance the rating prediction approach, leading to a substantial improvement in prediction accuracy, by categorizing movies according to their genres. The probabilities that users are interested in the genres are then computed to integrate the predictions of each genre cluster. A novel probabilistic approach based on sentiment analysis of user reviews is also proposed to give intuitive explanations of why an item is recommended. To test the novel recommendation approach, a new corpus of user reviews on movies obtained from the Internet Movie Database (IMDB) has been generated. Experimental results show that the proposed framework is effective and achieves better prediction performance.
Funding: National Science Foundation of China (Grants No. 61303105 and 61402304); Humanity & Social Science general project of the Ministry of Education (Grants No. 14YJAZH046); Beijing Natural Science Foundation (Grants No. 4154065); Beijing Educational Committee Science and Technology Development Plan (Grants No. KM201410028017); Academic Degree Graduate Courses group projects.
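The genre-weighted integration can be sketched as a probability-weighted sum over per-cluster predictions. In the sketch below the per-cluster predictor is deliberately stubbed with simple rating means; the paper's actual cluster models and IMDB review corpus are not reproduced, and all data are hypothetical.

```python
# Minimal sketch of genre-weighted rating integration:
# r_hat(u, i) = sum_g P(u interested in g) * r_hat_g(u, i).
import numpy as np

ratings = np.array([[5, 0, 3, 0],     # ratings[u, i], 0 = unrated
                    [4, 2, 0, 1],
                    [0, 1, 4, 5]], dtype=float)
genre_of = np.array([0, 1, 0, 1])     # one genre per movie, for simplicity
n_genres = 2

def predict(user: int, item: int) -> float:
    # P(user interested in genre g): share of the user's rated items in g.
    rated = ratings[user] > 0
    interest = np.array([(rated & (genre_of == g)).sum()
                         for g in range(n_genres)], dtype=float)
    interest /= max(interest.sum(), 1.0)
    # Stubbed per-cluster prediction: the item's mean rating when the item
    # belongs to the cluster, otherwise the cluster's overall mean rating.
    item_mean = ratings[:, item][ratings[:, item] > 0].mean()
    preds = []
    for g in range(n_genres):
        cluster = ratings[:, genre_of == g]
        preds.append(item_mean if genre_of[item] == g
                     else cluster[cluster > 0].mean())
    return float(interest @ np.array(preds))

print(predict(user=0, item=2))
```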
Many narrow lines on R_2, R^+, N_1, N^- and some other unknown centers have been observed in γ-rayed pure and doped sodium fluoride crystals in the temperature range of 9-77 K. Proper laser irradiation was used to strengthen the zero-phonon lines, and the corresponding explanation is given. The spectral properties and thermostabilities of the lines were investigated systematically at different temperatures.
Funding: National Natural Science Foundation of China.
AIM: To investigate the post-colonoscopy colorectal cancer (PCCRC) rate for high-definition (HD) colonoscopy compared with that reported previously for standard-definition colonoscopy.
METHODS: Using medical records at Sano Hospital (SH) and Dokkyo Medical University Koshigaya Hospital (DMUKH), we retrospectively obtained data on consecutive patients diagnosed as having CRC between January 2010 and December 2015. PCCRC was defined as CRC diagnosed between 7 and 36 mo after an initial high-definition colonoscopy that had detected no cancer, and patients were divided into a PCCRC group and a non-PCCRC group. The primary outcome was the rate of PCCRC for HD colonoscopy. The secondary outcomes were factors associated with PCCRC and possible reasons for the occurrence of early and advanced PCCRC.
RESULTS: Among 892 CRC patients, 11 were diagnosed as having PCCRC and 881 as non-PCCRC. The PCCRC rate was 1.7% (8/471) at SH and 0.7% (3/421) at DMUKH. In comparison with the non-PCCRC group, the PCCRC group had a significantly higher preponderance of smaller tumors (39 mm vs 19 mm, P = 0.002), a shallower invasion depth (T1 rate, 25.4% vs 63.6%, P = 0.01), a non-polypoid macroscopic appearance (39.0% vs 85.7%, P = 0.02) and an earlier stage (59.7% vs 90.9%, P = 0.03). Possible reasons for PCCRC were "missed or new" in 9 patients (82%), "incomplete resection" in 1 (9%), and "inadequate examination" in 1 (9%). Among the 9 "missed or new" PCCRCs, the leading cause was a non-polypoid shape for early PCCRC and a blinded location for advanced PCCRC.
CONCLUSION: The PCCRC rate for HD colonoscopy was 0.7%-1.7%, lower than that reported previously for standard-definition colonoscopy (1.8%-9.0%) using the same methodology.
Two kinds of iterative methods for solving linear systems of equations are designed, and we obtain a new interpretation of them in terms of a geometric concept. We therefore gain better insight into the essence of the iterative methods and provide a reference for their further study and design. Finally, a new iterative method is designed, named the diverse-relaxation-parameter SOR method, which in particular demonstrates these geometric characteristics. Many examples show that the method is quite effective.
Funding: National Natural Science Foundation of China (61272300).
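For reference, the SOR update is x_i^(k+1) = (1−ω)·x_i^(k) + (ω/a_ii)·(b_i − Σ_{j<i} a_ij·x_j^(k+1) − Σ_{j>i} a_ij·x_j^(k)). The sketch below implements plain SOR with a fixed ω on a small test system; varying ω between sweeps, in the spirit of the paper's diverse-relaxation-parameter method, would be a one-line change.

```python
# Minimal sketch: successive over-relaxation (SOR) for A x = b.
import numpy as np

def sor(A, b, omega=1.25, tol=1e-10, max_iter=10_000):
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Gauss-Seidel step blended with the previous iterate by omega.
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (1 - omega) * x_old[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            break
    return x

# Small diagonally dominant test system; compare against a direct solve.
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(sor(A, b), np.linalg.solve(A, b))
```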
Fish maw (the dried swim bladder of fish) is ranked among the four sea treasures in Chinese cuisine. Fish maw is mainly produced from croaker, which is the most highly priced. However, some of the fish maw sold as croaker maw is in fact not from croaker but from the Nile perch Lates niloticus. The present work determined and compared the proximate composition and the amino acid and fatty acid composition of croaker Protonibea diacanthus maw and perch L. niloticus maw. The results indicated that both maws are high in protein and low in fat. The dominant amino acids in both maws were glycine, proline, glutamic acid, alanine and arginine; these amino acids constituted 66.2% and 66.4% of the total amino acids in P. diacanthus and L. niloticus, respectively. The FAA:TAA ratio (functional amino acids to total amino acids) in both maws was 0.69, which helps explain why fish maws have been widely used as a traditional tonic and remedy in Asia. Except for valine and histidine, all the essential amino acid contents in P. diacanthus were higher than in L. niloticus. Moreover, croaker P. diacanthus maw contained more AA and DHA than perch L. niloticus maw, showing a higher n-3/n-6 ratio, which is more desirable.
Funding: National Natural Science Foundation of China (No. 31201999); Natural Science Foundation of Guangdong Province, China (No. 2014A030307022); Special Support Program of Guangdong Province, China (No. 2014TQ01N621); Foundation for Distinguished Young Teachers in Higher Education of Guangdong, China (No. Yq2014115); Foundation of the Education Bureau of Guangdong Province (No. 2014KTSCX159); Technology Program of Guangdong Province (No. 2015A030302089); Overseas Scholarship Program for Elite Young and Middle-aged Teachers of Lingnan Normal University; Technology Program of Zhanjiang (Nos. 2015A03017, 2014A03011); Guangxi Key Laboratory of Beibu Gulf Marine Biodiversity Conservation, Qinzhou University (No. 2015KB04).