To overcome the problem of imprecise and unclear information in quality function deployment (QFD), a method for determining the priority of engineering features based on mixed linguistic variables is proposed. First, the evaluation members use predefined linguistic variables to give the correlation-strength evaluation matrix between customer requirements and engineering features. Second, the relative importance of the evaluation members and of the customer requirements is aggregated. Finally, the priority of the engineering features is obtained by calculating the deviation. The feasibility and practicability of the method are demonstrated using the design of a new long-bag low-pressure pulse dust collector as an example.
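For readers unfamiliar with QFD prioritization, the sketch below shows the classic weighted-sum step that the aggregation builds on: linguistic correlation terms are mapped to numbers and weighted by customer-requirement importance. It is a simplification under stated assumptions, not the paper's mixed-linguistic deviation method: the 0/1/3/9 scale, the function name `prioritize`, and all example weights and terms are hypothetical.

```python
# Hypothetical numeric scale for linguistic correlation terms (a common
# 0/1/3/9 QFD convention; the paper's mixed linguistic scale is not shown here).
SCALE = {"none": 0, "weak": 1, "medium": 3, "strong": 9}


def prioritize(weights, relation):
    """Rank engineering features by importance-weighted correlation strength.

    weights  -- importance of each customer requirement (one per row)
    relation -- relation[i][j] is the linguistic term linking customer
                requirement i to engineering feature j
    Returns (scores, ranking), with ranking listing feature indices best-first.
    """
    n_features = len(relation[0])
    scores = [0.0] * n_features
    for w, row in zip(weights, relation):
        for j, term in enumerate(row):
            scores[j] += w * SCALE[term]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return scores, ranking


# Hypothetical example: three customer requirements, three engineering features.
weights = [0.5, 0.3, 0.2]
relation = [
    ["strong", "weak", "none"],
    ["medium", "strong", "weak"],
    ["none", "medium", "strong"],
]
scores, ranking = prioritize(weights, relation)
print(scores, ranking)
```

With several evaluation members, each member's numeric matrix could first be averaged using member weights before this step; that aggregation rule is likewise an assumption.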
The geological features of three types of tropical volcanic rock and soil distributed along the Jakarta-Bandung high-speed railway (HSR), namely pozzolanic clayey soil, mud shale, and deep soft soil, are studied through field and laboratory tests. The paper analyzes the mechanisms and causes of the engineering geological problems caused by tropical volcanic rock and soil, and puts forward measures to control subgrade slope instability by rationally determining the project type, controlling side-slope stability, and strengthening waterproofing and drainage. The “zero front slope” tunneling technology at the portal, the simplified double-side-wall heading excavation method, and the cross-brace construction method for arch protection within a semi-open-cut row-pile frame in “mountainside” eccentrically loaded soft soil strata are adopted to control the instability of tunnel side and front slopes, foundation pits, and working faces. CFG piles or pipe piles are used to reinforce soft and expansive foundations, or replacement measures are taken, and a scheme of blind ditches plus double-layer water sealing is proposed for ballastless track sections to prevent arching deformation of the foundation. CFG piles, pipe piles, and vacuum-combined piled preloading are adopted to improve the bearing capacity of the foundation in deep soft soil sections and to control both settlement and uneven settlement. These engineering countermeasures have been applied during the construction of the Jakarta-Bandung HSR with good results.
State of health (SOH) estimation of e-mobility batteries operated under real, dynamic conditions is essential and challenging. Most existing estimations are based on fixed constant-current charging and discharging aging profiles, which overlooks the fact that charging and discharging profiles in real applications are random and incomplete. This work investigates the influence of feature engineering on the accuracy of different machine learning (ML)-based SOH estimators acting on different recharging sub-profiles, considering a realistic battery mission profile. Fifteen features were extracted from the partial recharging profiles, considering factors such as the starting voltage, the charge amount, and the charging sliding window. Features were then selected with a feature selection pipeline consisting of filtering followed by supervised ML-based subset selection. Multiple linear regression (MLR), Gaussian process regression (GPR), and support vector regression (SVR) were applied to estimate SOH, and root mean square error (RMSE) was used to evaluate and compare estimation performance. The results showed that the feature selection pipeline improved SOH estimation accuracy by 55.05%, 2.57%, and 2.82% for MLR, GPR, and SVR, respectively. It was also shown that estimations based on partial charging profiles with a lower starting voltage, a larger charge amount, and a larger sliding window are more likely to achieve higher accuracy. This work aims to provide insight into how supervised ML-based feature engineering on random partial recharges affects SOH estimation performance, and to help bridge the gap between theoretical studies of SOH estimation and real dynamic applications.
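The filtering stage of such a pipeline can be as simple as keeping features whose correlation with SOH is strong enough. The sketch below assumes Pearson correlation and a |r| threshold of 0.5; the abstract does not name the filter criterion, and the feature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_features(features, soh, threshold=0.5):
    """Filter step: keep features whose |r| with SOH reaches the threshold."""
    kept = {}
    for name, values in features.items():
        r = pearson(values, soh)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Hypothetical feature values at five aging checkpoints.
soh = [1.00, 0.97, 0.94, 0.91, 0.88]
features = {
    "charge_amount": [2.00, 1.94, 1.88, 1.82, 1.76],
    "start_voltage": [3.6, 3.5, 3.7, 3.6, 3.5],
}
kept = filter_features(features, soh)
print(kept)
```

The surviving features would then go to the supervised subset-selection stage, where each candidate subset is scored by the RMSE of an MLR, GPR, or SVR model.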
Background: Deep 3D morphable models (deep 3DMMs) play an essential role in computer vision. They are used in facial synthesis, compression, reconstruction and animation, avatar creation, virtual try-on, facial recognition systems, and medical imaging. These applications require high spatial and perceptual quality of the synthesised meshes. Despite their significance, these models have not been compared across different mesh representations or evaluated jointly with point-wise distance and perceptual metrics. Methods: We compare the influence of different mesh representation features on various deep 3DMMs in terms of the spatial and perceptual fidelity of the reconstructed meshes. This paper confirms the hypothesis that building deep 3DMMs from meshes described with global representations leads to lower spatial reconstruction error, measured with L1 and L2 norm metrics, but underperforms on perceptual metrics. In contrast, differential mesh representations, which describe differential surface properties, yield lower perceptual error (FMPD and DAME) but higher spatial error. The influence of mesh feature normalisation and standardisation is also compared and analysed from the perceptual and spatial fidelity perspectives. Results: The results provide guidance for selecting mesh representations to build deep 3DMMs according to spatial and perceptual quality objectives, and propose combinations of mesh representations and deep 3DMMs that improve either the perceptual or the spatial fidelity of existing methods.
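To make the global-versus-differential distinction concrete, the sketch below computes uniform Laplacian coordinates (each vertex minus the centroid of its one-ring), one common differential representation, and compares per-vertex L1/L2 errors. It assumes a toy tetrahedron and a rigid translation; the paper's actual representations and its perceptual metrics (FMPD, DAME) are not reproduced here.

```python
import math


def one_rings(n_vertices, faces):
    """Vertex adjacency (one-ring neighbours) derived from triangle faces."""
    adj = [set() for _ in range(n_vertices)]
    for a, b, c in faces:
        adj[a].update((b, c))
        adj[b].update((a, c))
        adj[c].update((a, b))
    return adj


def laplacian_coords(vertices, faces):
    """Uniform Laplacian: each vertex minus the centroid of its one-ring."""
    adj = one_rings(len(vertices), faces)
    deltas = []
    for v, ring in zip(vertices, adj):
        centroid = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        deltas.append(tuple(v[k] - centroid[k] for k in range(3)))
    return deltas


def mean_l1_l2(pred, target):
    """Per-vertex L1 and L2 errors, averaged over all vertices."""
    n = len(pred)
    l1 = sum(sum(abs(p - t) for p, t in zip(a, b)) for a, b in zip(pred, target))
    l2 = sum(math.sqrt(sum((p - t) ** 2 for p, t in zip(a, b))) for a, b in zip(pred, target))
    return l1 / n, l2 / n


# Toy tetrahedron and a copy translated by one unit along x.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
shifted = [(x + 1.0, y, z) for x, y, z in verts]

global_err = mean_l1_l2(shifted, verts)
diff_err = mean_l1_l2(laplacian_coords(shifted, faces), laplacian_coords(verts, faces))
print(global_err, diff_err)
```

The translation produces a unit error in global coordinates but essentially zero error in differential coordinates, which illustrates why the two representation families can rank differently on point-wise and shape-based metrics.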
Fetal health care is vital to ensuring the health of both pregnant women and the fetus. Mothers need regular check-ups to determine the status of fetal growth and identify any potential problems. To assess the status of the fetus, doctors monitor blood reports, ultrasounds, cardiotocography (CTG) data, and so on. In this research, we consider CTG data, which provide information on fetal heart rate and uterine contractions during pregnancy. Several researchers have proposed methods for classifying the status of fetal growth. Manual processing of CTG data is time-consuming and unreliable, so automated tools should be used to classify fetal health. This study proposes a novel neural network-based architecture, the Dynamic Multi-Layer Perceptron model, evaluated with depths ranging from a single layer to several layers, to classify fetal health. Various strategies were applied to enhance the model's performance, including data pre-processing (balancing, scaling, and normalization), hyperparameter tuning, batch normalization, and early stopping. A comparative analysis against traditional machine learning models shows the accuracy of the proposed method (97%). An ablation study without any pre-processing techniques is also presented. This study provides valuable interpretations for healthcare professionals in the decision-making process.
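The abstract lists balancing among the pre-processing steps without naming the technique. As one possible option (SMOTE is another common choice), the sketch below shows random oversampling, which duplicates randomly chosen minority-class rows until every class matches the largest one. The function name and the CTG-like rows are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical CTG-like rows: (baseline heart rate, uterine contractions).
X = [(120, 0.0), (132, 0.004), (125, 0.002), (140, 0.006), (150, 0.0), (110, 0.001)]
y = [1, 1, 1, 1, 2, 3]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Balancing of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.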
The article describes a new method for malware classification based on a machine learning (ML) model architecture specifically designed for malware detection, enabling real-time and accurate malware identification. Using an innovative feature dimensionality reduction technique called the Interpolation-based Feature Dimensionality Reduction Technique (IFDRT), the authors significantly reduce the feature space while retaining the critical information needed for malware classification. This technique optimizes the model's performance and reduces computational requirements. The method is demonstrated on the BODMAS malware dataset, which contains 57,293 malware samples and 77,142 benign samples, each described by a 2381-feature vector. IFDRT transforms the dataset, reducing the number of features while retaining the information essential for accurate classification. The evaluation shows an F1 score of 0.984 and an accuracy of 98.5% using only two reduced features, demonstrating that the method classifies malware samples accurately while minimizing processing time. Reducing the feature space improves computational efficiency by decreasing the memory and time required for training and prediction. The calculations confirm significant improvements in malware classification accuracy and efficiency. The results enhance existing malware detection techniques and can be applied in various cybersecurity applications, including real-time malware detection on resource-constrained devices. The novelty and scientific contribution lie in the development of the IFDRT method, which provides a robust and efficient solution for feature reduction in ML-based malware classification, paving the way for more effective and scalable cybersecurity measures.
Reducing neonatal mortality is a critical global health objective, especially in resource-constrained developing countries. This study employs machine learning (ML) techniques to predict fetal health status from cardiotocography (CTG) examination findings, using a dataset from the Kaggle repository because comprehensive healthcare data are scarce in developing nations. Features such as baseline fetal heart rate, uterine contractions, and waveform characteristics were selected using the recursive feature elimination (RFE) wrapper technique and scaled with a standard scaler. Six ML models, namely Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), Categorical Boosting (CB), and Extreme Gradient Boosting (XGB), were trained via cross-validation and evaluated using ML performance metrics. GB reached its maximum Matthews Correlation Coefficient (MCC) of 0.6255 with 8 of the 21 features, while CB reached the highest MCC overall, 0.6321, with 20 of the 21 features. The study demonstrates that ML models can predict fetal health conditions from CTG results, facilitating early identification of high-risk pregnancies and enabling prompt treatment to prevent severe neonatal outcomes.
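Because fetal health labels are typically multiclass, the MCC reported here is presumably the multiclass generalization. The sketch below computes it from label lists using the standard formula MCC = (c·s − Σ p_k t_k) / sqrt((s² − Σ p_k²)(s² − Σ t_k²)), where c is the number of correct predictions, s the number of samples, and t_k and p_k the true and predicted counts of class k. That the paper used exactly this variant is an assumption.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient (reduces to binary MCC for 2 classes)."""
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    sum_pt = sum(p_counts[k] * t_counts[k] for k in classes)
    sum_p2 = sum(p_counts[k] ** 2 for k in classes)
    sum_t2 = sum(t_counts[k] ** 2 for k in classes)
    denom = math.sqrt((s * s - sum_p2) * (s * s - sum_t2))
    return 0.0 if denom == 0 else (c * s - sum_pt) / denom


y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(round(multiclass_mcc(y_true, y_pred), 4))  # binary case: 2 / sqrt(12)
```

Unlike accuracy, MCC stays low when a model simply favors the majority class, which is why it suits imbalanced CTG data.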
The nature of measured data varies among the disciplines of the geosciences. In rock engineering, the features of the data play a leading role in determining feasible methods for processing them. The present study focuses on resolving one of the major deficiencies of conventional neural networks (NNs) in dealing with rock engineering data. Because samples are obtained from hundreds of meters below the surface with great difficulty, the number of samples is always limited. Meanwhile, experimental analysis of these samples may yield many repeated values and zeros, and conventional neural networks cannot build robust models from such data. In addition, these networks depend strongly on the initial weight and bias values for making reliable predictions. With this in mind, the current research introduces a novel neural network processing framework for geological data that does not suffer from these limitations of conventional NNs. The introduced single-data-based feature engineering network extracts all the information contained in each individual data point without being affected by the other points. Unlike conventional NNs, this method rearranges all the basic elements of the neuron model into a new structure, so its mathematical formulation was derived from first principles. Moreover, the corresponding code was developed in MATLAB and Python, since no existing software package provided it at the time. The new network was first evaluated through computer-based simulations of rock cracks in the 3DEC environment. After the model's reliability was confirmed, it was applied in two case studies to estimate, respectively, the tensile strength and the shear strength of real rock samples: coal core samples from the Southern Qinshui Basin of China, and gas hydrate-bearing sediment (GHBS) samples from the Nankai Trough of Japan. The coal samples underwent nuclear magnetic resonance (NMR) measurements and scanning electron microscopy (SEM) imaging to investigate their original micro- and macro-fractures. After these experiments, rock mechanical properties, including tensile strength, were measured using a rock mechanics test system. The shear strength of the GHBS samples, by contrast, was obtained through triaxial and direct shear tests. According to the results, the new network structure outperformed conventional neural networks in both the simulation-based and the case-study estimations of tensile and shear strength. Although the proposed approach originally aimed at resolving the issue of limited datasets, its unique properties could also be applied to larger datasets from other subsurface measurements.
A new method for extracting blend surface features is presented. It consists of two steps: segmentation and recovery of the parametric representation of the blend. Segmentation separates the points in the blend region from the rest of the input point cloud by sampling the point data, estimating local surface curvature properties, and comparing maximum curvature values. Recovery of the parametric representation generates a set of profile curves by marching through the blend and fitting cylinders. Compared with existing approaches to blend surface feature extraction, the proposed method requires less user interaction and can extract blend surfaces with either constant or variable radius. Application examples are presented to verify the proposed method.
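The curvature-comparison idea behind the segmentation step can be illustrated on a 2D cross-section. The sketch below is a simplified analog, not the paper's point-cloud procedure: it estimates discrete curvature with the Menger formula (the inverse radius of the circle through three consecutive points), flags points above a hypothetical threshold as blend points, and estimates a constant blend radius as 1 / mean curvature. The profile and threshold are hypothetical.

```python
import math


def menger_curvature(p, q, r):
    """Curvature (1/R) of the circle through three 2D points; 0 if collinear."""
    a = math.hypot(q[0] - p[0], q[1] - p[1])
    b = math.hypot(r[0] - q[0], r[1] - q[1])
    c = math.hypot(r[0] - p[0], r[1] - p[1])
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    if a * b * c == 0:
        return 0.0
    return 2 * abs(cross) / (a * b * c)  # 4 * area / (a*b*c), with area = |cross| / 2


def segment_blend(points, threshold):
    """Indices of interior points whose curvature exceeds the threshold,
    plus a radius estimate 1 / mean(curvature) over those points."""
    idx, ks = [], []
    for i in range(1, len(points) - 1):
        k = menger_curvature(points[i - 1], points[i], points[i + 1])
        if k > threshold:
            idx.append(i)
            ks.append(k)
    radius = len(ks) / sum(ks) if ks else None
    return idx, radius


# Hypothetical profile: flat floor, quarter-circle blend of radius 2, vertical wall.
R = 2.0
flat = [(-3.0, 0.0), (-2.0, 0.0), (-1.0, 0.0)]
arc = [(R * math.sin(math.radians(t)), R - R * math.cos(math.radians(t))) for t in (0, 30, 60, 90)]
wall = [(2.0, 3.0), (2.0, 4.0), (2.0, 5.0)]
blend_idx, radius = segment_blend(flat + arc + wall, threshold=0.3)
print(blend_idx, radius)
```

Points at the tangent transitions get intermediate curvature values, so the choice of threshold controls how tightly the blend region is cut out.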
Diabetes is becoming increasingly common and poses a serious threat to human well-being. Machine learning (ML) in the healthcare industry has recently attracted much attention, and several ML models have been developed on different datasets for diabetes prediction. Such models must predict diabetes accurately, and highly informative features are vital to a model's predictive capability. Feature engineering (FE) is a way of obtaining highly informative features. This work uses the Pima Indian Diabetes Dataset (PIDD) and experimentally analyzes the impact of informative features on ML models for diabetes prediction. Missing values (MV) and the effect of the imputation process on the distribution of each feature are analyzed. Permutation importance and partial dependence analyses are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained from BMI and INS to add information beyond their raw form. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used to analyze the impact of the proposed FE approach. The ensemble model performs well when the derived features are included, with a high Diagnostic Odds Ratio (DOR) of 117.694. This is a margin of 8.2% over the ensemble model without derived features (DOR = 96.306). Including the derived features with the FE approach of the current state of the art makes the ensemble model perform well, with Sensitivity (0.793), Specificity (0.945), DOR (79.517), and False Omission Rate (0.090), further improving on state-of-the-art results.
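The diagnostic metrics quoted above all derive from a binary confusion matrix. The sketch below shows the standard definitions, in particular that the DOR equals (TP·TN)/(FP·FN), which is the same as the odds of a positive test in diseased cases divided by the odds in healthy cases. The counts in the example are hypothetical, not the paper's.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, diagnostic odds ratio and false omission rate."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    # DOR = (sens / (1 - sens)) / ((1 - spec) / spec); infinite if fp or fn is 0.
    dor = (tp * tn) / (fp * fn) if fp * fn else math.inf
    false_omission_rate = fn / (fn + tn)  # share of negative predictions that are wrong
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "false_omission_rate": false_omission_rate,
    }


m = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)  # hypothetical counts
print(m)
```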
With the emergence of massive online courses, courses of widely varying quality need to be evaluated from all aspects, both to improve discrimination between courses and to recommend personalized online learning resources to learners. In this paper, a method of constructing an online course portrait based on feature engineering is proposed. First, the framework of the online course portrait is established and the related features of the portrait are extracted by feature engineering; then the indicator weights of the portrait are calculated by the entropy weight method. Finally, experiments are designed to evaluate the performance of the algorithms, and an example course portrait is given.
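The entropy weight method assigns larger weights to indicators whose values vary more across alternatives. The sketch below follows the textbook formulation: column shares p_ij, entropy e_j = −(1/ln m) Σ_i p_ij ln p_ij, divergence d_j = 1 − e_j, and weight w_j = d_j / Σ d_j. It assumes positive, benefit-type indicator values that are already normalized; the course matrix is hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method for an m x n matrix of positive, benefit-type
    indicator values (rows = courses, columns = indicators); requires m > 1."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [x / total for x in col]
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)  # 0 * ln 0 taken as 0
        divergences.append(1.0 - e)
    s = sum(divergences)
    return [d / s for d in divergences]


# Hypothetical scores of three courses on three indicators.
courses = [
    [1.0, 5.0, 2.0],
    [1.0, 1.0, 4.0],
    [1.0, 3.0, 6.0],
]
w = entropy_weights(courses)
print(w)
```

An indicator that is identical for every course carries no discriminating information, so its weight comes out as (numerically) zero.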
The performance of metal halide perovskite solar cells (PSCs) relies heavily on experimental parameters, including the fabrication process and the perovskite composition, and tremendous experimental work has been done to optimize these factors. However, predicting the device performance of PSCs from fabrication parameters before experiments remains challenging. Herein, we bridge this gap with machine learning (ML) based on a dataset of 1072 devices from peer-reviewed publications. The optimized ML model accurately predicts the power conversion efficiency (PCE) from the experimental parameters, with a root mean square error of 1.28% and a Pearson coefficient r of 0.768. Moreover, the factors governing device performance are ranked by SHapley Additive exPlanations (SHAP), which show that the A-site cation is crucial for obtaining highly efficient PSCs. Experiments and density functional theory calculations are employed to validate and help explain the ML predictions. Our work demonstrates the feasibility of using ML to predict device performance from experimental parameters before experiments, enabling reverse experimental design toward highly efficient PSCs.
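SHAP rankings rest on Shapley values. The sketch below computes them exactly by enumerating all feature coalitions, with "absent" features replaced by a single baseline vector. This is an illustrative assumption: the enumeration is exponential in the number of features, and practical SHAP explainers use more efficient estimators. The toy model and inputs are hypothetical.

```python
import math
from itertools import combinations


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without j.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for s in combinations(others, size):
                s = set(s)
                phi += weight * (value(s | {j}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z):
    # Hypothetical model with an interaction between features 0 and 1.
    return z[0] * z[1] + z[2]


x, base = [2.0, 3.0, 1.0], [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, base)
print(phi)
```

The interaction term is split equally between the two interacting features, and the values always sum to f(x) − f(baseline); ranking features by mean |phi| over a dataset gives the kind of importance ordering reported above.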
With the frequent occurrence of telecommunications and network fraud in recent years, new types of fraud have emerged one after another, causing huge losses to the public. However, owing to the lack of an effective preventive mechanism, the police are often in a passive position. Using technologies such as web crawlers, feature engineering, deep learning, and artificial intelligence, this paper proposes a user-portrait-based fraud warning scheme built on public Weibo data. First, we perform preliminary screening and cleaning based on the keyword “defrauded” to obtain valid IDs of defrauded users. The basic information and account information of these users are labeled to distinguish the types of fraud. Second, through feature engineering techniques such as avatar recognition, artificial intelligence (AI) sentiment analysis, data screening, and analysis of the types of bloggers followed, images and texts are abstracted into user preferences and personality characteristics, which integrate multidimensional information to build user portraits. Third, a deep neural network is trained on the resulting data cube. The prediction is framed as an N-way K-shot problem: 80% of the data is used to train the model and the remaining 20% to evaluate its accuracy. Experiments show that few-shot learning achieves higher accuracy than Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). On this basis, the paper develops a WeChat mini program for early warning of telecommunications network fraud based on user portraits. When the user enters some personal information on the front end, the back-end database performs correlation analysis automatically to match the most likely fraud types and give relevant early warning information. The fraud warning model is highly scalable: data from other applications (apps) can be added to further improve anti-fraud efficiency, which has great public-welfare value.
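In an N-way K-shot setup, training and evaluation proceed in episodes: N classes are drawn, each contributing K labeled support examples and some query examples to classify. The sketch below shows episode sampling only; the class names, counts, and query size are hypothetical, and the paper's network and exact episode configuration are not shown.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode: N classes, K support and n_query
    query examples per class, with no overlap between support and query."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        items = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query


# Hypothetical data: five fraud types with ten user portraits each.
data = {f"fraud_type_{c}": [f"user_{c}_{i}" for i in range(10)] for c in range(5)}
support, query = sample_episode(data, n_way=3, k_shot=2, n_query=1, rng=random.Random(0))
print(support, query)
```

For the 80/20 protocol described above, the portraits would first be split into training and test pools, and episodes drawn from each pool separately.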
As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms, efficiently using a data-driven approach and employing information and communication technology (ICT) and cloud computing. Because of the complex architecture of cloud computing, the distinctive workings of advanced metering infrastructure (AMI), and the use of sensitive data, securing the SG has become challenging. SG faults fall into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failures, communication issues, ohmic losses, and energy burnout during the transmission and propagation of energy are TLs. NTLs are human-induced errors made for malicious purposes, such as attacking sensitive data and electricity theft, including fraudulent customers tampering with AMI to reduce their bills. This research proposes a data-driven methodology based on computational intelligence and big data analysis to identify fraudulent customers from their load profiles. In the proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model is used to extract the relevant subset of features from a large, unsupervised public smart grid project dataset from London, UK, for theft detection. A subset of 26 out of 71 features is obtained with a classification accuracy of 96.6%, compared with studies conducted on small, limited datasets.
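In GA-based feature selection, each individual is a binary mask over the features and its fitness is a classifier's validation score on the masked data. The sketch below shows the search loop with truncation selection, one-point crossover, bit-flip mutation, and elitism. To stay self-contained it replaces SVM accuracy with a hypothetical toy fitness, and all GA settings are assumptions rather than the paper's configuration.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Genetic search over binary feature masks; returns selected indices."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children  # elitism: the best mask found so far is never lost
    best = max(pop, key=fitness)
    return [i for i, g in enumerate(best) if g]


# Toy stand-in for SVM validation accuracy: hypothetical informative
# features 0, 3 and 5 earn +1 each; every other selected feature costs 0.1.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    chosen = {i for i, g in enumerate(mask) if g}
    return len(chosen & INFORMATIVE) - 0.1 * len(chosen - INFORMATIVE)


selected = ga_select(8, toy_fitness)
print(selected)
```

With a real SVM fitness, each evaluation trains a model, so caching fitness values per mask is usually worthwhile.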
This work studies a probabilistic multi-dimensional selective ensemble learning method and its application to predicting users' online purchase behavior. First, classifiers are integrated along the dimension of predicted probability, and a pruning algorithm based on greedy forward search is obtained by combining two indicators: accuracy and complementarity. The pruning algorithm is then integrated into the Stacking ensemble method to build a model for predicting users' online shopping behavior based on the probabilistic multi-dimensional selective ensemble. Finally, the method is compared with the individual learners in the ensemble and with the Stacking ensemble without pruning. The experimental results show that the proposed method reduces the ensemble size, improves prediction accuracy, and can predict users' online purchase behavior.
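A greedy forward pruning step of this kind can be sketched as follows: start from the most accurate base learner, then repeatedly add the candidate that maximizes a blend of ensemble accuracy (probabilities averaged, thresholded at 0.5) and complementarity (the share of samples the current ensemble misclassifies but the candidate gets right), stopping when accuracy no longer improves. The exact indicator definitions, the blend weight `alpha`, and the stopping rule are assumptions, not the paper's formulas; the probabilities are hypothetical.

```python
def vote_accuracy(prob_lists, y):
    """Accuracy of averaging predicted probabilities, thresholded at 0.5."""
    correct = 0
    for i, label in enumerate(y):
        avg = sum(p[i] for p in prob_lists) / len(prob_lists)
        correct += int((avg >= 0.5) == bool(label))
    return correct / len(y)


def complementarity(selected, candidate, y):
    """Share of samples the current ensemble gets wrong but the candidate gets right."""
    gain = 0
    for i, label in enumerate(y):
        ens = sum(p[i] for p in selected) / len(selected)
        ens_wrong = (ens >= 0.5) != bool(label)
        cand_right = (candidate[i] >= 0.5) == bool(label)
        gain += int(ens_wrong and cand_right)
    return gain / len(y)


def greedy_prune(probs, y, alpha=0.5):
    """Greedy forward selection of base learners (returns indices into probs)."""
    remaining = list(range(len(probs)))
    first = max(remaining, key=lambda m: vote_accuracy([probs[m]], y))
    selected, best_acc = [first], vote_accuracy([probs[first]], y)
    remaining.remove(first)
    while remaining:
        current = [probs[s] for s in selected]

        def score(m):
            acc = vote_accuracy(current + [probs[m]], y)
            return alpha * acc + (1 - alpha) * complementarity(current, probs[m], y)

        cand = max(remaining, key=score)
        new_acc = vote_accuracy(current + [probs[cand]], y)
        if new_acc <= best_acc:  # stop once accuracy stops improving
            break
        selected.append(cand)
        remaining.remove(cand)
        best_acc = new_acc
    return selected


y = [1, 1, 0, 0, 1]  # hypothetical purchase labels
probs = [
    [0.9, 0.8, 0.2, 0.6, 0.4],  # learner 0
    [0.6, 0.3, 0.1, 0.2, 0.9],  # learner 1
    [0.7, 0.9, 0.6, 0.1, 0.0],  # learner 2
]
print(greedy_prune(probs, y))
```

The retained learners would then feed the Stacking meta-learner.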
Tax fraud is a substantial issue affecting governments around the world. It is defined as the intentional alteration of information provided on a tax return to reduce someone's tax liability, done by either under-reporting sales or over-reporting purchases. According to recent studies, governments lose over $500 billion annually to tax fraud. A loss of this magnitude motivates tax authorities worldwide to implement efficient fraud detection strategies. Most machine learning work on tax fraud centers on supervised models. A significant drawback of this approach is that it requires previously audited tax returns, which constitute a small percentage of the data. Other strategies use unsupervised models that exploit all the data when searching for patterns but ignore whether tax returns are fraudulent. Used on their own, unsupervised models are therefore of limited use in detecting tax fraud. This paper addresses these limitations by proposing a fraud detection framework that combines supervised and unsupervised models to exploit the entire set of tax returns. The framework consists of four modules: a supervised module, which uses a tree-based model to extract knowledge from the data; an unsupervised module, which calculates anomaly scores; a behavioral module, which assigns a compliance score to each taxpayer; and a prediction module, which uses the outputs of the previous modules to produce a probability of fraud for each tax return. We demonstrate the effectiveness of the framework by testing it on real tax returns provided by the Saudi tax authority.
Massive open online courses (MOOCs) have become a popular mode of online learning across the world in the past few years. However, their extremely high dropout rate poses many challenges to the development of online learning. Most current methods have low accuracy and poor generalization when dealing with high-dimensional dropout features. They focus on analyzing learning scores and check results of online courses but neglect student behavior across learning phases. Moreover, a student's participation status at a given moment is necessarily affected by their prior learning status. To address these issues, this paper proposes an ensemble learning model for early dropout prediction (ELM-EDP) that integrates attention-based document representation as a vector (A-Doc2vec), feature learning of course difficulty, and weighted soft voting ensemble with heterogeneous classifiers (WSV-HC). First, A-Doc2vec learns sequence features of student behaviors in watching lecture videos and completing course assignments, and also captures the relationship between courses and videos. Then, a feature learning method is proposed to reduce the interference that differences in course difficulty cause in dropout prediction. Finally, WSV-HC is proposed to exploit the benefits of both the boosting and bagging integration strategies. Experiments on the MOOCCube2020 dataset show that ELM-EDP achieves better Accuracy, Precision, Recall, and F1.
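Weighted soft voting averages the class probabilities of heterogeneous classifiers using per-classifier weights and predicts the class with the highest combined probability. The sketch below shows the combination rule for a single sample; how WSV-HC derives its weights is not stated in the abstract, so the weights and probabilities here are hypothetical.

```python
def weighted_soft_vote(prob_rows, weights):
    """Combine per-classifier class probabilities for one sample.

    prob_rows[m][c] is classifier m's probability for class c.
    Returns (predicted class index, combined probabilities).
    """
    total = sum(weights)
    n_classes = len(prob_rows[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: combined[c])
    return label, combined


# Hypothetical outputs of three classifiers for one student: [stay, dropout].
probs = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
label_a, combined_a = weighted_soft_vote(probs, [0.2, 0.5, 0.3])
label_b, combined_b = weighted_soft_vote(probs, [0.6, 0.1, 0.3])
print(label_a, combined_a, label_b, combined_b)
```

The same probabilities lead to different decisions under different weights, which is why the weighting scheme matters.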
In recent years, the number of heart disease cases has greatly increased, and heart disease is associated with a high mortality rate. Meanwhile, technological development has produced advanced devices that help patients measure their health conditions at home and predict their risk of heart disease. This research compares how accurately self-measurable physical health indicators and the full set of indicators measured by healthcare providers predict heart disease, using five machine learning models: Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Decision Tree, and Random Forest. The database used contains 13 types of health test results and the heart disease risk of 303 patients. The full matrices consisted of all 13 test results, while the home matrices included the 6 results that can be measured at home. After the five models were constructed for both the home matrices and the full matrices, the accuracy score and false negative rate were computed for each model. The full matrices achieved higher accuracy scores than the home matrices in all five models, and their false negative rates were lower than or equal to those of the home matrices. We conclude that home-measured physical health indicators are less accurate than the full set of physical indicators in predicting patients' risk of heart disease. Therefore, unless home-testable indicators are developed further, the full set of physical health indicators is preferred for assessing heart disease risk.
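The two reported metrics can be computed directly from predictions. The sketch below assumes label 1 means "heart disease"; the false negative rate FN / (TP + FN) is the share of actual patients the model misses, which is why it matters alongside accuracy. The labels are hypothetical.

```python
def accuracy_and_fnr(y_true, y_pred, positive=1):
    """Accuracy and false negative rate FN / (TP + FN) for a binary task."""
    correct = tp = fn = 0
    for t, p in zip(y_true, y_pred):
        correct += int(t == p)
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
    accuracy = correct / len(y_true)
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return accuracy, fnr


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
acc, fnr = accuracy_and_fnr([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
print(acc, fnr)
```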
Cryptocurrency price prediction has attracted significant attention due to the growing importance of digital assets in the financial landscape. This paper presents a comprehensive study on predicting future cryptocurrency prices using machine learning algorithms. Open-source historical data from various cryptocurrency exchanges are used, and interpolation techniques are employed to handle missing data, ensuring the completeness and reliability of the dataset. Four technical indicators are selected as features for prediction. The study applies five machine learning algorithms to capture the complex patterns of the highly volatile cryptocurrency market. The findings show the strengths and limitations of the different approaches, highlighting the importance of feature engineering and algorithm selection for accurate cryptocurrency price prediction. The research contributes insights into the dynamic, rapidly evolving field of cryptocurrency price prediction, assisting investors and traders in making informed decisions amid the challenges posed by the cryptocurrency market.
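The abstract names neither the interpolation technique nor the four indicators. As a minimal sketch under those assumptions, the code below fills missing closing prices by linear interpolation and computes a simple moving average (SMA) as one example indicator; the price series is hypothetical.

```python
def interpolate_missing(series):
    """Linearly interpolate None gaps; edge gaps copy the nearest known value."""
    values = list(series)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            values[i] = values[right]
        elif right is None:
            values[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            values[i] = values[left] + frac * (values[right] - values[left])
    return values


def sma(values, window):
    """Simple moving average over a trailing window (one example indicator)."""
    return [sum(values[i - window + 1 : i + 1]) / window for i in range(window - 1, len(values))]


closes = [10.0, None, 14.0, None, None, 20.0]  # hypothetical daily closes
filled = interpolate_missing(closes)
print(filled, sma(filled, 3))
```

Linear interpolation uses the next known price, so in a forecasting setup it should only be applied where that value would already be available at prediction time; otherwise it leaks future information into the features.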
Abstract: To address imprecise and unclear information in quality function deployment (QFD), a method for determining the priority of engineering features based on mixed linguistic variables is proposed. First, each evaluation member uses the specified linguistic variables to construct an evaluation matrix of the correlation strength between customer requirements and engineering features. Second, the relative importance weights of the evaluation members and of the customer requirements are aggregated. Finally, the priority of the engineering features is obtained by calculating deviations. The feasibility and practicability of the method are demonstrated through the design of a new long-bag low-pressure pulse dust collector.
Abstract: The geological characteristics of three types of tropical volcanic rock and soil distributed along the Jakarta-Bandung high-speed railway (HSR), namely pozzolanic clayey soil, mud shale, and deep soft soil, are studied through field and laboratory tests. The paper analyzes the mechanisms and causes of the engineering geological problems induced by these tropical volcanic rocks and soils and proposes measures to control subgrade slope instability: rationally determining the project type, controlling side-slope stability, and strengthening waterproofing and drainage. To control the instability of tunnel side and front slopes, foundation pits, and working faces, the following are adopted: the "zero front slope" tunneling technique at the portal, a simplified double-side-wall heading excavation method, and a cross-brace construction method for arch protection within a semi-open-cut row-pile frame in the eccentrically loaded "mountainside" soft soil stratum. CFG piles or pipe piles are used to reinforce soft and expansive foundations, or replacement measures are taken, and a scheme of blind ditches plus double-layer water sealing is proposed for ballastless track sections to prevent arching deformation of the foundation. In deep soft soil sections, CFG piles, pipe piles, and vacuum combined with piled preloading are adopted to improve the bearing capacity of the foundation and to control total and differential settlement. These engineering countermeasures have been applied during the construction of the Jakarta-Bandung HSR with good results.
Funding: This work was funded by the China Scholarship Council (grant nos. 202108320111 and 202208320055).
Abstract: State of health (SOH) estimation for batteries in e-mobility applications operating under real, dynamic conditions is essential yet challenging. Most existing estimation methods are based on fixed constant-current charging and discharging aging profiles, overlooking the fact that in real applications charging and discharging profiles are random and often incomplete. This work investigates the influence of feature engineering on the accuracy of different machine learning (ML)-based SOH estimators applied to different recharging sub-profiles under a realistic battery mission profile. Fifteen features were extracted from partial recharging profiles, considering factors such as starting voltage, charge amount, and charging sliding window. Features were then selected with a pipeline consisting of filtering followed by supervised ML-based subset selection. Multiple linear regression (MLR), Gaussian process regression (GPR), and support vector regression (SVR) were applied to estimate SOH, and root mean square error (RMSE) was used to evaluate and compare estimation performance. The results showed that the feature selection pipeline improved SOH estimation accuracy by 55.05%, 2.57%, and 2.82% for MLR, GPR, and SVR, respectively. Estimation based on partial charging profiles with a lower starting voltage, a larger charge amount, and a larger sliding window was more likely to achieve higher accuracy. This work aims to provide insight into how supervised ML-based feature engineering on random partial recharges affects SOH estimation performance, and to help bridge the gap between theoretical studies and effective SOH estimation in real dynamic applications.
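RMSE, the metric used above to compare MLR, GPR, and SVR, and a percentage improvement between two RMSE values can be computed as in the sketch below. The abstract does not state how its improvement percentages were derived, so treating "improvement" as relative RMSE reduction is an assumption, and the function names and SOH values are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between reference SOH values and estimates."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(rmse_before: float, rmse_after: float) -> float:
    """Percentage reduction in RMSE (assumed definition of 'improvement')."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before


# Hypothetical SOH values (fractions of nominal capacity), not data from the study.
soh_true = [0.98, 0.95, 0.91, 0.88]
soh_all_features = [0.96, 0.97, 0.88, 0.91]   # estimator on all features
soh_selected = [0.97, 0.96, 0.90, 0.89]       # estimator on selected features

before = rmse(soh_true, soh_all_features)
after = rmse(soh_true, soh_selected)
print(f"RMSE before: {before:.4f}, after: {after:.4f}, "
      f"improvement: {relative_improvement(before, after):.1f}%")
```

Because RMSE squares each error, a feature set that removes a few large misses can reduce it far more than the mean absolute error would suggest.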
Funding: Supported by the Centre for Digital Entertainment at Bournemouth University through the UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L016540/1, and by Humain Ltd.
Abstract: Background Deep 3D morphable models (deep 3DMMs) play an essential role in computer vision. They are used in facial synthesis, compression, reconstruction and animation, avatar creation, virtual try-on, facial recognition systems, and medical imaging. These applications require high spatial and perceptual quality of the synthesised meshes. Despite their significance, these models have not been compared across different mesh representations or evaluated jointly with point-wise distance and perceptual metrics. Methods We compare the influence of different mesh representation features on the spatial and perceptual fidelity of meshes reconstructed by various deep 3DMMs. This paper confirms the hypothesis that building deep 3DMMs from meshes with global representations leads to lower spatial reconstruction error, measured with L1 and L2 norm metrics, but underperforms on perceptual metrics. In contrast, using differential mesh representations, which describe differential surface properties, yields lower perceptual error (FMPD and DAME) but higher spatial error. The influence of mesh feature normalisation and standardisation is also compared and analysed from the perspectives of perceptual and spatial fidelity. Results The results presented in this paper provide guidance for selecting mesh representations to build deep 3DMMs according to spatial and perceptual quality objectives, and propose combinations of mesh representations and deep 3DMMs that improve either the perceptual or the spatial fidelity of existing methods.
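The point-wise spatial error mentioned above (L1 and L2 norms) can be sketched as a mean per-vertex distance between a reconstructed mesh and its ground truth, assuming both meshes share vertex ordering, as registered 3DMM meshes usually do. Averaging over vertices is an assumption, since the abstract does not give the exact aggregation; the function name and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Vertex = Tuple[float, float, float]


def mean_vertex_error(gt: List[Vertex], rec: List[Vertex], norm: int = 2) -> float:
    """Mean per-vertex L1 or L2 distance between corresponding vertices.

    Assumes gt[i] and rec[i] are the same vertex (registered meshes).
    """
    if len(gt) != len(rec) or not gt:
        raise ValueError("meshes must have the same non-zero vertex count")
    total = 0.0
    for a, b in zip(gt, rec):
        diffs = [abs(x - y) for x, y in zip(a, b)]
        if norm == 1:
            total += sum(diffs)                            # Manhattan distance
        elif norm == 2:
            total += math.sqrt(sum(d * d for d in diffs))  # Euclidean distance
        else:
            raise ValueError("norm must be 1 or 2")
    return total / len(gt)


# Toy two-vertex example (hypothetical coordinates).
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstructed = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]
print(mean_vertex_error(ground_truth, reconstructed, norm=1))  # 3.5
print(mean_vertex_error(ground_truth, reconstructed, norm=2))  # 2.5
```

Such distances ignore how errors are distributed over the surface, which is why the paper pairs them with perceptual metrics like FMPD and DAME.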
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2023R1A2C1005950). Jana Shafi is supported via funding from Prince Sattam bin Abdulaziz University, Project Number (PSAU/2024/R/1445).
Abstract: Fetal health care is vital to ensuring the health of both the pregnant woman and the fetus. The mother needs regular check-ups to determine the status of fetal growth and to identify any potential problems. To assess the fetus, doctors monitor blood reports, ultrasounds, cardiotocography (CTG) data, etc. In this research, we consider CTG data, which provide information on fetal heart rate and uterine contractions during pregnancy. Several researchers have proposed methods for classifying the status of fetal growth. Manual processing of CTG data is time-consuming and unreliable, so automated tools are needed to classify fetal health. This study proposes a novel neural network-based architecture, the Dynamic Multi-Layer Perceptron model, evaluated with configurations ranging from a single layer to several layers, to classify fetal health. Various strategies were applied to enhance the model's performance, including data pre-processing (balancing, scaling, and normalization), hyperparameter tuning, batch normalization, and early stopping. A comparative analysis against traditional machine learning models shows that the proposed method achieves an accuracy of 97%. An ablation study without any pre-processing techniques is also presented. This study provides valuable interpretations to support healthcare professionals in decision-making.
Funding: This work was funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R435), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The article describes a new method for malware classification based on a machine learning (ML) model architecture designed specifically for malware detection, enabling real-time and accurate malware identification. Using a novel feature dimensionality reduction technique, the Interpolation-based Feature Dimensionality Reduction Technique (IFDRT), the authors significantly reduce the feature space while retaining the information critical for malware classification. This technique optimizes the model's performance and reduces computational requirements. The method is demonstrated on the BODMAS malware dataset, which contains 57,293 malware samples and 77,142 benign samples, each described by a 2381-feature vector. IFDRT transforms the dataset, reducing the number of features while preserving the data essential for accurate classification. The evaluation shows an F1 score of 0.984 and an accuracy of 98.5% using only two reduced features, demonstrating that the method classifies malware samples accurately while minimizing processing time. Reducing the feature space also decreases the memory and time required for training and prediction. The calculations indicate significant improvements in malware classification accuracy and efficiency. The results enhance existing malware detection techniques and can be applied in various cybersecurity applications, including real-time malware detection on resource-constrained devices. The novelty and scientific contribution lie in the development of IFDRT, which provides a robust and efficient solution for feature reduction in ML-based malware classification, paving the way for more effective and scalable cybersecurity measures.
Abstract: Reducing neonatal mortality is a critical global health objective, especially in resource-constrained developing countries. This study employs machine learning (ML) techniques to predict fetal health status from cardiotocography (CTG) examination findings, using a dataset from the Kaggle repository because comprehensive healthcare data are scarce in developing nations. Features such as baseline fetal heart rate, uterine contractions, and waveform characteristics were selected using the recursive feature elimination (RFE) wrapper technique and scaled with a standard scaler. Six ML models, namely Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), Categorical Boosting (CB), and Extreme Gradient Boosting (XGB), were trained via cross-validation and evaluated using ML performance metrics. GB reached its maximum Matthews Correlation Coefficient (MCC) of 0.6255 with 8 of the 21 features, while CB reached the highest overall MCC of 0.6321 with 20 of the 21 features. The study demonstrated that ML models can predict fetal health conditions from CTG exam results, facilitating early identification of high-risk pregnancies and enabling prompt treatment to prevent severe neonatal outcomes.
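The Matthews Correlation Coefficient used above to rank the models extends to multi-class labels, which matters here because the widely used Kaggle CTG dataset has three fetal health classes (normal, suspect, pathological). The sketch below computes the multi-class form from label lists, following the same formulation as scikit-learn's `matthews_corrcoef`; the helper name and example labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multi-class Matthews Correlation Coefficient.

    MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2)),
    where c = correct predictions, s = samples, and t_k / p_k are the
    true / predicted counts of class k.
    """
    s = len(y_true)
    if s == 0 or s != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    classes = set(t_counts) | set(p_counts)  # Counter returns 0 for absent keys
    cov_tp = c * s - sum(p_counts[k] * t_counts[k] for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


# Hypothetical labels: 1 = normal, 2 = suspect, 3 = pathological.
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 2, 3]
print(round(multiclass_mcc(y_true, y_pred), 3))  # 0.773
```

Unlike plain accuracy, MCC stays low when a model simply predicts the majority "normal" class, which suits imbalanced CTG data.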
Abstract: The nature of measured data varies among the disciplines of the geosciences. In rock engineering, the characteristics of the data play a leading role in determining feasible methods for handling them. The present study focuses on resolving one of the major deficiencies of conventional neural networks (NNs) in dealing with rock engineering data. Because samples are obtained from hundreds of meters below the surface with great difficulty, the number of samples is always limited. Moreover, experimental analysis of these samples may yield many repetitive values and zeros, and conventional neural networks are incapable of building robust models from such data. In addition, these networks depend strongly on the initial weights and biases for making reliable predictions. With this in mind, the current research introduces a novel neural network processing framework for geological data that does not suffer from the limitations of conventional NNs. The introduced single-data-based feature engineering network extracts all the information contained in each individual data point without being affected by the other points. This method, which differs completely from conventional NNs, rearranges all the basic elements of the neuron model into a new structure; therefore, its mathematical formulation was derived from first principles. Moreover, the corresponding code was developed in MATLAB and Python, since no common software package currently provides it. The new network was first evaluated through computer-based simulations of rock cracks in the 3DEC environment. After the model's reliability was confirmed, it was applied in two case studies to estimate the tensile strength and the shear strength, respectively, of real rock samples. These were coal core samples from the southern Qinshui Basin of China and gas hydrate-bearing sediment (GHBS) samples from the Nankai Trough of Japan. The coal samples underwent nuclear magnetic resonance (NMR) measurements and scanning electron microscopy (SEM) imaging to investigate their original micro- and macro-fractures. After these experiments, the rock mechanical properties, including tensile strength, were measured with a rock mechanics test system. The shear strength of the GHBS samples, by contrast, was obtained through triaxial and direct shear tests. According to the results, the new network structure outperformed conventional neural networks in both the simulation-based and the case-study estimations of tensile and shear strength. Although the proposed approach was originally aimed at the problem of limited datasets, its unique properties could also be applied to larger datasets from other subsurface measurements.
Funding: This project is supported by General Electric Corporate Research and Development and the National Advanced Technology Project of China (No. 863-511-942-018).
Abstract: A new method for extracting blend surface features is presented. It consists of two steps: segmentation and recovery of the parametric representation of the blend. Segmentation separates the points in the blend region from the rest of the input point cloud by sampling the point data, estimating local surface curvature properties, and comparing maximum curvature values. Recovery of the parametric representation generates a set of profile curves by marching through the blend and fitting cylinders. Compared with existing approaches to blend surface feature extraction, the proposed method requires less user interaction and can extract blend surfaces with either constant or variable radius. Application examples are presented to verify the proposed method.
Abstract: Diabetes is increasingly common and poses a serious threat to human well-being. Machine learning (ML) in the healthcare industry has recently attracted wide attention, and several ML models have been developed on different datasets for diabetes prediction. Accurate prediction is essential, and highly informative dataset features are vital to a model's predictive capability. Feature engineering (FE) is a way to obtain such highly informative features. In this work, the Pima Indian Diabetes Dataset (PIDD) is used, and the impact of informative features on ML models for diabetes prediction is tested and analyzed. Missing values (MV) and the effect of the imputation process on the distribution of each feature are analyzed. Permutation importance and partial dependence analyses were carried out extensively, and the results revealed that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are constructed from BMI and INS to add information beyond their raw form. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used to analyze the impact of the proposed FE approach. The ensemble model performs well when the derived features are included, achieving a high Diagnostic Odds Ratio (DOR) of 117.694, a margin of 8.2% over the ensemble model without derived features (DOR = 96.306). Combining the derived features with the current state-of-the-art FE approach enables the ensemble model to achieve a Sensitivity of 0.793, a Specificity of 0.945, a DOR of 79.517, and a False Omission Rate of 0.090, further improving on state-of-the-art results.
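The diagnostic metrics reported above all derive from a binary confusion matrix. The sketch below uses hypothetical counts, not figures from the study: DOR = (TP * TN) / (FP * FN), which equals (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity), and the false omission rate is FN / (FN + TN), the share of negative predictions that are actually diabetic. The `Confusion` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def diagnostic_odds_ratio(self) -> float:
        # Raises ZeroDivisionError when FP or FN is zero.
        return (self.tp * self.tn) / (self.fp * self.fn)

    def false_omission_rate(self) -> float:
        return self.fn / (self.fn + self.tn)


# Hypothetical counts for illustration only.
cm = Confusion(tp=80, fp=10, tn=90, fn=20)
print(cm.sensitivity(), cm.specificity())  # 0.8 0.9
print(cm.diagnostic_odds_ratio())          # 36.0
print(round(cm.false_omission_rate(), 3))  # 0.182
```

DOR combines sensitivity and specificity into a single number that does not depend on class prevalence, which makes it convenient for comparing models on an imbalanced dataset like PIDD.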
Funding: This work is supported by the National Key Research and Development Program of China (Grant No. 2020AAA0108803).
Abstract: With the emergence of massive online courses, evaluating courses of differing quality, both to better discriminate between them and to recommend personalized online learning resources to learners, requires assessment from multiple aspects. This paper proposes a method for constructing an online course portrait based on feature engineering. First, a framework for the online course portrait is established and the relevant portrait features are extracted using feature engineering; then the indicator weights of the portrait are calculated with the entropy weight method. Finally, experiments are designed to evaluate the performance of the algorithms, and an example course portrait is given.
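The entropy weight method named above assigns larger weights to indicators whose values vary more across alternatives. For an m x n matrix, each column is normalized to proportions p_ij, its entropy is e_j = -(1 / ln m) * sum_i p_ij ln p_ij, and the weight is w_j = (1 - e_j) / sum_k (1 - e_k). The sketch below assumes non-negative, larger-is-better indicators and skips the min-max normalization often applied first; the course indicators and values are hypothetical.

```python
import math
from typing import List


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Entropy weight method for an m x n matrix (m courses, n indicators).

    Assumes all values are non-negative and larger is better.
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("need at least two alternatives")
    n = len(matrix[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [x / total for x in column] if total > 0 else [1.0 / m] * m
        # Convention: 0 * ln 0 = 0, so zero proportions are skipped.
        entropy = -k * sum(p * math.log(p) for p in proportions if p > 0)
        divergences.append(1.0 - entropy)
    d_sum = sum(divergences)
    if d_sum == 0:
        return [1.0 / n] * n  # every indicator is uniform: fall back to equal weights
    return [d / d_sum for d in divergences]


# Hypothetical indicators: completion rate, average rating, forum posts per learner.
courses = [
    [0.9, 4.0, 30.0],
    [0.5, 4.0, 10.0],
    [0.1, 4.0, 20.0],
]
weights = entropy_weights(courses)
print([round(w, 3) for w in weights])
```

In this example the rating column is identical for every course, so it carries no discriminating information and receives a weight of (numerically) zero.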
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62075006), the National Key Research and Development Program of China (Grant No. 2021YFB3600403), and the Natural Science Talents Foundation (Grant No. KSRC22001532).
Abstract: The performance of metal halide perovskite solar cells (PSCs) relies heavily on experimental parameters, including the fabrication processes and the compositions of the perovskites, and tremendous experimental effort has gone into optimizing these factors. However, predicting PSC device performance from fabrication parameters before experiments remains challenging. Herein, we bridge this gap using machine learning (ML) on a dataset of 1072 devices from peer-reviewed publications. The optimized ML model accurately predicts the power conversion efficiency (PCE) from the experimental parameters, with a root mean square error of 1.28% and a Pearson correlation coefficient r of 0.768. Moreover, the factors governing device performance are ranked by SHapley Additive exPlanations (SHAP), which show that the A-site cation is crucial for obtaining highly efficient PSCs. Experiments and density functional theory calculations are employed to validate and help explain the ML model's predictions. Our work demonstrates the feasibility of using ML to predict device performance from experimental parameters before experiments, enabling reverse experimental design toward highly efficient PSCs.
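The Pearson coefficient r reported above measures the linear agreement between measured and ML-predicted PCE. The plain-Python sketch below shows the computation; the PCE values are hypothetical placeholders, not entries from the 1072-device dataset.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical PCE values in percent.
measured_pce = [18.2, 20.1, 15.4, 22.3]
predicted_pce = [17.9, 19.5, 16.8, 21.0]
print(round(pearson_r(measured_pce, predicted_pce), 3))
```

RMSE and r capture different things: a model can track the trend well (high r) while being systematically offset (high RMSE), which is why studies like this one report both.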
Abstract: With the frequent occurrence of telecommunications and network fraud crimes in recent years, new types of fraud have emerged one after another, causing huge losses to the public. However, lacking an effective preventive mechanism, the police are often in a passive position. Using technologies such as web crawlers, feature engineering, deep learning, and artificial intelligence, this paper proposes a user-portrait-based fraud warning scheme built on public Weibo data. First, we perform preliminary screening and cleaning based on the keyword "defrauded" to obtain valid IDs of defrauded users. The basic and account information of these users is labeled to distinguish the types of fraud. Second, through feature engineering techniques such as avatar recognition, artificial intelligence (AI) sentiment analysis, data screening, and analysis of the types of bloggers users follow, images and texts are abstracted into user preferences and personality characteristics, which integrate multidimensional information to build user portraits. Third, a deep neural network is trained on the resulting data cube. Framing the task as an N-way K-shot problem, 80% of the data is used to train the model and the remaining 20% to evaluate its accuracy. Experiments show that few-shot learning achieves higher accuracy than Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). On this basis, this paper develops a WeChat mini program for early warning of telecommunications network fraud based on user portraits. When the user enters personal information on the front end, the back-end database automatically performs correlation analysis to match the most likely fraud types and provide relevant early warning information. The fraud warning model is highly scalable: data from other applications (apps) can be added to further improve the efficiency of anti-fraud efforts, which is of great public welfare value.
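An N-way K-shot task, used above to frame fraud-type classification, is built by sampling N classes, K labelled examples per class (the support set), and held-out query examples from the same classes. The sketch below shows one way to sample such an episode; the fraud-type labels and user IDs are hypothetical, and the abstract does not describe the paper's actual episode construction.

```python
import random
from typing import Dict, List, Tuple


def sample_episode(
    data: Dict[str, List[str]], n_way: int, k_shot: int, n_query: int,
    rng: random.Random,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Sample one N-way K-shot episode from {class_label: [example_ids]}.

    Returns (support, query) lists of (example_id, class_label) pairs.
    """
    # Only classes with enough examples for both support and query are eligible.
    eligible = [c for c, items in data.items() if len(items) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(data[c], k_shot + n_query)  # without replacement
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query


# Hypothetical fraud types and user IDs.
users = {
    "fake_investment": [f"u{i}" for i in range(0, 10)],
    "impersonation": [f"u{i}" for i in range(10, 20)],
    "online_shopping": [f"u{i}" for i in range(20, 30)],
    "loan_scam": [f"u{i}" for i in range(30, 33)],  # too few for this episode
}
support, query = sample_episode(users, n_way=3, k_shot=2, n_query=3, rng=random.Random(0))
print(len(support), len(query))  # 6 9
```

Sampling without replacement within each class keeps support and query examples disjoint, so the query set measures generalization to unseen users of a known fraud type.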
Funding: This research is funded by Fayoum University, Egypt.
Abstract: As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms, efficiently combining a data-driven approach with information and communication technology (ICT) and cloud computing. Because of the complicated architecture of cloud computing, the distinctive operation of advanced metering infrastructure (AMI), and the use of sensitive data, securing the SG has become challenging. Faults in the SG fall into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). TLs include hardware failure, communication issues, ohmic losses, and energy dissipation during transmission and propagation. NTLs are human-induced for malicious purposes, such as attacks on sensitive data, electricity theft, and tampering with AMI by fraudulent customers to reduce their bills. This research proposes a data-driven methodology, based on principles of computational intelligence and big data analysis, to identify fraudulent customers from their load profiles. In the proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model is used to extract the relevant feature subset from a large, unsupervised public smart grid project dataset from London, UK, for theft detection. A subset of 26 of the 71 features is obtained with a classification accuracy of 96.6%, compared with studies conducted on small and limited datasets.
Funding: Supported by the Scientific Research Foundation of Liaoning Provincial Department of Education (No. LJKZ0139).
Abstract: This work studies a probabilistic multi-dimensional selective ensemble learning method and its application to predicting users' online purchase behavior. First, classifiers are integrated along the dimension of predicted probability, and a pruning algorithm based on greedy forward search is derived by combining two indicators: accuracy and complementarity. The pruning algorithm is then incorporated into the Stacking ensemble method to build a prediction model of users' online shopping behavior based on the probabilistic multi-dimensional selective ensemble. Finally, the results of the proposed method are compared with those of the individual learners in the ensemble and of the Stacking ensemble method without pruning. The experimental results show that the proposed method reduces the ensemble size, improves prediction accuracy, and can effectively predict users' online purchase behavior.
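Greedy forward search, the pruning strategy named above, starts from an empty ensemble and repeatedly adds the base classifier that most improves a selection criterion, stopping when no remaining candidate helps. The sketch below makes a simplifying assumption: its criterion is the validation accuracy of the averaged predicted probabilities, whereas the paper combines accuracy with a complementarity measure that the abstract does not define. All classifier names and probabilities are hypothetical.

```python
from typing import Dict, List


def ensemble_accuracy(members: List[str], probs: Dict[str, List[float]],
                      labels: List[int]) -> float:
    """Accuracy of averaging the members' positive-class probabilities."""
    correct = 0
    for i, y in enumerate(labels):
        avg = sum(probs[m][i] for m in members) / len(members)
        correct += int((avg >= 0.5) == bool(y))
    return correct / len(labels)


def greedy_forward_prune(probs: Dict[str, List[float]], labels: List[int]) -> List[str]:
    """Add the classifier that most raises ensemble accuracy until none helps."""
    selected: List[str] = []
    best = 0.0
    remaining = set(probs)
    while remaining:
        # sorted() makes tie-breaking deterministic (first name alphabetically).
        candidate = max(sorted(remaining),
                        key=lambda m: ensemble_accuracy(selected + [m], probs, labels))
        score = ensemble_accuracy(selected + [candidate], probs, labels)
        if score <= best:
            break  # no remaining classifier improves the ensemble
        selected.append(candidate)
        best = score
        remaining.remove(candidate)
    return selected


# Hypothetical validation labels (1 = purchase) and per-classifier probabilities.
labels = [1, 0, 1, 0, 1]
probs = {
    "lr": [0.9, 0.4, 0.4, 0.3, 0.8],
    "svm": [0.6, 0.55, 0.7, 0.2, 0.7],
    "dt": [0.2, 0.9, 0.3, 0.8, 0.1],
}
print(greedy_forward_prune(probs, labels))  # ['lr', 'svm']
```

Here "lr" and "svm" each reach 80% alone but make different mistakes, so together they reach 100%; this is the kind of complementarity a pruning criterion tries to reward.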
Funding: This work was supported by ZATCA. The author is grateful for the help provided by the risk and intelligence department, as well as for the continued support of the governor in advancing the field of AI and machine learning in government entities.
Abstract: Tax fraud is a substantial issue affecting governments around the world. It is defined as the intentional alteration of information provided on a tax return to reduce one's tax liability, either by under-reporting sales or by over-reporting purchases. According to recent studies, governments lose over $500 billion annually to tax fraud. A loss of this magnitude motivates tax authorities worldwide to implement efficient fraud detection strategies. Most machine learning work on tax fraud centers on supervised models. A significant drawback of this approach is that it requires previously audited tax returns, which constitute a small percentage of the data. Other strategies use unsupervised models that exploit the whole dataset when searching for patterns but ignore whether tax returns are fraudulent. Unsupervised models are therefore of limited use when applied on their own to detect tax fraud. This paper addresses these limitations by proposing a fraud detection framework that combines supervised and unsupervised models to exploit the entire set of tax returns. The framework consists of four modules: a supervised module, which uses a tree-based model to extract knowledge from the data; an unsupervised module, which calculates anomaly scores; a behavioral module, which assigns a compliance score to each taxpayer; and a prediction module, which uses the outputs of the previous modules to produce a probability of fraud for each tax return. We demonstrate the effectiveness of the framework by testing it on real tax returns provided by the Saudi tax authority.
Funding: Supported by the National Natural Science Foundation of China (No. 61772231), the Natural Science Foundation of Shandong Province (No. ZR2022LZH016 & No. ZR2017MF025), the Project of Shandong Provincial Social Science Program (No. 18CHLJ39), the Shandong Provincial Key R&D Program of China (No. 2021CXGC010103), the Shandong Provincial Teaching Research Project of Graduate Education (No. SDYAL2022102 & No. SDYJG21034), and the Teaching Research Project of University of Jinan (No. JZ2212).
Abstract: Massive open online courses (MOOCs) have become a popular form of online learning worldwide in the past few years. However, their extremely high dropout rate poses many challenges to the development of online learning. Most current methods have low accuracy and poor generalization when dealing with high-dimensional dropout features. They focus on analyzing learning scores and assessment results of online courses but neglect students' behavior across successive phases. Moreover, a student's participation status at a given moment is necessarily influenced by their prior learning status. To address these issues, this paper proposes an ensemble learning model for early dropout prediction (ELM-EDP) that integrates attention-based document representation as a vector (A-Doc2vec), feature learning of course difficulty, and weighted soft voting ensemble with heterogeneous classifiers (WSV-HC). First, A-Doc2vec learns sequence features of student behaviors in watching lecture videos and completing course assignments, and also captures the relationship between courses and videos. Then, a feature learning method is proposed to reduce the interference that differences in course difficulty cause in dropout prediction. Finally, WSV-HC is proposed to leverage the benefits of both the boosting and bagging integration strategies. Experiments on the MOOCCube2020 dataset show that ELM-EDP achieves better Accuracy, Precision, Recall, and F1 than the compared methods.
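Weighted soft voting, the final stage of WSV-HC described above, averages the class-probability vectors of heterogeneous classifiers using per-classifier weights and predicts the class with the highest weighted probability. The sketch below shows that step; the weights and probabilities are hypothetical, and the abstract does not say how the paper derives its weights.

```python
from typing import List, Sequence


def weighted_soft_vote(prob_vectors: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> int:
    """Return the index of the class with the highest weighted average probability.

    prob_vectors[i][c] is classifier i's probability for class c.
    """
    if len(prob_vectors) != len(weights) or not weights:
        raise ValueError("need one weight per classifier")
    n_classes = len(prob_vectors[0])
    total_w = sum(weights)
    combined: List[float] = [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors)) / total_w
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=combined.__getitem__)


# Classes: 0 = completes course, 1 = drops out (hypothetical).
boosting_model = [0.40, 0.60]
bagging_model = [0.70, 0.30]
linear_model = [0.45, 0.55]
models = [boosting_model, bagging_model, linear_model]
print(weighted_soft_vote(models, [0.5, 0.2, 0.3]))  # 1
print(weighted_soft_vote(models, [0.2, 0.6, 0.2]))  # 0
```

The same three predictions yield different decisions under different weights, which is why how the weights are learned matters as much as which base classifiers are combined.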
Abstract: In recent years, the number of heart disease cases has increased greatly, and heart disease is associated with a high mortality rate. Meanwhile, technological advances have produced equipment that helps patients measure their health conditions at home and assess their risk of heart disease. This research aims to determine how accurately self-measurable physical health indicators, compared with the full set of indicators measured by healthcare providers, predict heart disease using five machine learning models: Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Decision Tree, and Random Forest. The database used contains 13 types of health test results and the heart disease risk status of 303 patients. The full matrices consist of all 13 test results, while the home matrices include the 6 results that can be measured at home. After the five models were built for both the home matrices and the full matrices, the accuracy score and false negative rate were computed for each model. The full matrices achieved higher accuracy than the home matrices in all five models, and their false negative rates were lower than or equal to those of the home matrices. It was concluded that home-measured physical health indicators are less accurate than the full set of physical indicators in predicting patients' risk of heart disease. Therefore, unless home-testable indicators are further developed, the full set of physical health indicators is preferred for measuring the risk of heart disease.
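The two metrics used above to compare the full and home matrices follow directly from binary predictions. Accuracy is (TP + TN) / N, and the false negative rate is FN / (FN + TP), the share of actual heart disease cases that the model misses. The sketch below uses hypothetical labels (1 = heart disease), not the study's data.

```python
from typing import Sequence, Tuple


def accuracy_and_fnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and false negative rate for binary labels (1 = disease present)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return correct / len(y_true), fnr


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_full = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical: model on all 13 indicators
pred_home = [1, 1, 0, 0, 0, 0, 1, 1]  # hypothetical: model on the 6 home indicators
print(accuracy_and_fnr(y_true, pred_full))  # (0.75, 0.25)
print(accuracy_and_fnr(y_true, pred_home))  # (0.5, 0.5)
```

For screening, the false negative rate is the more safety-critical number: a missed case means a patient at risk is told they are fine.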
Abstract: Cryptocurrency price prediction has attracted significant attention owing to the growing importance of digital assets in the financial landscape. This paper presents a comprehensive study on predicting future cryptocurrency prices using machine learning algorithms. Open-source historical data from various cryptocurrency exchanges are used, and interpolation techniques are employed to handle missing data, ensuring the completeness and reliability of the dataset. Four technical indicators are selected as features for prediction. The study applies five machine learning algorithms to capture the complex patterns of the highly volatile cryptocurrency market. The findings reveal the strengths and limitations of the different approaches and highlight the importance of feature engineering and algorithm selection for accurate cryptocurrency price prediction. The research offers insights into the dynamic, rapidly evolving field of cryptocurrency price prediction, helping investors and traders make informed decisions amid the challenges of the cryptocurrency market.
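Two preprocessing steps mentioned above, interpolating missing prices and deriving technical indicators, can be sketched in plain Python. The abstract does not name its interpolation method or its four indicators, so this sketch assumes linear interpolation and a simple moving average (SMA) purely for illustration; the price series is hypothetical.

```python
from typing import List, Optional


def interpolate_linear(prices: List[Optional[float]]) -> List[float]:
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading/trailing gaps are filled with the nearest known value (an assumption).
    """
    known = [i for i, p in enumerate(prices) if p is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, p in enumerate(prices):
        if p is not None:
            filled.append(p)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(prices[right])
        elif right is None:
            filled.append(prices[left])
        else:
            frac = (i - left) / (right - left)
            filled.append(prices[left] + frac * (prices[right] - prices[left]))
    return filled


def simple_moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """SMA over a trailing window; None until enough observations exist."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]


# Hypothetical daily closing prices with gaps.
closes = [100.0, None, 110.0, None, None, None, 130.0]
filled = interpolate_linear(closes)
print(filled)                            # [100.0, 105.0, 110.0, 115.0, 120.0, 125.0, 130.0]
print(simple_moving_average(filled, 3))  # [None, None, 105.0, 110.0, 115.0, 120.0, 125.0]
```

Interpolating before computing indicators matters: a single missing close would otherwise leave a gap in every window that contains it.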