Abstract: The Industrial Internet of Things (IIoT) represents the expansion of the Internet of Things (IoT) into industrial sectors. It is designed to integrate embedded technologies into manufacturing to enhance operations. However, IIoT involves security vulnerabilities that are more damaging than those of IoT. Accordingly, Intrusion Detection Systems (IDSs) have been developed to forestall harmful intrusions. IDSs survey the environment to identify intrusions in real time. This study designs an intrusion detection model that exploits feature engineering and machine learning for IIoT security. We combine Isolation Forest (IF) with Pearson's Correlation Coefficient (PCC) to reduce computational cost and prediction time. IF is used to detect and remove outliers from the datasets, and PCC is applied to choose the most appropriate features. PCC and IF are applied in either order (PCCIF and IFPCC). A Random Forest (RF) classifier is implemented to enhance IDS performance. For evaluation, we use the Bot-IoT and NF-UNSW-NB15-v2 datasets. RF-PCCIF and RF-IFPCC achieve 99.98% and 99.99% accuracy (ACC) with prediction times of 6.18 s and 6.25 s on Bot-IoT, respectively. The two models also score 99.30% and 99.18% ACC with prediction times of 6.71 s and 6.87 s on NF-UNSW-NB15-v2, respectively. The results show that the designed model has several advantages over related models and achieves higher performance.
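The abstract does not say how PCC is used to choose features. The following is a minimal sketch under one common assumption: keep the features whose absolute Pearson correlation with the class label reaches a cutoff. The function names, the `min_abs_r` cutoff, and the toy columns are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_abs_r=0.3):
    """Keep feature names whose |r| with the target is at least min_abs_r.

    columns: dict mapping feature name -> list of values (hypothetical layout).
    """
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= min_abs_r]


# Toy data: label 1 marks attack traffic (illustrative values only).
target = [0, 0, 0, 1, 1, 1]
columns = {
    "pkts": [1, 2, 1, 8, 9, 7],   # rises with the label
    "noise": [3, 1, 2, 3, 1, 2],  # unrelated to the label
    "const": [4, 4, 4, 4, 4, 4],  # zero variance
}
selected = select_features(columns, target)
```

In a pipeline like the one described, a filter of this kind would run before or after outlier removal, which is what distinguishes the PCCIF and IFPCC orderings.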
Abstract: Froth flotation is an important mineral concentration technique. Faulty conditions in flotation processes may waste large amounts of mineral resources and reagents and consequently reduce the economic benefits of flotation plants. In this paper, we propose a computer vision-aided fault detection and diagnosis approach for froth flotation. Specifically, a joint Gabor texture feature based on the Copula model is designed to describe froth images; a rejection sampling technique is developed to generate training sets from the quality distribution of real flotation products, and an isolation forest-based fault detector is then learned; and a fault diagnosis model based on spline regression is developed for root cause identification. Simulation experiments on historical industrial data show that the proposed strategy performs better than the alternative methods. The entire framework was then tested at a lead-zinc flotation plant in China, and the experimental results demonstrate the effectiveness of the proposed method.
Funding: Supported by State Grid Liaoning Electric Power Supply Co., Ltd., through financial support for the project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" (No. SGLNXT00YJJS1800110).
Abstract: With the arrival of the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the algorithm keeps generating isolation trees, the differences between the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. The method optimizes the forest by selecting better isolation trees according to their detection accuracy and the differences between trees, thereby removing duplicate, similar, and poorly performing isolation trees and improving the accuracy and stability of outlier detection. The experimental environment is built on Ubuntu and the Spark platform, and the outlier datasets provided by ODDS are used for testing. The performance of the proposed method is evaluated in terms of accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
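For readers unfamiliar with the baseline that GA-iForest prunes, the following is a minimal pure-Python sketch of the standard isolation forest score, s = 2^(-E[h(x)] / c(n)), where h(x) is the path length of x in a tree and c(n) is the usual normalizing constant. It is not the GA-iForest tree selection itself. The function names, the parameter defaults, and the toy data are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup on n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split recursively on a random feature at a random cut value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "dim" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["cut"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Scores in (0, 1]; values closer to 1 mean easier to isolate."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
            for p in data]


# Toy data: 200 Gaussian inliers plus one distant point appended last.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data.append((10.0, 10.0))
scores = anomaly_scores(data)
```

A selection step such as the one the abstract describes would operate on the `trees` list, keeping a subset chosen by per-tree accuracy and pairwise difference; how those two criteria are measured is not specified in the abstract.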
Funding: Supported by the National Natural Science Foundation of China (61873006) and the Beijing Natural Science Foundation (4204087, 4212040).
Abstract: Acid production with flue gas is a complex nonlinear process with multiple variables and strong coupling. The operation data is an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment is complex and involves a large amount of equipment. The data obtained by the detection equipment is heavily contaminated and prone to abnormalities such as missing values and outliers. To solve the problem of abnormal data in acid production with flue gas, a data cleaning method based on an improved random forest is proposed. First, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Second, an improved random forest regression model is established, in which a genetic algorithm optimizes the hyperparameters of the random forest regression model. The optimal parameter combination is then found in the search space and the trend of the data is predicted. Finally, the improved random forest is used to fill in the data that is missing after the abnormal data has been eliminated, which completes the data cleaning. Results show that the proposed method can accurately eliminate and compensate for abnormal data in acid production with flue gas, and that it improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for subsequent temperature control. The conversion rate of SO₂ can be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
Abstract: It is not easy to construct a model that describes the geochemical background in geochemical anomaly detection because of the complexity of the geological setting. Isolation forest and its improved variants can detect geochemical anomalies without modeling the complex geochemical background. These methods can effectively extract multivariate anomalies from large volumes of high-dimensional geochemical data with an unknown population distribution. To test the performance of these algorithms in detecting mineralization-related geochemical anomalies, isolation forest, extended isolation forest, and generalized isolation forest models were established to detect multivariate anomalies in the stream sediment survey data collected in the Wulaga area in Heilongjiang Province. The geochemical anomalies detected by the generalized isolation forest model account for 40% of the study area and contain 100% of the known gold deposits. The anomalies detected by the isolation forest model account for 20% of the study area and contain 71% of the known gold deposits. The anomalies detected by the extended isolation forest algorithm account for 34% of the study area and contain 100% of the known gold deposits. Therefore, the isolation forest, extended isolation forest, and generalized isolation forest models are comparable in geochemical anomaly detection.
Abstract: Cigarette detection data contains a large number of genuine samples and a small number of counterfeit samples. The counterfeit samples are treated as abnormal data, and anomaly detection is performed to distinguish real cigarettes from fake ones. A binary particle swarm optimization (BPSO) algorithm is used to improve the isolation forest construction process by selecting isolation trees with high precision and large differences, which improves the accuracy and efficiency of the algorithm. The distance between the obtained anomaly score and the cluster center of the k-means algorithm is used as the threshold for anomaly judgment. The experimental results show that the BPSO-iForest algorithm is more accurate than the standard iForest algorithm. The experimental results on samples from multiple brands also show that the method can accurately use the detection data for authenticity identification.
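The abstract does not give the exact thresholding rule. One plausible reading, sketched here as an assumption, is to run k-means with two clusters on the one-dimensional anomaly scores and flag a sample as abnormal when its score lies closer to the higher cluster center. The function names and the example scores are hypothetical.

```python
def two_means_1d(scores, max_iters=50):
    """Plain k-means with k=2 on scalar scores; returns (low_center, high_center)."""
    low, high = min(scores), max(scores)
    for _ in range(max_iters):
        low_group = [s for s in scores if abs(s - low) <= abs(s - high)]
        high_group = [s for s in scores if abs(s - low) > abs(s - high)]
        new_low = sum(low_group) / len(low_group)
        # If every score is identical the high group is empty; keep the old center.
        new_high = sum(high_group) / len(high_group) if high_group else high
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return low, high


def flag_anomalies(scores):
    """True where a score is strictly closer to the high-score center."""
    low, high = two_means_1d(scores)
    return [abs(s - high) < abs(s - low) for s in scores]


# Illustrative isolation forest scores: five ordinary samples, two suspicious ones.
example_scores = [0.41, 0.43, 0.40, 0.45, 0.42, 0.78, 0.81]
flags = flag_anomalies(example_scores)
```

Compared with a fixed cutoff such as 0.5, this lets the decision boundary follow the score distribution of each brand's data, which may be why a clustering-based threshold was chosen.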
Funding: Supported by the National Natural Science Foundation of China (72171059, 71771041, 72121001), the Fundamental Research Funds for the Central Universities (FRFCU5710000220), and the Natural Science Foundation of Heilongjiang Province, China (No. YQ2020G003).
Abstract: With the rapid development of decentralized applications, account-based blockchain platforms have become a hotbed of financial scams and hacks because of their anonymity and high financial value. Financial security has become a top priority for the sustainable development of blockchain-based platforms, because the increasing number of cyber attacks has caused huge losses of crypto assets in recent years. It is therefore imperative to study the real-time detection of cyber attacks to facilitate effective supervision and regulation. To this end, this paper proposes weighted and extended isolation forest algorithms and designs a novel framework for the real-time detection of cyber-attack transactions, based on a thorough study and summary of real-world examples. Furthermore, this study develops a new detection approach for locating the compromised address of a cyber attack, which addresses the scarcity of data on hacked addresses and reduces time consumption. Three experiments are carried out to apply the approach to different types of cyber attacks and to compare it with widely used existing methods. The results demonstrate the high efficiency and generality of the proposed approach. Finally, the lower time consumption and robustness of the method were validated through additional experiments. In conclusion, the proposed blockchain-oriented approach can handle real-time detection of cyber attacks and has significant scope for application.
Abstract: Parking space is usually very limited in major cities, especially Cairo, leading to traffic congestion, air pollution, and driver frustration. Existing car parking systems tend to tackle parking issues in a non-digitized manner. These systems require drivers to search for an empty parking space with no guarantee of finding one, which wastes time and resources and causes unnecessary congestion. To address these issues, this paper proposes a digitized parking system with a proof-of-concept implementation that combines multiple technological concepts into one solution and uses IoT for real-time tracking of parking availability. User authentication and automated payments are handled using a quick response (QR) code on entry and exit. Experiments were conducted on real data collected for six different locations in Cairo via a live popular times library. Several machine learning models were investigated to estimate the occupancy rate of certain places. A clear analysis of the differences in performance is also presented; the final model deployed is XGBoost, which achieved the best results with an R² score of 85.7%.
Funding: Supported by State Grid Hebei Electric Power Co., Ltd. (kj2020-040).
Abstract: A power transformer on-line monitoring system generates abnormal data under the influence of changes in transformer operating state, external environmental interference, communication interruption, and other factors. To address this problem, a method of anomaly recognition and differentiation for monitoring data is proposed. First, the empirical wavelet transform (EWT) and the autoregressive integrated moving average (ARIMA) model are used for time series modelling of the monitoring data to obtain a residual sequence that reflects anomalous monitoring values. The isolation forest algorithm is then used to identify the abnormal information, and the monitoring sequence is segmented according to the recognition results. Second, the segmented sequence is symbolised by an improved multi-dimensional SAX vector representation method, the anomaly pattern is assessed by calculating the similarity score of adjacent symbol vectors, and the correlation between monitoring sequences is used to verify the assessment. Finally, the case study shows that the proposed method can reliably recognise abnormal data and accurately distinguish between invalid and valid anomaly patterns.
Abstract: With the rising demand for data access, network service providers face growing capital and operating costs while also having to enhance network capacity and meet the increased demand for access. To increase the efficacy of the Software Defined Network (SDN) and Network Function Virtualization (NFV) framework, network security configuration errors must be eliminated, because they may create vulnerabilities that affect overall efficiency, reduce network performance, and increase maintenance cost. The existing frameworks are lacking in security, and computer systems face some abnormalities, which prompts the need for different recognition and mitigation methods that proactively keep the system in an operational state. The fundamental concept behind SDN-NFV is the shift from specific resource execution to a programming-based structure. This research concerns the combination of SDN and NFV for rational decision making to control and monitor traffic in a virtualized environment. The combination is often seen as an extra burden in terms of resource usage in a heterogeneous network environment, but it also provides a solution to critical problems, especially those involving massive network traffic. Attacks have been expanding step by step, so they are hard to recognize and defend against with conventional methods. To overcome these issues, an autonomous system is needed to recognize and characterize any abnormal behaviour in the network traffic. Only four types of attack, namely HTTP Flood, UDP Flood, Smurf Flood, and SiDDoS Flood, are considered in the identified dataset. To optimize the stability and security management of the SDN-NFV environment, several machine learning based characterization techniques are applied: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Isolation Forest (IF). Python is used for simulation, together with several utilities such as the mine package and the open-source Python ML libraries Scikit-learn, NumPy, SciPy, and Matplotlib. A few flood attacks and Structured Query Language (SQL) injection anomalies are validated and effectively identified by the proposed procedure. The classification results are promising and show that the overall accuracy of the SVM, LR, KNN, and IF classifiers lies between 87% and 95% when determining whether network traffic in the SDN-NFV environment is normal or anomalous.
Funding: This work is sponsored by Universiti Sains Malaysia Research Grant (RUI: 1001/PELECT/8014049).
Abstract: The COVID-19 virus exhibits pneumonia-like symptoms, including fever, cough, and shortness of breath, and may be fatal. Many COVID-19 contraction experiments require comprehensive clinical procedures at medical facilities. Clinical studies help to make a correct diagnosis of COVID-19, but in most cases the disease has already spread to the organs by that point. Prompt and early diagnosis is indispensable for providing patients with the possibility of early clinical diagnosis and for slowing down the spread of the disease. Clinical investigations in patients with COVID-19 have revealed breathing patterns distinct from those of other diseases such as flu and cold, which are worth investigating. Current supervised Machine Learning (ML) based techniques mostly investigate clinical reports such as X-rays and Computerized Tomography (CT) for disease detection. This strategy relies on a larger clinical dataset and does not focus on early symptom identification. To this end, an innovative hybrid unsupervised ML technique is introduced to estimate the probability of COVID-19 occurrence based on breathing patterns and the commonly reported symptoms of fever and cough. Specifically, various metrics, including body temperature, breathing and cough patterns, and physical activity, were considered in this study. Finally, a lightweight ML algorithm based on the K-Means and Isolation Forest techniques was implemented on a relatively small dataset of 40 individuals. The proposed technique achieves an average outlier detection accuracy of 89%.
Funding: Supported by the National Natural Science Foundation of China (Nos. 41672322, 41872244).
Abstract: Isolation forest and elliptic envelope models were used to detect geochemical anomalies, and the bat algorithm was adopted to optimize the parameters of the two models. The two bat-optimized models and their default-parameter counterparts were used to detect multivariate geochemical anomalies from 1:50,000-scale stream sediment survey data collected from the Helong district, Jilin Province, China. Based on the data modeling results, receiver operating characteristic (ROC) curve analysis was performed to evaluate the performance of the two bat-optimized models and their default-parameter counterparts. The results show that the bat algorithm can improve the performance of the two models in geochemical anomaly detection by optimizing their parameters. The optimal threshold determined by the Youden index was used to identify geochemical anomalies among the geochemical data points. Compared with the anomalies detected by the elliptic envelope models, the anomalies detected by the isolation forest models have a stronger spatial relationship with the mineral occurrences discovered in the study area. According to the results of this study and previous work, it can be inferred that the background population of the study area is complex and therefore not suitable for establishing an elliptic envelope model.
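The Youden index mentioned in this abstract selects the threshold that maximizes J = TPR - FPR along the ROC curve. The sketch below shows one minimal way to compute it; the function name `youden_threshold`, the score orientation (higher means more anomalous), and the toy data are assumptions for illustration, not taken from the paper.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    scores: anomaly scores, assumed oriented so that higher = more anomalous.
    labels: 1 for known positives (e.g. points near mineral occurrences), else 0.
    A point is flagged as anomalous when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative labels")

    best_threshold, best_j = None, -1.0
    # Every distinct score is a candidate cut-off on the ROC curve.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Toy data: the two highest-scoring points are the true positives.
    toy_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
    toy_labels = [0, 0, 0, 0, 1, 1]
    print(youden_threshold(toy_scores, toy_labels))  # (0.8, 1.0)
```

If the scores come from a model whose convention is the opposite (lower means more abnormal), they would need to be negated before calling this function.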
Abstract: The Industrial Internet of Things (IIoT) represents the expansion of the Internet of Things (IoT) into industrial sectors. It is designed to integrate embedded technologies into manufacturing to enhance operations. However, IIoT involves some security vulnerabilities that are more damaging than those of IoT. Accordingly, Intrusion Detection Systems (IDSs) have been developed to forestall inevitable harmful intrusions. IDSs survey the environment to identify intrusions in real time. This study designs an intrusion detection model that exploits feature engineering and machine learning for IIoT security. We combine Isolation Forest (IF) with Pearson's Correlation Coefficient (PCC) to reduce computational cost and prediction time. IF is used to detect and remove outliers from the datasets. We apply PCC to choose the most appropriate features. PCC and IF are applied in either order (PCCIF and IFPCC). The Random Forest (RF) classifier is implemented to enhance IDS performance. For evaluation, we use the Bot-IoT and NF-UNSW-NB15-v2 datasets. RF-PCCIF and RF-IFPCC show noteworthy results, with 99.98% and 99.99% accuracy (ACC) and prediction times of 6.18 s and 6.25 s on Bot-IoT, respectively. The two models also score 99.30% and 99.18% ACC with prediction times of 6.71 s and 6.87 s on NF-UNSW-NB15-v2, respectively. The results indicate that the designed model has several advantages and higher performance than related models.
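The PCC-based feature selection step can be illustrated with a short sketch. The abstract does not state the exact selection rule, so the one used here (keep the k features with the largest absolute correlation against the label) is an assumption, as are the names `pearson` and `select_features` and the toy columns.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, k):
    """Return the names of the k features with the largest |PCC| against target.

    columns: dict mapping feature name -> list of values (hypothetical layout).
    """
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: "a" tracks the label, "b" is constant, "c" is uncorrelated.
    features = {"a": [1, 2, 3, 4], "b": [1, 1, 1, 1], "c": [4, 1, 3, 2]}
    label = [0, 0, 1, 1]
    print(select_features(features, label, k=1))  # ['a']
```

In a pipeline like the one described, the selected columns would then be passed to the outlier-removal and classification stages, in whichever order (PCCIF or IFPCC) is being evaluated.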
Funding: Supported by the Joint Funds of the National Natural Science Foundation of China (No. U1701261), the National Science Fund for Distinguished Young Scholars of China (No. 61725306), the National Natural Science Foundation of China (No. 61472134), the Research Funds for Strategic Emerging Industry Technological and Achievements Transformation of Hunan Province (No. 2018GK4016), and the Fundamental Research Funds for the Central Universities of Central South University (No. 2018ZZTS169).
Abstract: Froth flotation is an important mineral concentration technique. Faulty conditions in flotation processes may cause a huge waste of mineral resources and reagents and, consequently, may reduce the benefits of flotation plants. In this paper, we propose a computer vision-aided fault detection and diagnosis approach for froth flotation. Specifically, a joint Gabor texture feature based on the Copula model is designed to describe froth images; a rejection sampling technique is developed to generate training sets from the quality distribution of real flotation products, and an isolation forest-based fault detector is then learned; and a fault diagnosis model based on spline regression is developed for root cause identification. Simulation experiments conducted on historical industrial data show that the proposed strategy performs better than the alternative methods. The entire framework was subsequently tested at a lead-zinc flotation plant in China. The experimental results demonstrate the effectiveness of the proposed method.