Funding: This research is supported by the National Natural Science Foundation of China (Grant No. 60573174) and the Natural Science Foundation of Anhui Province of China (Grant No. 050420207).
Abstract: Mining streaming data is an active research topic in data mining. When classifying data streams, traditional decision-tree algorithms such as ID3 and C4.5 are relatively inefficient in both time and space because of the characteristics of streaming data. Random decision trees offer some advantages in time and space. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of examples required before a split, a heuristic method for computing information gain to obtain split thresholds for numerical attributes, and a Naive Bayes classifier to estimate the class labels at tree leaves. Our extensive experimental study shows that SRMTDS improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy, and robustness to noise.
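The role of the Hoeffding bound can be illustrated with a short sketch. It assumes the standard formulation used by VFDT-style learners: after n independent observations of a quantity with range R, the observed mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability 1 - delta, and solving for n gives the minimum number of examples to collect before splitting. The function names are hypothetical, and SRMTDS's exact use of the bound may differ.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a variable
    with range `value_range` is within the returned epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_split_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n such that hoeffding_epsilon(value_range, delta, n) <= epsilon,
    i.e. the minimum number of examples a leaf should see before splitting
    (hypothetical helper illustrating the bound, not SRMTDS's code)."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))
```

For a two-class problem, information gain has range R = log2(2) = 1, so with delta = 1e-7 and epsilon = 0.05 this sketch requires 3224 examples before a split.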
Abstract: The growing volume and complexity of network traffic in the modern digital era have made it harder to identify abnormal behaviours that may indicate security breaches or operational interruptions. Conventional detection approaches struggle to keep up with the constantly changing strategies of cyber-attacks, leaving network infrastructures more vulnerable and exposed to significant harm. To address this urgent issue, this project focused on developing an effective anomaly detection system based on machine learning. The proposed model uses contemporary machine learning algorithms and frameworks to detect deviations from typical network behaviour autonomously, promptly flagging anomalous activity that may indicate security breaches or performance problems. The solution follows a multi-stage approach covering data collection, preprocessing, feature engineering, model training, and evaluation. The model is trained on a wide range of datasets containing both normal and abnormal network traffic patterns, so that it can adapt to many scenarios. The main priority is a functional and efficient system, with particular emphasis on reducing false positives to avoid unnecessary alerts. Efforts are also directed at improving detection accuracy so that the model consistently distinguishes potentially harmful activity from benign activity. The project aims to substantially strengthen network security by addressing emerging cyber threats and improving the resilience and reliability of networks.
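Because the abstract stresses reducing false positives while keeping accuracy high, it may help to make those quantities concrete. The sketch below is not taken from the project; it assumes binary labels where 1 marks anomalous traffic and 0 benign traffic, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary detector (1 = anomalous, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign traffic flagged (false alerts)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign traffic passed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of benign traffic that raises an unwanted alert.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Lowering the false-positive rate usually trades off against recall, so in a sketch like this both would be tracked together rather than optimising accuracy alone.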
Abstract: This paper focuses on improving decision tree induction algorithms when a particular kind of tie arises during rule generation for specific training datasets. The tie occurs when the records in a leaf node contain equal proportions of the target class outcomes, so majority voting cannot be applied. To resolve this case, we propose basing the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
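A minimal sketch of the tie case with a naive Bayes fallback is shown below. It assumes categorical features, estimates the naive Bayes scores from the leaf's own records with Laplace smoothing, and omits the k-NN and ARM alternatives; the paper's exact estimate (for example, which records it is computed from) may differ, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# A training record: (feature name -> categorical value, class label).
Record = Tuple[Dict[str, str], str]


def naive_bayes_log_score(records: List[Record], query: Dict[str, str], label: str) -> float:
    """Laplace-smoothed naive Bayes log-score of `label` for `query`,
    estimated from `records`."""
    in_class = [feats for feats, y in records if y == label]
    score = math.log(len(in_class) / len(records))  # log prior
    for feature, value in query.items():
        n_values = len({feats[feature] for feats, _ in records})
        matches = sum(1 for feats in in_class if feats[feature] == value)
        score += math.log((matches + 1) / (len(in_class) + n_values))
    return score


def leaf_predict(records: List[Record], query: Dict[str, str]) -> str:
    """Majority vote over a leaf's records; if the top classes tie,
    break the tie with the naive Bayes score of the query."""
    counts = Counter(y for _, y in records)
    best = max(counts.values())
    tied = sorted(y for y, c in counts.items() if c == best)
    if len(tied) == 1:
        return tied[0]  # no tie: ordinary majority voting applies
    return max(tied, key=lambda y: naive_bayes_log_score(records, query, y))
```

With two "yes" and two "no" records at a leaf, majority voting is tied, and the naive Bayes scores decide the label from the query's feature values instead.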
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP 2/42/43). Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R135), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The agricultural sector's day-to-day operations, such as irrigation and sowing, are affected by the weather, and weather plays a key role in all regular human activities. Weather forecasts must therefore be accurate and precise so that we can plan our activities and protect ourselves and our property from disasters. This work uses rainfall, wind speed, humidity, wind direction, cloud cover, temperature, and other forecasting variables for weather prediction. Many studies have addressed weather forecasting, but existing approaches tend to be less effective, inaccurate, and time-consuming. To overcome these issues, this paper proposes an enhanced and reliable weather forecasting technique that also extends forecasting to remote areas. Weather data analysis and machine learning techniques, namely Gradient Boosting Decision Tree, Random Forest, Bernoulli Naive Bayes, and the k-nearest neighbours (KNN) algorithm, are used to anticipate weather conditions. A comparative analysis of the results aids in determining the number of ensemble methods that can be used to improve prediction accuracy in weather forecasting. This study aims to demonstrate that the technique can produce weather forecasts as quickly as possible. Experimental evaluation shows that our ensemble technique achieves 95% prediction accuracy, and prediction takes less than 10 s for 1,000 nodes and less than 40 s for 5,000 nodes.
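The abstract does not say how the four models are combined. As an assumption, the sketch below uses simple hard (majority) voting over per-model label predictions, breaking ties in favour of the earliest listed model, plus a plain accuracy helper. The model keys are hypothetical placeholders for trained Gradient Boosting, Random Forest, Bernoulli Naive Bayes, and KNN models, not outputs of the paper's system.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(per_model: Dict[str, Sequence[str]]) -> List[str]:
    """Combine label predictions from several models by hard (majority) voting.
    Ties go to the label voted by the earliest model in `per_model`
    (dict insertion order)."""
    names = list(per_model)
    n_samples = len(per_model[names[0]])
    combined = []
    for i in range(n_samples):
        votes = [per_model[name][i] for name in names]
        counts = Counter(votes)
        best = max(counts.values())
        # First vote (in model order) that belongs to a top-scoring label.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, if the `gbdt`, `rf`, `nb`, and `knn` entries predict `rain`, `rain`, `sun`, and `rain` for one sample, the vote returns `rain`. Soft voting over predicted probabilities is a common alternative when the models expose them.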