Abstract: Early stroke prediction is vital to prevent damage. A stroke happens when blood flow to the brain is disrupted by a clot or bleeding, resulting in brain death or injury. However, early diagnosis and treatment reduce long-term needs and lower health costs. This research aims to develop a machine-learning method for forecasting early warning signs of stroke. The methodology employed feature selection techniques and multiple algorithms. Using the XGBoost algorithm, the proposed model achieved an accuracy of 96.45%. This research shows that machine learning can effectively predict early warning signs of stroke, which can help reduce long-term treatment and rehabilitation needs and lower health costs.
Abstract: Background: A novel approach to modelling individual tree growth dynamics is proposed. The approach combines multiple imputation and copula sampling to produce a stochastic individual tree growth and yield projection system. Methods: The Nova Scotia, Canada permanent sample plot network is used as a case study to develop and test the modelling approach. Predictions from this model are compared to predictions from the Acadian variant of the Forest Vegetation Simulator, a widely used statistical individual tree growth and yield model. Results: Diameter and height growth rates were predicted with error rates consistent with those produced using statistical models. Mortality and ingrowth error rates were higher than those observed for diameter and height, but were also within the bounds produced by traditional approaches for predicting these rates. Ingrowth species composition was very poorly predicted. The model was capable of reproducing a wide range of stand dynamic trajectories and in some cases reproduced trajectories that the statistical model was incapable of reproducing. Conclusions: The model has potential to be used as a benchmarking tool for evaluating statistical and process models and may provide a mechanism to separate signal from noise and improve our ability to analyze and learn from large regional datasets that often have underlying flaws in sample design.
Abstract: Finding nearest neighbors efficiently is crucial to the design of any nearest neighbor classifier. This paper shows how Layered Range Trees (LRT) can be used for efficient nearest neighbor classification. The presented algorithm is robust and finds the nearest neighbor in logarithmic order. The proposed algorithm reports the nearest neighbor in , where k is a very small constant compared with the dataset size n and d is the number of dimensions. Experimental results demonstrate the efficiency of the proposed algorithm.
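For orientation, the linear-scan baseline that an index structure such as an LRT is meant to speed up can be sketched as below. This is a minimal brute-force sketch, not the paper's algorithm: the function name `nearest_neighbor_label` and the sample points are hypothetical, and Euclidean distance is an assumption.

```python
import math


def nearest_neighbor_label(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (Euclidean)."""
    # Linear scan over all n training points: O(n * d) work per query.
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[best_index]


# Hypothetical two-class toy data.
points = [(1.0, 1.0), (2.0, 1.5), (8.0, 9.0), (9.0, 8.5)]
labels = ["a", "a", "b", "b"]
predicted = nearest_neighbor_label(points, labels, (7.5, 8.0))
```

Every query here touches all n points, which is the cost a tree-based search tries to avoid.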
Abstract: Individual tree detection (ITD) and the area-based approach (ABA) are combined to generate tree lists using airborne LiDAR data. ITD based on the Canopy Height Model (CHM) was applied for overstory trees, while ABA based on nearest neighbor (NN) imputation was applied for understory trees. Our approach is intended to compensate for the weakness of LiDAR data and ITD in estimating understory trees while keeping the strength of ITD in estimating overstory trees at the tree level. We investigated the effects of three parameters on the performance of our proposed approach: smoothing of the CHM, resolution of the CHM, and height cutoff (a specific height that classifies trees into overstory and understory). No single combination of those parameters produced the best performance for estimating stems per hectare, mean tree height, basal area, diameter distribution and height distribution. The trees in the lowest LiDAR height class yielded the largest relative bias and relative root mean squared error. Although ITD and ABA showed limited explanatory power for estimating stems per hectare and basal area, improvements could come from methods such as using LiDAR data with higher density, applying better algorithms for ITD and decreasing distortion of the structure of LiDAR data. Automating the procedure of finding optimal combinations of those parameters is essential to expedite forest management decisions across forest landscapes using remote sensing data.
Abstract: The research was carried out on the Karelian Isthmus in the Leningrad Region using Sentinel-2B images and data from a network of ground sample plots. The ground sample plots are distributed across the study area in a mostly regular pattern and were laid out and surveyed according to the ICP-Forests methodology with some additions. The total area of the sample plots is a small part of the entire study area. One objective of the study was to determine whether the k-NN (k-nearest neighbor) method can be used to assess the state of forests throughout the whole study area by joint statistical processing of data from ground sample plots and Sentinel-2B imagery. The ground sample plot data were divided into two equal parts, one for applying the k-NN method and the other for checking its results. The systematic error in determining the mean damage class of the tree stands on sample plots by the k-NN method turned out to be zero, and the random error was equal to one point. These results make it possible to determine the state of the forest in the entire study area. The second objective was to examine whether the short-wave vegetation index (SWVI) can be used to assess the state of forests. A close, statistically reliable relationship was established between the average score of the state of plantations and the SWVI value, which makes it possible to use this relationship to determine the state of forests throughout the study area. The joint use and statistical processing of remotely sensed data and ground-based test areas by the two studied methods make it possible to assess the state of forests throughout the large study area within the image. The results obtained can be used to monitor the state of forests in large areas and to design appropriate forest protection measures.
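The split-half check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the feature values and scores are made up, `knn_predict` and `validation_errors` are hypothetical names, and the abstract does not say how the two errors were computed, so the systematic error is taken here as the mean residual and the random error as the standard deviation of the residuals.

```python
import math
import statistics


def knn_predict(ref_features, ref_scores, query, k):
    """Average the scores of the k reference plots nearest to `query`."""
    order = sorted(
        range(len(ref_features)),
        key=lambda i: math.dist(ref_features[i], query),
    )
    nearest = order[:k]
    return sum(ref_scores[i] for i in nearest) / k


def validation_errors(ref_features, ref_scores, val_features, val_scores, k):
    """Return (mean residual, residual standard deviation) on held-out plots."""
    residuals = [
        knn_predict(ref_features, ref_scores, features, k) - observed
        for features, observed in zip(val_features, val_scores)
    ]
    return statistics.mean(residuals), statistics.pstdev(residuals)


# Made-up spectral features and damage scores for illustration only.
ref_features = [(0.10, 0.30), (0.12, 0.28), (0.40, 0.10), (0.42, 0.12)]
ref_scores = [1.0, 2.0, 4.0, 3.0]
val_features = [(0.11, 0.29), (0.41, 0.11)]
val_scores = [1.0, 4.0]

bias, spread = validation_errors(
    ref_features, ref_scores, val_features, val_scores, k=2
)
```

On this toy data the residuals are +0.5 and -0.5, so the mean error is zero while the spread is not, which shows how a zero systematic error can coexist with a non-zero random error.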
Abstract: This paper describes a nearest neighbor (NN) search algorithm on the GBD (generalized BD) tree. The GBD tree is a spatial data structure suitable for two- or three-dimensional data and has good performance characteristics in a dynamic data environment. In GIS and CAD systems, the R-tree and its successors have been used, and an NN search algorithm has also been proposed in an attempt to obtain good performance from the R-tree. On the other hand, the GBD tree is superior to the R-tree with respect to exact match retrieval, because the GBD tree has auxiliary data that uniquely determines the position of the object in the structure. The proposed NN search algorithm depends on this property of the GBD tree. The NN search algorithm on the GBD tree was studied and its performance was evaluated through experiments.
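Abstract: To address the mean shift clustering algorithm's dependence on a subjectively chosen bandwidth parameter, and its loss of accuracy on datasets with large variations in density, an adaptive mean shift clustering algorithm based on the cover tree, MSCT (MeanShift based on Cover-Tree), is proposed. A cover tree is built over the dataset, and during the computation of the shift vector the cover tree is used to obtain a new shift vector result, KnnShift, so that the bandwidth parameter is generated adaptively on datasets with different density distributions. The clustering result is obtained once all data points have completed the shift process. Experimental results show that the clustering performance of MSCT is overall better than that of algorithms such as MS and DBSCAN.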
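One way to picture a neighbor-driven shift step is the sketch below. It is an assumption-laden illustration, not MSCT itself: the abstract does not define KnnShift, so the shift is assumed here to move a point to the mean of its k nearest data points (which makes the effective bandwidth the distance to the k-th neighbor), and a brute-force neighbor search stands in for the cover tree. All function names, the parameter `merge_radius`, and the data are hypothetical.

```python
import math


def knn_mean(data, point, k):
    """Mean of the k data points nearest to `point` (brute-force search)."""
    nearest = sorted(data, key=lambda p: math.dist(p, point))[:k]
    return tuple(sum(p[d] for p in nearest) / k for d in range(len(point)))


def knn_mean_shift(data, k, tol=1e-6, max_iter=100):
    """Move each point to its local k-NN mean until it stops moving."""
    modes = []
    for point in data:
        current = point
        for _ in range(max_iter):
            shifted = knn_mean(data, current, k)
            moved = math.dist(shifted, current)
            current = shifted
            if moved < tol:
                break
        modes.append(current)
    return modes


def group_modes(modes, merge_radius):
    """Give each mode a cluster id, merging modes closer than merge_radius."""
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if math.dist(mode, center) < merge_radius:
                labels.append(idx)
                break
        else:
            # No existing center is close enough: start a new cluster.
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels


# Two well-separated toy groups.
data = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.1, 5.1),
]
labels = group_modes(knn_mean_shift(data, k=3), merge_radius=1.0)
```

Because the neighborhood is defined by a neighbor count rather than a fixed radius, no global bandwidth has to be chosen by hand; a tree index would replace the `sorted` call to avoid scanning every point.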
Funding: This research work is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R411), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention mechanisms. Supervised machine learning classifiers have emerged as promising tools for malware detection. However, there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection. Addressing this gap can provide valuable insights for enhancing cybersecurity strategies. While numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers for Windows malware detection. Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures. This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems. The objectives are: investigating the performance of various classifiers, such as Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows malware; evaluating the accuracy, efficiency, and suitability of each classifier for real-world malware detection scenarios; identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers; and offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence. The study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and evaluation. Exploratory data analysis involves understanding the dataset's characteristics and identifying preprocessing requirements. Data preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for training. Model training utilizes various supervised classifiers, and their performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. The study's outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection. Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware. Additionally, insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses. Overall, this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.
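The four evaluation metrics named above follow directly from confusion-matrix counts, as the sketch below illustrates. It is a minimal example under assumptions: a binary malware/benign labeling with 1 as the positive (malware) class, a hypothetical function name `binary_metrics`, and made-up labels rather than results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters for malware data, where the classes are often imbalanced and accuracy alone can hide missed detections.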