This paper explores the use of soft decision trees [1] in basic reinforcement learning applications to examine the efficacy of passive-expert-like networks for optimal Q-value learning on Artificial Neural Networks (ANN). The soft decision tree networks were built using the PyTorch machine learning framework and OpenAI's Gym environment framework. The study assessed the performance of soft decision tree networks on Cartpole, as provided in the OpenAI Gym software package. The baseline against which the soft decision tree networks were compared was a simple Deep Neural Network using several linear layers, with ReLU and Softmax activation functions for the input and output layers, respectively. All networks were trained using the backpropagation algorithm provided generically by PyTorch's Autograd module.
The medical community is increasingly concerned with lung cancer analysis. Manual segmentation of lung cancers by medical experts is time-consuming and needs to be automated. The objective of this study is to diagnose lung tumors at an early stage, using deep learning techniques, in order to extend patients' lives. A Computer-Aided Diagnostic (CAD) system aids diagnosis and shortens the time needed to detect a tumor. Deep Neural Networks (DNN) have also proved to be an excellent and effective method for classification and segmentation tasks. This research aims to separate lung cancers from Magnetic Resonance Imaging (MRI) images with threshold segmentation. The Honey hook process categorizes lung cancer based on characteristics retrieved using several classifiers. Following this principle, the work presents a solution for image compression using a Deep Wave Auto-Encoder (DWAE). The combination of the two approaches significantly reduces the overall size of the feature set required for any future classification process performed using a DNN. The proposed DWAE-DNN image classifier is applied to a lung imaging dataset with a Radial Basis Function (RBF) classifier. The study reported promising results with an accuracy of 97.34%, whereas the Decision Tree (DT) classifier achieved an accuracy of 94.24%. The proposed approach (DWAE-DNN) classifies the images as either malignant or normal with an accuracy of 98.67%. Beyond accuracy, the work also uses benchmark measures such as specificity, sensitivity, and precision to evaluate the efficiency of the network. The investigation found that the DT classifier provides the maximum performance in the DWAE-DNN, depending on the network's performance on image testing, as shown by the data acquired by the classifiers themselves.
Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in some counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established with the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and effective analysis results.
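The C4.5 algorithm named in the abstract above chooses splits by gain ratio. The following is a minimal sketch of that criterion for categorical attributes only; the records, attribute names, and function names are hypothetical illustrations, not data or code from the paper, and a full C4.5 implementation would also handle continuous attributes, missing values, and pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Gain ratio of splitting `rows` (a list of dicts) on categorical `attr`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    total = len(rows)
    # Weighted entropy of the class label after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information penalizes attributes with many small partitions.
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups.values())
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumed for illustration only).
records = [
    {"wind": "strong", "hail": "yes", "loss": "high"},
    {"wind": "strong", "hail": "no", "loss": "high"},
    {"wind": "weak", "hail": "yes", "loss": "low"},
    {"wind": "weak", "hail": "no", "loss": "low"},
]

# The attribute with the highest gain ratio becomes the split at this node.
best = max(["wind", "hail"], key=lambda a: gain_ratio(records, a, "loss"))
```

In this toy data the `wind` attribute separates the two loss grades perfectly while `hail` carries no information, so `wind` would be selected as the root split.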
Heart failure is now widespread throughout the world. Heart disease affects approximately 48% of the population, and it is both expensive and difficult to cure. This paper presents machine learning models to predict heart failure. The fundamental idea is to compare the accuracy of various Machine Learning (ML) algorithms and boosting algorithms in order to improve the models' prediction accuracy. Supervised algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR) are considered to achieve the best results. Boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost are also used to improve the prediction, along with Artificial Neural Networks (ANN). The research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and Scikit-learn are used for ML; TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy among the classifiers, at 95%. KNN obtained the second-highest accuracy, 93.33%. XGBoost had a satisfactory accuracy of 91.67%; SVM, CatBoost, and ANN each had an accuracy of 90%; and LR had an accuracy of 88.33%.
A recommendation system (RS) based on Graph Neural Networks (GNN) constructs a user-item interaction graph after collecting all the items the user has interacted with. The RS then performs neighborhood aggregation on the graph to generate long-term preference representations for the user. However, user preferences are dynamic. With the passage of time and under the influence of trends, users may develop short-term preferences, which are more likely to lead to user-item interactions. A GNN recommendation model based on long- and short-term preference (LSGNN) is proposed to address these problems. LSGNN consists of four modules: a GNN combined with the attention mechanism to extract long-term preference features; Bidirectional Encoder Representation from Transformers (BERT) and the attention mechanism combined with a Bi-Directional Gated Recurrent Unit (Bi-GRU) to extract short-term preference features; a Convolutional Neural Network (CNN) combined with the attention mechanism to add title and description representations of items; and, finally, an inner product of the long-term and short-term preference features with the item features to produce recommendations. In experiments conducted on five publicly available datasets from Amazon, LSGNN outperforms state-of-the-art personalized recommendation techniques.
In the data retrieval process of a data recommendation system, matching prediction and similarity identification play a major role in the ontology. Several methods exist to improve retrieval accuracy and reduce search time. However, in a data recommendation system, searching for the best match for given query data becomes complex, and the accuracy of the query recommendation process suffers. To improve the performance of data validation, this paper proposes a novel data similarity estimation and clustering method to retrieve the relevant data with the best match in big data processing. An advanced model of the Logarithmic Directionality Texture Pattern (LDTP) method with a Metaheuristic Pattern Searching (MPS) system is used to estimate the similarity between the query data and the entire database. The overall work was implemented for the data recommendation process. The data are indexed and grouped into clusters to form a paged database structure, which reduces computation time during search. In addition, a neural network predicts the relevance of feature attributes in the database, and the matching index is sorted to provide the recommended data for the given query. This is achieved with the Distributional Recurrent Neural Network (DRNN), an enhanced neural network model that finds relevance based on the correlation factor of the feature set. The DRNN classifier is trained by estimating the correlation factor of the dataset's attributes. These are formed into clusters and paged with proper indexing based on the MPS similarity metric. The overall performance of the proposed work is evaluated by varying the size of the training database (60%, 70%, and 80%). The parameters considered for performance analysis are precision, recall, F1-score, the accuracy of data retrieval, the query recommendation output, and a comparison with other state-of-the-art methods.
Abstract: This letter presents an approach for inducing decision trees based on a fuzzy neural network. The approach uses the weights of the fuzzy mappings in the trained fuzzy neural network. It optimizes fuzzy decision trees by pruning branches, and improves both the accuracy and the efficiency of decision tree induction.
Funding: College of Agriculture and Natural Resources, University of Tehran, for financial support of the study (Grant No. 7104017/6/24 and 28)
Abstract: Building any spatial soil database requires a set of environmental data, including a digital elevation model (DEM) and satellite images, together with a geomorphic landscape description. Such a database integrates field observations and laboratory analysis data with the results obtained from qualitative and quantitative models. So far, various techniques have been developed for soil data processing. The performance of Artificial Neural Network (ANN) and Decision Tree (DT) models was compared in mapping some soil attributes in Alborz Province, Iran. Terrain attributes derived from a DEM, along with Landsat 8 ETM+ imagery, a geomorphology map, and routine laboratory analyses of the study area, were used as input data. The relationships between soil properties (sand, silt, clay, electrical conductivity, organic carbon, and carbonates) and the environmental variables were assessed using the Pearson Correlation Coefficient and Principal Component Analysis. Slope, elevation, geoforms, carbonate index, stream network, wetness index, and bands 2, 3, 4, and 5 were the most significantly correlated variables. ANN and DT did not show the same accuracy in predicting all parameters. The DT model performed better in estimating sand (R^2=0.73), silt (R^2=0.70), clay (R^2=0.72), organic carbon (R^2=0.71), and carbonates (R^2=0.70), while the ANN model performed better only in predicting soil electrical conductivity (R^2=0.95). The results showed that the best model to use depends on the relationship between the soil properties considered and the environmental variables. Overall, however, the DT model gave more reasonable results than the ANN model in this study. The results also showed that, before using a particular model to predict the variability of all soil parameters, it is better to evaluate the efficiency of all candidate models and choose the best-fitting model for each property. In other words, most of the developed models are site-specific and may not be applicable to other soil properties or other areas.
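The two statistics this abstract relies on, the Pearson correlation coefficient and R^2, can be sketched in a few lines. The version below is a minimal illustration that assumes R^2 is computed as one minus the ratio of residual to total sum of squares (the abstract does not say which definition the authors used); the function names and numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical values: an environmental covariate and a soil property.
covariate = [1.0, 2.0, 3.0, 4.0]
soil_property = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(covariate, soil_property)

# Hypothetical observed vs. model-predicted values for one soil property.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
score = r_squared(observed, predicted)
```

Computing `score` separately for each soil property and each model is one way to carry out the per-property model comparison the abstract recommends.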
Abstract: Fetal distress is one of the main reasons for cesarean section in obstetrics and gynecology. If the fetus lacks oxygen in the uterus, its health is threatened and fetal death can occur. Cardiotocography (CTG) is the most widely used technique for monitoring fetal health, and fetal heart rate (FHR) is an important index for identifying the occurrence of fetal distress. This study proposes using discriminant analysis (DA), decision tree (DT), and artificial neural network (ANN) to evaluate fetal distress. The results show that the accuracies of DA, DT, and ANN are 82.1%, 86.36%, and 97.78%, respectively.
Abstract: Tuberculosis (TB) remains an important public health problem that threatens the world, including the Philippines. Treatment relapse continues to place a severe burden on patients and TB programs worldwide. A significant reason for relapse is poor compliance with medical treatment. The objectives of this research are to generate a predictive data mining model to classify the treatment relapse of TB patients and to identify the features influencing the category of treatment relapse. The TB patient dataset was tested with the J48 decision tree algorithm in WEKA. The J48 model identified three (3) significant independent variables (DSSM Result, Age, and Sex) as predictors of the treatment relapse category.
Funding: Dr. Steve Jones, Scientific Advisor of the Canon Foundation for Scientific Research (7200 The Quorum, Oxford Business Park, Oxford OX4 2JZ, England). The Canon Foundation for Scientific Research funded the UPC 2013 tuition fees of the corresponding author while she was writing this article.
Abstract: In computational physics, proton transfer phenomena can be viewed as pattern classification problems, in which a set of input features allows the proton motion to be classified into two categories: transfer 'occurred' and transfer 'not occurred'. The goal of this paper is to evaluate the use of artificial neural networks in the classification of proton transfer events, using a feed-forward back-propagation neural network as the classifier that distinguishes between the two cases. We use a newly developed data mining and pattern recognition tool for automating, controlling, and charting the output data of an existing Empirical Valence Bond code. The study analyzes the need for pattern recognition in aqueous proton transfer processes and how the error back-propagation learning approach (multilayer perceptron algorithms) can be satisfactorily employed in this case. We present a tool for pattern recognition and validate the code with a real physical case study. The results of applying the artificial neural network methodology to crowd patterns based on selected physical properties (e.g., temperature, density) show that the network is able to learn proton transfer patterns corresponding to properties of the aqueous environment, which in turn proved to be fully compatible with previous proton transfer studies.
Funding: Supported by the National Natural Science Foundation of China (No. 60135010) and partially supported by the National Grand Fundamental Research 973 Program of China (No. G1998030509).
Abstract: In this letter, Constructive Neural Networks (CNN) are used in large-scale data mining. After introducing the principle and characteristics of CNN and pointing out its deficiencies, fuzzy theory is adopted to improve the covering algorithms. The threshold of the covering algorithms is redefined. An "extended area" for test samples is built. The interference of outliers is eliminated. Furthermore, a "Sphere Neighborhood (SN)" is constructed. The membership functions of the test samples are given, and all of the test samples are classified accordingly. The method is used to mine large wireless monitoring data (about 3×10^7 data points), and knowledge is found effectively.
Funding: The National Natural Science Foundation of China (Grant No. 50128706).
Abstract: An insulation fault diagnosis method for power transformers is proposed, based on data mining with rough sets (RS) and a radial basis function neural network (RBFNN). On the one hand, the rough set is used as a front end to the RBFNN to simplify its input and to mine rules. Mined rules whose "confidence" and "support" are higher than the required level are used to provide fault diagnosis for the power transformer directly. On the other hand, the samples corresponding to mined rules whose "confidence" and "support" are lower than the required level are used as the training set of the RBFNN, and these samples are clustered by the rough set. The center of each cluster is used as the center of a radial basis function, i.e., as a hidden-layer neuron. The RBFNN built on this basis is used to diagnose the cases that cannot be diagnosed by the simplified rules mined with the rough set. The advantages and effectiveness of the method are verified by testing.
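The routing step described in this abstract (rules that meet the confidence and support requirements are applied directly, while the samples behind the remaining rules become training data for the RBFNN) can be sketched as follows. This is only an illustration of the thresholding idea: it does not implement rough-set reduction or the RBFNN itself, and the attribute names, thresholds, sample values, and the choice of `>=` at the boundary are all assumptions not found in the source.

```python
def rule_stats(samples, condition, decision, target="fault"):
    """Return (support, confidence) of the rule `condition -> decision`."""
    matches = [s for s in samples if all(s[k] == v for k, v in condition.items())]
    hits = [s for s in matches if s[target] == decision]
    support = len(hits) / len(samples)
    confidence = len(hits) / len(matches) if matches else 0.0
    return support, confidence


def route_rules(samples, rules, min_support, min_confidence):
    """Split rules into those used directly and those deferred to the network."""
    direct, for_network = [], []
    for condition, decision in rules:
        support, confidence = rule_stats(samples, condition, decision)
        # Boundary handling (>=) is an assumption; the source only says "higher".
        if support >= min_support and confidence >= min_confidence:
            direct.append((condition, decision))
        else:
            for_network.append((condition, decision))
    return direct, for_network


# Hypothetical discretized samples; "a1" and "a2" are placeholder attributes.
samples = [
    {"a1": "high", "a2": "high", "fault": "arc"},
    {"a1": "high", "a2": "high", "fault": "arc"},
    {"a1": "high", "a2": "low", "fault": "overheat"},
    {"a1": "high", "a2": "low", "fault": "arc"},
    {"a1": "low", "a2": "low", "fault": "normal"},
]
rules = [
    ({"a2": "high"}, "arc"),
    ({"a1": "high", "a2": "low"}, "overheat"),
]

direct, for_network = route_rules(samples, rules, min_support=0.3, min_confidence=0.8)
```

With these toy values the first rule (support 0.4, confidence 1.0) would be applied directly, and the samples matching the second rule (support 0.2, confidence 0.5) would be passed on as network training data.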
Abstract: With the progress of computer technology, data mining has become a hot research area in the computer science community. In this paper, we undertake theoretical research on a novel data mining algorithm based on fuzzy clustering theory and deep neural networks. Data mining focuses on seeking visualization methods so that users can understand the knowledge discovery process, which facilitates human-computer interaction during knowledge discovery. Inspired by the layered structure of the brain, neural network researchers have been pursuing multilayer neural network research. The experimental results show that our algorithm is effective and robust.
Funding: Supported by the National Science of China (60075015) and the Key Project of the Scientific and Technological Department in Anhui
Abstract: This paper first puts forward a case-based system framework built on data mining techniques. It then examines the possibility of using neural networks as a retrieval method in such a case-based system. For this system we propose data mining algorithms to discover case knowledge, as well as other algorithms.
Abstract: This paper integrates genetic algorithm and neural network techniques to build new temporal predictive analysis tools for geographic information systems (GIS). These new GIS tools can be readily applied, in a practical and appropriate manner, in spatial and temporal research to fill the gaps in GIS data mining and knowledge discovery functions. The specific achievement here is the integration of related artificial intelligence technologies into GIS software to establish a conceptual spatial and temporal analysis framework. This framework is then used to develop an artificial intelligent spatial and temporal information analyst (ASIA) system, which is fully utilized in the existing GIS package. A study of air pollutant forecasting provides a practical geographical case that demonstrates the rationality and validity of the conceptual temporal analysis framework.
Funding: Supported by Key Science-Technology Project of Heilongjiang Province (GA010401-3)
Abstract: With the rapid progress of network and storage technologies, a huge amount of electronic data such as Web pages and XML has become available on the Internet. In this paper, we study the data mining problem of discovering frequent ordered sub-trees in a large collection of XML data, where both the patterns and the data are modeled by labeled ordered trees. We present an efficient algorithm, Ordered Subtree Miner (OSTMiner), based on two-layer neural networks with the Hebb rule, that computes all ordered sub-trees appearing in a collection of XML trees with frequency above a user-specified threshold using a special structure, the EM-tree. In this algorithm, the EM-tree is used as an extended merging tree to supply scheme information for efficient pruning and mining of frequent sub-trees. Experimental results show that OSTMiner has good response time and scales well.
Abstract: It is difficult if not impossible to appropriately and effectively select from among the vast pool of existing neural network machine learning predictive models for industrial incorporation or academic research exploration and enhancement. When all models outperform all the others under disparate circumstances, none of the models do. Selecting the ideal model becomes a matter of ill-supported opinion ungrounded in the extant real-world environment. This paper proposes a novel grouping of the model pool grounded along a non-stationary real-world data line into two groups: Permanent Data Learning and Reversible Data Learning. This paper further proposes a novel approach towards qualitatively and quantitatively demonstrating their significant differences based on how they alternatively approach dynamic and raw real-world data vs. static and prescient data-mining-biased laboratory data. The results across 2040 separate simulation runs using 15,600 data points in realistically operationally controlled data environments show that the two-group division is effective and significant, with clear qualitative, quantitative and theoretical support. Results across the empirical and theoretical spectrum are internally and externally consistent yet demonstrative of why and how this result is non-obvious.
Abstract: This paper explores the use of soft decision trees [1] in basic reinforcement learning applications to examine the efficacy of using passive-expert-like networks for optimal Q-value learning on Artificial Neural Networks (ANN). The soft decision tree networks were built using the PyTorch machine learning framework and OpenAI's Gym environment framework. The study aimed to assess the performance of soft decision tree networks on Cartpole as provided in the OpenAI Gym software package. The baseline against which the soft decision tree networks were compared was a simple Deep Neural Network using several linear layers, with ReLU and Softmax activation functions for the input and output layers, respectively. All networks were trained using the backpropagation algorithm provided generically by PyTorch's Autograd module.
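As a rough illustration of the soft decision tree idea mentioned in this abstract, the sketch below implements a single soft split in plain Python: an inner node computes a sigmoid gate from the input, and the output is a gate-weighted mix of two leaf vectors. The function names, the depth-1 structure, and all numeric values are assumptions made for illustration; the abstract does not give the authors' architecture, and their networks were built with PyTorch rather than this hand-written code.

```python
import math


def sigmoid(z):
    """Logistic function used as the soft routing gate."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_tree_predict(x, weights, bias, left_leaf, right_leaf):
    """Depth-1 soft decision tree (illustrative, not the paper's model).

    The inner node sends the input to the right leaf with probability
    p_right = sigmoid(w . x + b) and to the left leaf otherwise; the
    prediction is the probability-weighted mix of the two leaf vectors.
    """
    p_right = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return [(1.0 - p_right) * l + p_right * r
            for l, r in zip(left_leaf, right_leaf)]


# Hypothetical 4-dimensional, Cartpole-like observation and parameters.
state = [0.0, 0.0, 0.05, 0.0]
action_probs = soft_tree_predict(
    state,
    weights=[0.0, 0.0, 10.0, 0.0],  # assumed: gate reacts to the third feature
    bias=0.0,
    left_leaf=[0.9, 0.1],           # assumed leaf distribution over 2 actions
    right_leaf=[0.1, 0.9],
)
print(action_probs)
```

In a trained model the gate weights and leaf vectors would be learned by backpropagation, and deeper trees multiply the gate probabilities along each root-to-leaf path.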
Funding: The Researchers Supporting Project Number (RSP2023R509), King Saud University, Riyadh, Saudi Arabia. This work was supported in part by the Higher Education Sprout Project from the Ministry of Education (MOE) and the National Science and Technology Council, Taiwan (109-2628-E-224-001-MY3), and in part by Isuzu Optics Corporation. Dr. Shih-Yu Chen is the corresponding author.
Abstract: Lung cancer analysis is a growing concern in the medical community. Manual segmentation of lung cancers by medical experts is time-consuming and needs to be automated. The objective of this research is to diagnose lung tumors at an early stage using deep learning techniques, in order to extend patients' lives. A Computer-Aided Diagnostic (CAD) system aids in the diagnosis and shortens the time necessary to detect the tumor. Deep Neural Networks (DNN) have also been shown to be an excellent and effective method for classification and segmentation tasks. This research aims to separate lung cancers from Magnetic Resonance Imaging (MRI) images with threshold segmentation. The Honey hook process categorizes lung cancer based on characteristics retrieved using several classifiers. Considering this principle, the work presents a solution for image compression utilizing a Deep Wave Auto-Encoder (DWAE). The combination of the two approaches significantly reduces the overall size of the feature set required for any future classification process performed using DNN. The proposed DWAE-DNN image classifier is applied to a lung imaging dataset with a Radial Basis Function (RBF) classifier. The study reported promising results with an accuracy of 97.34%, whereas using the Decision Tree (DT) classifier gives an accuracy of 94.24%. The proposed approach (DWAE-DNN) is found to classify the images, as either malignant or normal, with an accuracy of 98.67%. In addition to accuracy, the work also uses benchmark measures such as specificity, sensitivity, and precision to evaluate the efficiency of the network. The investigation finds that the DT classifier provides the maximum performance in the DWAE-DNN depending on the network's performance on image testing, as shown by the data acquired by the categorizers themselves.
Funding: Supported by Science and Technology Plan of Mudanjiang City (G200920064) and Teaching Reform Construction of Mudanjiang Normal University (10-xj11080)
Abstract: Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in some counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established with the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and effective analysis results.
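C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below computes it for one categorical attribute. The attribute values and disaster-grade labels are invented for illustration and are not taken from the Mudanjiang data set; the handling of continuous attributes, missing values, and pruning in full C4.5 is omitted.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute with respect to class labels."""
    n = len(labels)
    # Group class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical samples: wind strength vs. disaster grade.
wind = ["strong", "strong", "weak", "weak"]
grade = ["severe", "severe", "light", "light"]
print(gain_ratio(wind, grade))
```

At each node, the tree builder would evaluate this quantity for every candidate attribute and split on the one with the highest value.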
Funding: Taif University Researchers Supporting Project Number (TURSP-2020/73), Taif University, Taif, Saudi Arabia.
Abstract: Heart failure is now widespread throughout the world. Heart disease affects approximately 48% of the population. The disease is expensive and difficult to cure. This research paper presents machine learning models to predict heart failure. The fundamental concept is to compare the correctness of various Machine Learning (ML) algorithms and boosting algorithms to improve the models' prediction accuracy. Supervised algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR) are considered to achieve the best results. Boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost are also used to improve the prediction using Artificial Neural Networks (ANN). This research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and scikit-learn are used for ML. TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy, 95%, among the classifiers. KNN obtained the second-highest accuracy, 93.33%. XGBoost had a satisfactory accuracy of 91.67%; SVM, CatBoost, and ANN had an accuracy of 90%; and LR had an accuracy of 88.33%.
Funding: Supported by the National Natural Science Foundation of China under Grant 61762031, the Science and Technology Major Project of Guangxi Province under Grant AA19046004, the Natural Science Foundation of Guangxi under Grant 2021JJA170130, the Innovation Project of Guangxi Graduate Education under Grant YCSW2022326, and the Research Project of Guangxi Philosophy and Social Science Planning under Grant 21FGL040.
Abstract: A recommendation system (RS) based on Graph Neural Networks (GNN) builds a user-item interaction graph after collecting all items the user has interacted with. The RS then performs neighborhood aggregation on the graph to generate long-term preference representations for the user. However, user preferences are dynamic. With the passage of time and under the influence of trends, users may develop short-term preferences, which are more likely to lead to user-item interactions. A GNN recommendation method based on long- and short-term preferences (LSGNN) is proposed to address these problems. LSGNN consists of four modules: a GNN combined with the attention mechanism to extract long-term preference features; Bidirectional Encoder Representation from Transformers (BERT) and the attention mechanism combined with a Bi-Directional Gated Recurrent Unit (Bi-GRU) to extract short-term preference features; a Convolutional Neural Network (CNN) combined with the attention mechanism to add title and description representations of items; and, finally, an inner product of the long-term and short-term preference features with the item features to produce recommendations. In experiments conducted on five publicly available datasets from Amazon, LSGNN is superior to state-of-the-art personalized recommendation techniques.
Abstract: In the data retrieval process of a data recommendation system, matching prediction and similarity identification play a major role in the ontology. Several methods exist to improve the retrieval process with better accuracy and reduced search time. However, in a data recommendation system, searching for the best match for given query data becomes complex, and the accuracy of the query recommendation process suffers. To improve the performance of data validation, this paper proposes a novel data similarity estimation and clustering method to retrieve the relevant data with the best matching in big data processing. An advanced model of the Logarithmic Directionality Texture Pattern (LDTP) method with a Metaheuristic Pattern Searching (MPS) system is used to estimate the similarity between the query data and the entire database. The overall work is implemented for the data recommendation process. The data are indexed and grouped into clusters to form a paged database structure, which reduces computation time during searching. In addition, with the help of a neural network, the relevance of feature attributes in the database is predicted, and the matching index is sorted to provide the recommended data for given query data. This is achieved using the Distributional Recurrent Neural Network (DRNN), an enhanced neural network model that finds relevance based on the correlation factor of the feature set. The DRNN classifier is trained by estimating the correlation factor of the attributes of the dataset. These are formed into clusters and paged with proper indexing based on the MPS similarity metric parameter. The overall performance of the proposed work is evaluated by varying the size of the training database by 60%, 70%, and 80%. The parameters considered for performance analysis are Precision, Recall, F1-score, the accuracy of data retrieval, the query recommendation output, and comparison with other state-of-the-art methods.
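The retrieval metrics named in this abstract have standard definitions, sketched below for a single query: precision is the fraction of retrieved items that are relevant, recall is the fraction of relevant items that were retrieved, and F1 is their harmonic mean. The function name and the item identifiers are hypothetical; the abstract does not state how the authors aggregate these scores across queries.

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall, and F1 for one query, given item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical query: 4 items returned, 3 items actually relevant.
p, r, f1 = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 3, 5])
print(p, r, f1)
```

Here two of the four returned items are relevant, so precision is 1/2, recall is 2/3, and F1 is 4/7.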