To enable the reuse of process design knowledge and improve the efficiency and quality of process design, a method for extracting rules that capture the thinking process behind process design is proposed. An instance representation model of process planning that reflects the thinking process of technicians is established to represent process documents effectively. The related process attributes are extracted from the model to form related events. A manifold learning algorithm and clustering analysis are used to preprocess the process instance data. A rule extraction mechanism for process design is then introduced: it operates on the related events after dimension reduction and clustering, and uses an association rule mining algorithm to extract similar process information within the same cluster. Through a vectorized description of the related events, the final process design rules are formed. Finally, an example is given to evaluate the proposed rule extraction method.
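The association rule mining step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not name a specific algorithm, so the code below simply assumes pairwise rules scored by support and confidence; the function name `mine_rules`, the thresholds, and the attribute strings in `cluster` are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for single-item rules.

    Illustrative only: a brute-force scan over item pairs, not the
    algorithm used in the paper.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(items):
        # Fraction of transactions that contain every item in `items`.
        return sum(items <= s for s in sets) / n

    items = sorted(set().union(*sets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical process instances that fell into the same cluster.
cluster = [
    {"material=steel", "op=rough_turning", "tool=carbide"},
    {"material=steel", "op=rough_turning", "tool=carbide"},
    {"material=steel", "op=finish_turning", "tool=carbide"},
    {"material=aluminium", "op=rough_turning", "tool=hss"},
]

rules = mine_rules(cluster)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

Restricting the search to one cluster at a time, as the abstract describes, keeps the candidate item set small, which is why a brute-force pair scan is enough for a sketch like this.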
In the metal cutting industry, it is common practice to search for the optimal combination of cutting parameters in order to maximize tool life for a fixed minimum value of material removal rate (MRR). Since the advent of the high-speed milling (HSM) process, much experimental and theoretical research has been done for this purpose, mainly emphasizing the optimization of the cutting parameters. It is highly beneficial to convert raw data into a comprehensive knowledge-based expert system that uses fuzzy logic as the reasoning mechanism. This paper presents an attempt to extract rules from a fuzzy neural network (FNN) so as to obtain the most effective knowledge base for a given set of data. Experiments were conducted to determine the best values of cutting speed that maximize tool life for different combinations of input parameters. A fuzzy neural network was constructed based on the fuzzification of the input parameters and the cutting speed. After training, raw rule sets were extracted, and a rule pruning approach was proposed to obtain concise linguistic rules. Estimation with fuzzy inference showed that the optimized combination of fuzzy rules gave an estimation error of only 6.34 m/min, compared with 314 m/min for a randomized combination of rules.
For various reasons, many of the security programming rules applicable to specific software have not been recorded in official documents and hence can hardly be used by static analysis tools for detection. In this paper, we propose a new approach, named SVR-Miner (Security Validation Rules Miner), which uses frequent sequence mining techniques [1-4] to automatically infer implicit security validation rules from large bodies of software code written in the C programming language. Unlike past work in this area, SVR-Miner introduces three techniques, namely sensitive threads, program slicing [5-7], and equivalent statement computing, to improve the accuracy of the rules. Experiments with the Linux kernel demonstrate the effectiveness of our approach. With ten given sensitive threads, SVR-Miner automatically generated 17 security validation rules and detected 8 violations, 5 of which had been published by the Linux Kernel Organization before we detected them. We have recently reported the other three to the Linux Kernel Organization.
Disease diagnosis is a challenging task because of the large number of associated factors. Uncertainty in the diagnosis process arises from inaccuracy in patient attributes, missing data, and limits on the medical expert's ability to define cause-and-effect relationships when there are multiple interrelated variables. This paper aims to present an integrated view of deploying smart disease diagnosis using the Internet of Things (IoT) empowered by a fuzzy inference system (FIS) to diagnose various diseases. A fuzzy system is well suited to diagnosing medical conditions because every disease diagnosis involves many uncertainties, and fuzzy logic is designed to handle uncertainty. The proposed system differentiates new cases given the symptoms of the disease. Discriminating between symptomatic diseases is generally a time-sensitive task. The proposed system can track symptoms to diagnose diseases through IoT and FIS smartly and efficiently. Different coefficients are employed to predict and compute the severity of the identified disease for each sign of disease. This study aims to differentiate and diagnose COVID-19, typhoid, malaria, and pneumonia. The FIS method is used to identify the disease from the given data correlated with the input symptoms. MATLAB is used to implement the FIS. Applying the fuzzy procedure to the given data shows that the affecting disease can be derived from the symptoms. The results of the proposed method show that FIS could be used for the diagnosis of other diseases. This study may assist doctors, patients, medical practitioners, and other healthcare professionals in diagnosing diseases early and treating them better.
This paper combines three computational intelligence tools (neural networks, fuzzy logic, and genetic algorithms) to develop a data mining architecture (NFGDM) that discovers patterns and represents them in understandable forms. In the NFGDM, input data are preprocessed by fuzzification, and the preprocessed data of the input variables are then used to train a radial basis probabilistic neural network to classify the dataset according to the classes considered. A rule extraction technique is then applied to extract explicit knowledge from the trained neural networks and represent it in the form of fuzzy if-then rules. In the final stage, a genetic algorithm is used as a rule-pruning module to eliminate weak rules that remain in the rule bases. Compared with some known neural network classifiers, the architecture has a fast learning speed, and it is characterized by the incorporation of possibility information into the consequents of classification rules in human-understandable forms. The experiments show that the NFGDM is more efficient and more robust than the traditional decision tree method.
Heavy renewable penetration and high-voltage cross-regional transmission systems reduce the inertia and critical frequency stability of power systems after disturbances. Power system operators should therefore ensure that the frequency nadirs after possible disturbances stay within the set restriction, e.g., 0.20 Hz. Traditional methods use linearized and simplified control models to quantify the frequency nadirs and achieve frequency-constrained unit commitments (FCUCs). However, the simplified models struggle to capture the frequency responses of practical units after disturbances, and they usually neglect the regulation provided by battery storage. This paper achieves FCUCs with linear rules extracted from massive simulation results. We simulate the frequency responses of typical thermal-hydro-storage systems under diverse unit online conditions. We then extract the rules for frequency nadirs after disturbances using only a linear support vector machine to evaluate the frequency stability of power systems. The algorithm maintains high accuracy over a wide range of frequency restrictions. Finally, we apply the rules to three typical cases to show the influence of frequency constraints on unit commitments.
In research on rule extraction from neural networks, fidelity describes how well the rules mimic the behavior of a neural network, while accuracy describes how well the rules generalize. This paper identifies the fidelity-accuracy dilemma. It argues that rule extraction using neural networks and rule extraction for neural networks should be distinguished according to their different goals, and that fidelity and accuracy, respectively, should be excluded from the rule quality evaluation framework.
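The two measures defined in this abstract can be made concrete with a small sketch. It assumes both are computed as simple agreement rates on a labelled test set (fidelity against the network's outputs, accuracy against the true labels); the helper name `agreement` and the toy label vectors are hypothetical.

```python
def agreement(predicted, reference):
    """Fraction of positions where two label sequences agree."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Toy data (made up for illustration): 8 test samples.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
network_out = [1, 0, 1, 0, 0, 1, 1, 0]  # predictions of the trained network
rule_out    = [1, 0, 1, 1, 0, 1, 1, 0]  # predictions of the extracted rules

fidelity = agreement(rule_out, network_out)      # rules vs. network
accuracy = agreement(rule_out, true_labels)      # rules vs. ground truth
network_accuracy = agreement(network_out, true_labels)

print(f"fidelity={fidelity:.3f}, accuracy={accuracy:.3f}, "
      f"network accuracy={network_accuracy:.3f}")
```

In this toy case the rules are more accurate than the network precisely because they disagree with it on one sample, so pushing fidelity to 1.0 would lower accuracy. That tension is one way to read the dilemma the abstract names.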
This paper discusses how to use genetic algorithms (GA) to extract symbolic rules from trained artificial neural networks (ANN) in classification domains. Previous methods based on an exhaustive analysis of network connections and output values have already been shown to be intractable, in that the scale-up factor increases with the number of nodes and connections in the network. Some experiments demonstrating the effectiveness of the presented method are also given.
Phishing is a social engineering attack technique that is widely used to obtain sensitive user information, such as login credentials and credit and debit card details. It is carried out by a person masquerading as an authentic individual. Various anti-phishing techniques have been developed to protect web users from these attacks, but they fail to protect the user in various ways. In this paper, we propose a novel technique to identify phishing websites effortlessly on the client side by means of a novel browser architecture. In this system, we use the rule of extraction framework to extract the properties or features of a website using the URL only. This list consists of 30 different properties of a URL, which are later used by a Random Forest classification model to assess the authenticity of the website. A dataset consisting of 11,055 tuples is used to train the model. These processes are carried out on the client side with the help of a redesigned browser architecture. Researchers have come up with machine learning frameworks to detect phishing sites, but these are not in a state to be used by individuals with no technical knowledge. To make these tools accessible to every individual, we have improvised and introduced detection methods into a browser architecture named "Embedded Phishing Detection Browser" (EPDB), a novel method that preserves the existing user experience while improving security. The newly designed browser architecture introduces a special segment to perform phishing detection operations in real time. We have prototyped this technique to ensure maximum security, achieving an accuracy of 99.36% in the real-time identification of phishing websites.
In the quest for interpretable models, two versions of a neural network rule extraction algorithm were proposed and compared. The two algorithms are called the Piece-Wise Linear Artificial Neural Network (PWL-ANN) and enhanced Piece-Wise Linear Artificial Neural Network (enhanced PWL-ANN) algorithms. The PWL-ANN algorithm is a decompositional artificial neural network (ANN) rule extraction algorithm. The enhanced PWL-ANN algorithm improves upon it and extracts multiple linear regression equations from a trained ANN model by approximating the hidden sigmoid activation functions using N-piece linear equations. In doing so, the algorithm provides interpretable models from the originally trained opaque ANN models. A detailed application case study illustrates how the generated enhanced PWL-ANN models can provide understandable IF-THEN rules about a problem domain. Comparison of the results generated by the two versions showed that the enhanced PWL-ANN models have better fidelity to the originally trained ANN models than the PWL-ANN models. The results also showed that more concise rule sets could be generated using the enhanced PWL-ANN algorithm. If a simpler set of rules is desired, the enhanced PWL-ANN algorithm can be combined with the decision tree approach. Potential application of the algorithms to domains related to petroleum engineering can help enhance understanding of the problems.
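The core idea of approximating a sigmoid activation with N linear pieces can be sketched as follows. This is not the PWL-ANN or enhanced PWL-ANN algorithm itself: the abstract does not say how breakpoints are chosen, so the code assumes equal-width chords on a fixed interval and clamps outside it. The names `build_segments`, `pwl_sigmoid`, and `max_error` and the interval [-6, 6] are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_segments(n_pieces, lo=-6.0, hi=6.0):
    """Return (x_start, x_end, slope, intercept) for equal-width chords."""
    width = (hi - lo) / n_pieces
    points = [lo + i * width for i in range(n_pieces + 1)]
    segments = []
    for x0, x1 in zip(points, points[1:]):
        # Each piece is the straight line joining the sigmoid at x0 and x1.
        slope = (sigmoid(x1) - sigmoid(x0)) / (x1 - x0)
        intercept = sigmoid(x0) - slope * x0
        segments.append((x0, x1, slope, intercept))
    return segments


def pwl_sigmoid(x, segments):
    """Evaluate the piecewise-linear approximation, clamping outside the range."""
    if x <= segments[0][0]:
        return sigmoid(segments[0][0])
    if x >= segments[-1][1]:
        return sigmoid(segments[-1][1])
    for x0, x1, slope, intercept in segments:
        if x0 <= x <= x1:
            return slope * x + intercept
    raise AssertionError("unreachable: segments cover the whole interval")


def max_error(n_pieces):
    """Largest absolute gap to the true sigmoid on a grid over [-8, 8]."""
    segments = build_segments(n_pieces)
    grid = [-8.0 + 0.01 * i for i in range(1601)]
    return max(abs(sigmoid(x) - pwl_sigmoid(x, segments)) for x in grid)


for n in (3, 6, 12):
    print(f"{n:2d} pieces: max error = {max_error(n):.4f}")
```

Within each piece a hidden neuron becomes a linear function of its inputs, so the whole network reduces to a linear regression equation valid in that region. Using more pieces lowers the approximation error but yields more region-specific equations to interpret.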
Neural networks are widely used in stock price forecasting, but they lack interpretability because of their "black box" characteristics. In this paper, an L1-orthogonal regularization method is used in the GRU model. A decision tree, GRU-DT, was constructed to represent the prediction process of the neural network, and rule screening algorithms were proposed to find the significant rules in the prediction. In the empirical study, data from 10 different industries in China's CSI 300 were selected for stock price trend prediction, and the extracted rules were compared and analyzed. Technical indicators were discretized to make the rules easy to use for decision-making. Empirical results show that the AUC of the model is stable between 0.72 and 0.74, and the F1 and accuracy values are stable between 0.68 and 0.70, indicating that discretized technical indicators can effectively predict the short-term trend of stock prices. The fidelity of GRU-DT to the GRU model reaches 0.99. The prediction rules of different industries show both common and industry-specific features.
Support vector machines (SVMs) are supervised learning models traditionally employed for classification and regression analysis. In classification analysis, a set of training data is chosen, and each instance in the training data is assigned a categorical class. An SVM then constructs a model based on a separating plane that maximizes the margin between different classes. Although SVMs are among the most popular classification models because of their strong empirical performance, understanding the knowledge captured in an SVM remains difficult. SVMs are typically applied in a black-box manner, where the details of parameter tuning, training, and even the final constructed model are hidden from users. This is natural, since these details are often complex and difficult to understand without proper visualization tools. However, such an approach often brings about various problems, including trial-and-error tuning and suspicious users who are forced to trust these models blindly. The contribution of this paper is a visual analysis approach for building SVMs in an open-box manner. Our goal is to improve an analyst's understanding of the SVM modeling process through a suite of visualization techniques that give users full interactive visual control over the entire SVM training process. Our visual exploration tools have been developed to enable intuitive parameter tuning, training data manipulation, and rule extraction as part of the SVM training process. To demonstrate the efficacy of our approach, we conduct a case study using a real-world robot control dataset.
On the basis of data mining and neural networks, this paper proposes a general framework for a neural network expert system and discusses the key techniques in this kind of system. We apply these ideas to an agricultural expert system to discover previously unknown, useful knowledge and obtain satisfactory results.
This paper presents the application of a neural network rule extraction algorithm, called the piecewise linear artificial neural network (PWL-ANN) algorithm, to a carbon capture process system dataset. The objective of the application is to enhance understanding of the intricate relationships among the key process parameters. The algorithm extracts rules in the form of multiple linear regression equations by approximating the sigmoid activation functions of the hidden neurons in an artificial neural network (ANN). The PWL-ANN algorithm overcomes both the weakness of the statistical regression approach, in which the accuracies of the generated predictive models are often unsatisfactory, and the opaqueness of ANN models. The results show that the generated PWL-ANN models have accuracies as high as those of the originally trained ANN models on the four datasets of the carbon capture process system. An analysis of the extracted rules and the magnitudes of the coefficients in the equations revealed that the three most significant parameters for the CO₂ production rate are the steam flow rate through the reboiler, the reboiler pressure, and the CO₂ concentration in the flue gas.
Funding (high-speed milling fuzzy rule extraction paper): supported by the International Science and Technology Cooperation project (Grant No. 2008DFA71750).
Funding (SVR-Miner paper): National Natural Science Foundation of China (Grant Nos. 60873213, 91018008, and 61070192); Beijing Science Foundation (Grant No. 4082018); Shanghai Key Laboratory of Intelligent Information Processing of China (Grant No. IIPL-09-006).
Funding (NFGDM paper): supported by the National Research Foundation for the Doctoral Program of Higher Education of China (20030487032).
Funding (frequency-constrained unit commitment paper): supported by the research project from China Three Gorges Corporation (No. 202103386).
文摘Heavy renewable penetrations and high-voltage cross-regional transmission systems reduce the inertia and critical frequency stability of power systems after disturbances.Therefore,the power system operators should ensure the frequency nadirs after possible disturbances are within the set restriction,e.g.,0.20 Hz.Traditional methods utilize linearized and simplified control models to quantify the frequency nadirs and achieve frequency-constrained unit commitments(FCUCs).However,the simplified models are hard to depict the frequency responses of practical units after disturbances.Also,they usually neglect the regulations from battery storage.This paper achieves FCUCs with linear rules extracted from massive simulation results.We simulate the frequency responses on typical thermal-hydro-storage systems under diverse unit online conditions.Then,we extract the rules of frequency nadirs after disturbances merely with linear support vector machine to evaluate the frequency stability of power systems.The algorithm holds a high accuracy in a wide range of frequency restrictions.Finally,we apply the rules to three typical cases to show the influences of frequency constraints on unit commitments.
文摘In the research of rule extraction from neural networks, fidelity describeshow well the rules mimic the behavior of a neural network while accuracy describes how well therules can be generalized. This paper identifies the fidelity-acuracy dilemma. It argues todistinguish rule extraction using neural networks and rule extraction for neural networks accordingto their different goals, where fidelity and accuracy should be excluded from the rule qualityevaluation framework, respectively.
文摘This paper discusses how to extract symbolic rules from trained artificial neural network (ANN) in domains involving classification using genetic algorithms (GA). Previous methods based on an exhaustive analysis of network connections and output values have already been demonstrated to be intractable in that the scale-up factor increases with the number of nodes and connections in the network. Some experiments explaining effectiveness of the presented method are given as well.
文摘Phishing is a technique under Social Engineering attacks which is most widely used to get user sensitive information,such as login credentials and credit and debit card information,etc.It is carried out by a person masquerading as an authentic individual.To protect web users from these attacks,various anti-phishing techniques are developed,but they fail to protect the user from these attacks in various ways.In this paper,we propose a novel technique to identify phishing websites effortlessly on the client side by proposing a novel browser architecture.In this system,we use the rule of extraction framework to extract the properties or features of a website using the URL only.This list consists of 30 different properties of a URL,which will later be used by the Random Forest Classification machine learning model to detect the authenticity of the website.A dataset consisting of 11,055 tuples is used to train the model.These processes are carried out on the client-side with the help of a redesigned browser architecture.Today Researches have come up with machine learning frameworks to detect phishing sites,but they are not in a state to be used by individuals having no technical knowledge.To make sure that these tools are accessible to every individual,we have improvised and introduced detection methods into the browser architecture named as‘Embedded Phishing Detection Browser’(EPDB),which is a novel method to preserve the existing user experience while improving the security.The newly designed browser architecture introduces a special segment to perform phishing detection operations in real-time.We have prototyped this technique to ensure maximum security,better accuracy of 99.36%in the identification of phishing websites in realtime.
文摘In the quest for interpretable models,two versions of a neural network rule extraction algorithm were proposed and compared.The two algorithms are called the Piece-Wise Linear Artificial Neural Network(PWL-ANN)and enhanced Piece-Wise Linear Artificial Neural Network(enhanced PWL-ANN)algorithms.The PWL-ANN algorithm is a decomposition artificial neural network(ANN)rule extraction algorithm,and the enhanced PWL-ANN algorithm improves upon the PWL-ANN algorithm and extracts multiple linear regression equations from a trained ANN model by approximating the hidden sigmoid activation functions using N-piece linear equations.In doing so,the algorithm provides interpretable models from the originally trained opaque ANN models.A detailed application case study illustrates how the generated enhanced-PWL-ANN models can provide understandable IF-THEN rules about a problem domain.Comparison of the results generated by the two versions of the PWL-ANN algorithm showed that in comparison to the PWL-ANN models,the enhanced-PWL-ANN models support improved fidelities to the originally trained ANN models.The results also showed that more concise rule sets could be generated using the enhanced-PWL-ANN algorithm.If a more simplified set of rules is desired,the enhanced-PWL-ANN algorithm can be combined with the decision tree approach.Potential application of the algorithms to domains related to petroleum engineering can help enhance understanding of the problems.
Funding: National Defense Science and Technology Innovation Special Zone Project (No. 18-163-11-ZT-002-045-04).
Abstract: Neural networks are widely used in stock price forecasting, but they lack interpretability because of their "black box" nature. In this paper, an L1-orthogonal regularization method is applied to a GRU model. A decision tree, GRU-DT, is constructed to represent the prediction process of the neural network, and several rule screening algorithms are proposed to identify the significant rules in the prediction. In the empirical study, data from 10 different industries in China's CSI 300 index were selected for stock price trend prediction, and the extracted rules were compared and analyzed. Technical indicators were discretized to make the rules easier to use in decision-making. The empirical results show that the AUC of the model is stable between 0.72 and 0.74, and that F1 and accuracy are stable between 0.68 and 0.70, indicating that discretized technical indicators can effectively predict the short-term trend of stock prices. The fidelity of GRU-DT to the GRU model reaches 0.99. The prediction rules of different industries share some common features while also showing industry-specific differences.
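The fidelity figure can be made concrete with a short sketch. It assumes the common definition of fidelity as the fraction of samples on which the surrogate tree predicts the same label as the network. The abstract does not state its exact definition, and the label lists below are invented toy data, not results from the paper.

```python
def fidelity(model_preds, surrogate_preds):
    """Fraction of samples on which the surrogate reproduces the model's label."""
    if len(model_preds) != len(surrogate_preds):
        raise ValueError("prediction lists must have the same length")
    if not model_preds:
        raise ValueError("prediction lists must be non-empty")
    agree = sum(1 for m, s in zip(model_preds, surrogate_preds) if m == s)
    return agree / len(model_preds)

# Toy labels (1 = price rises, 0 = price falls), invented for illustration.
gru_labels = [1, 0, 1, 1, 0, 0, 1, 0]
tree_labels = [1, 0, 1, 0, 0, 0, 1, 0]
print(fidelity(gru_labels, tree_labels))  # 7 of 8 labels agree -> 0.875
```

Note that fidelity compares the tree with the network, not with the true labels, so a high fidelity says the tree is a faithful explanation of the network rather than that either model is accurate.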
Funding: Supported in part by the National Basic Research Program of China (973 Program, No. 2015CB352503), the Major Program of the National Natural Science Foundation of China (No. 61232012), and the National Natural Science Foundation of China (No. 61422211).
Abstract: Support vector machines (SVMs) are supervised learning models traditionally employed for classification and regression analysis. In classification analysis, a set of training data is chosen, and each instance in the training data is assigned a categorical class. An SVM then constructs a model based on a separating plane that maximizes the margin between different classes. Although SVMs are among the most popular classification models because of their strong empirical performance, understanding the knowledge captured in an SVM remains difficult. SVMs are typically applied in a black-box manner in which the details of parameter tuning, training, and even the final constructed model are hidden from the users. This is natural, since these details are often complex and difficult to understand without proper visualization tools. However, such an approach often leads to problems, including trial-and-error tuning and suspicious users who are forced to trust these models blindly. The contribution of this paper is a visual analysis approach for building SVMs in an open-box manner. Our goal is to improve an analyst's understanding of the SVM modeling process through a suite of visualization techniques that give users full interactive visual control over the entire SVM training process. Our visual exploration tools have been developed to enable intuitive parameter tuning, training data manipulation, and rule extraction as part of the SVM training process. To demonstrate the efficacy of our approach, we conduct a case study using a real-world robot control dataset.
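The notion of a separating plane that maximizes the margin can be illustrated with a small calculation. The sketch below does not train an SVM. It only computes the geometric margin of two hand-picked candidate lines on an invented four-point dataset, to show why one separator would be preferred over the other. The function name `geometric_margin` and all numbers are assumptions for illustration.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over the data set.

    A positive result means the line w.x + b = 0 separates the two classes.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Invented toy data: two points per class, labels are +1 and -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two hypothetical separating lines: x1 + x2 = 2 and x1 = 1.
diagonal = geometric_margin((1.0, 1.0), -2.0, points, labels)
vertical = geometric_margin((1.0, 0.0), -1.0, points, labels)
print(diagonal, vertical)
```

Both lines classify every point correctly, but the diagonal line leaves a larger gap to the nearest point, so a margin-maximizing model would favor it.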
Abstract: On the basis of data mining and neural networks, this paper proposes a general framework for a neural network expert system and discusses the key techniques in this kind of system. We apply these ideas to an agricultural expert system to discover previously unknown, useful knowledge, and obtain satisfactory results.
Funding: The first author is grateful for the scholarships and generous support from the Faculty of Graduate Studies and Research, University of Regina, and from the Canada Research Chair Program.
Abstract: This paper presents the application of a neural network rule extraction algorithm, called the piecewise linear artificial neural network (PWL-ANN) algorithm, to a carbon capture process system dataset. The objective of the application is to enhance understanding of the intricate relationships among the key process parameters. The algorithm extracts rules in the form of multiple linear regression equations by approximating the sigmoid activation functions of the hidden neurons in an artificial neural network (ANN). The PWL-ANN algorithm overcomes two weaknesses: the often unsatisfactory accuracy of predictive models generated by the statistical regression approach, and the opaqueness of ANN models. The results show that the generated PWL-ANN models have accuracies as high as those of the originally trained ANN models on the four datasets of the carbon capture process system. An analysis of the extracted rules and the magnitudes of the coefficients in the equations revealed that the three most significant parameters for the CO₂ production rate are the steam flow rate through the reboiler, the reboiler pressure, and the CO₂ concentration in the flue gas.
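How a network with linearized activations turns into a regression equation can be sketched as follows. The abstract does not give the network architecture, so this example assumes a single hidden layer with a linear output neuron, and assumes that in the region considered each hidden sigmoid is replaced by one linear piece a*z + d. The function name `collapse_to_linear` and all weights are hypothetical.

```python
def collapse_to_linear(hidden_weights, hidden_biases, out_weights, out_bias, pieces):
    """Return (coefficients, intercept) of the linear equation for one region.

    The network is y = out_bias + sum_j v_j * sigmoid(w_j . x + c_j).
    pieces[j] = (a_j, d_j) is the linear piece a_j * z + d_j that stands in
    for the sigmoid of hidden neuron j in the region being considered.
    """
    n_inputs = len(hidden_weights[0])
    coeffs = [0.0] * n_inputs
    intercept = out_bias
    for w_j, c_j, v_j, (a_j, d_j) in zip(hidden_weights, hidden_biases, out_weights, pieces):
        # Each input's coefficient collects v_j * a_j * w_ji over all neurons.
        for i in range(n_inputs):
            coeffs[i] += v_j * a_j * w_j[i]
        # The constant part of each linearized neuron goes into the intercept.
        intercept += v_j * (a_j * c_j + d_j)
    return coeffs, intercept

# Hypothetical two-input, two-neuron network. Both neurons use the tangent
# of the sigmoid at zero (slope 0.25, offset 0.5) as their linear piece.
coeffs, intercept = collapse_to_linear(
    hidden_weights=[[1.0, -2.0], [0.5, 0.5]],
    hidden_biases=[0.0, 1.0],
    out_weights=[2.0, -1.0],
    out_bias=0.5,
    pieces=[(0.25, 0.5), (0.25, 0.5)],
)
print(coeffs, intercept)  # y = 0.375*x1 - 1.125*x2 + 0.75 in this region
```

The magnitudes of the resulting coefficients are what allow input parameters to be ranked by influence, which is the kind of analysis the abstract describes.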