Association rule learning (ARL) is a widely used technique for discovering relationships within datasets. However, it often generates an excessive number of irrelevant or ambiguous rules. Post-processing is therefore crucial, not only for removing irrelevant or redundant rules but also for uncovering hidden associations that impact other factors. Several post-processing methods have recently been proposed, each with its own strengths and weaknesses. In this paper, we propose THAPE (Tunable Hybrid Associative Predictive Engine), which combines descriptive and predictive techniques. By leveraging both, we aim to improve the analysis of the generated rules. This includes removing irrelevant or redundant rules, uncovering interesting and useful rules, exploring hidden association rules that may affect other factors, and providing backtracking ability for a given product. The proposed approach offers a tailored method that suits retailers' specific goals, enabling them to better understand customer behavior based on factual transactions in the target market. We applied THAPE to a real dataset as a case study to demonstrate its effectiveness. Through this application, we mined a concise set of highly interesting and useful association rules: out of the 11,265 rules generated, we identified 125 that are particularly relevant to the business context. These rules significantly improve the interpretability and usefulness of association rules for decision-making.
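To make the rule-filtering idea in the abstract above concrete, the following is a minimal sketch of generic association rule scoring, not the THAPE method itself. It computes support, confidence, and lift for single-item rules and keeps only those above two thresholds. The transactions, the function names, and the threshold values are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Toy transactions (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    for single-item rules A -> C that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        s = support({a, c}, transactions)
        if s < min_support:
            continue  # too rare to be considered
        confidence = s / support({a}, transactions)
        lift = confidence / support({c}, transactions)
        if confidence >= min_confidence:
            rules.append((a, c, s, confidence, lift))
    return rules


if __name__ == "__main__":
    for a, c, s, conf, lift in pair_rules(TRANSACTIONS):
        print(f"{a} -> {c}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Raising the thresholds shrinks the rule set, which is the simplest form of the pruning that post-processing methods refine further; a lift below 1 flags a rule whose consequent is no more likely given the antecedent than on its own.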
Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationship between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may not fully leverage the label information in training data and may be subjective. To address this issue, a novel fuzzy rule oversampling approach is developed based on the learning vector quantization (LVQ) algorithm. In this method, the label information of the training data is used to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ. Subsequently, fuzzy rules are generated and adjusted to calculate rule weights. The number of new samples to be synthesized for each rule is then computed, and samples from the minority class are synthesized based on the newly generated fuzzy rules. This results in a fuzzy rule oversampling method based on LVQ. To evaluate the effectiveness of this method, comparative experiments are conducted on 12 publicly available imbalanced datasets against five other sampling techniques, in combination with the support function machine. The experimental results demonstrate that the proposed method can significantly enhance the classification algorithm across seven performance indicators, including a boost of 2.15% to 12.34% in Accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. These results show that the proposed method improves the classification performance of imbalanced data more efficiently.
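The abstract above uses LVQ to divide attribute intervals from labeled data. As a rough illustration only, the sketch below runs the standard LVQ1 update on a single attribute: the nearest prototype moves toward a sample with the same label and away from one with a different label. Taking the midpoint between the two trained prototypes as an interval boundary is an assumption made here for illustration; the abstract does not say how the paper derives its intervals. The data and all names are hypothetical.

```python
def lvq1(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """Standard LVQ1 on 1-D data: attract the nearest prototype when labels
    match, repel it otherwise. Returns the trained prototype positions."""
    protos = list(prototypes)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Index of the prototype closest to the current sample.
            k = min(range(len(protos)), key=lambda i: abs(x - protos[i]))
            step = lr * (x - protos[k])
            protos[k] += step if proto_labels[k] == y else -step
    return protos


# Hypothetical one-attribute dataset: label 1 is the minority class.
SAMPLES = [0.8, 1.0, 1.2, 4.6, 4.8, 5.0, 5.2, 5.4]
LABELS = [1, 1, 1, 0, 0, 0, 0, 0]

trained = lvq1(SAMPLES, LABELS, prototypes=[2.0, 4.0], proto_labels=[1, 0])

# Assumed (not from the source): split the attribute at the prototype midpoint.
boundary = sum(trained) / 2

if __name__ == "__main__":
    print("prototypes:", trained, "boundary:", boundary)
```

Because the prototypes are pulled by labeled samples, the resulting split reflects class information rather than an expert's fixed partition, which is the motivation the abstract gives for using LVQ.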
In this paper, we propose an equal interval range approximation and expanding learning rule for multi-layer perceptrons applied to pattern recognition. Compared with the traditional BP algorithm, this learning rule requires the interval between the output activation of the maximum target output node and those of the other nodes to exceed a given equal interval range for each training input pattern. It can therefore train networks faster at a much lower computational cost and may avoid reversed target outputs and overlearning, which improves the network's generalization ability in pattern recognition. By gradually expanding the interval range, this learning rule also enables the network to learn its targets more accurately in fewer additional training iterations. Finally, we apply this algorithm to network training for EEG detection, and the experimental results show the above advantages of the proposed algorithm.
An operating rule classification system based on a learning classifier system (LCS), which learns through credit assignment (bucket brigade algorithm, BBA) and rule discovery (genetic algorithm, GA), is established to extract water-supply reservoir operating rules. In a case study, the proposed system achieves an online identification rate of 95% for training samples and an offline rate of 85% for testing samples. The performance of the rule classification system is discussed in terms of the rationality of the obtained rules, the impact of training samples on rule extraction, and a comparison between the rule classification system and an artificial neural network (ANN). The results indicate that the LCS is a feasible and effective way for the system to obtain reservoir supply operating rules.
An efficient calibration algorithm for an ambulatory audiometric test system is proposed. This system uses a personal digital assistant (PDA) device to generate the correct sound pressure level (SPL) from an audiometric transducer such as an earphone. The calibrated sound intensities for an audiological examination are obtained in terms of the sound pressure levels of pure-tone sinusoidal signals in eight frequency bands (250, 500, 1000, 2000, 3000, 4000, 6000 and 8000 Hz), with the input sound pressure levels mapped by weight coefficients that are tuned by the delta learning rule. With this scheme, the sound intensities that evoke sound pressure levels in the eight bands, in 5 dB steps from a minimum of 25 dB to a maximum of 80 dB, can be generated without volume displacement. Consequently, these sound intensities can be used to accurately determine the hearing threshold of a subject in the ambulatory audiometric testing environment.
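The weight tuning mentioned in the abstract above can be sketched with the textbook delta rule, w ← w + η (t − w·x) x, applied to one scalar weight per frequency band. This is a minimal sketch under strong assumptions: the linear relation between requested and measured level, the per-band gain values, the normalization by 100, and the learning rate are all hypothetical, and the paper's actual mapping may differ. Only the eight frequencies and the 25 to 80 dB range in 5 dB steps come from the abstract.

```python
# Frequency bands and level range as listed in the abstract.
FREQS_HZ = [250, 500, 1000, 2000, 3000, 4000, 6000, 8000]
LEVELS = [db / 100 for db in range(25, 85, 5)]  # 25..80 dB, scaled to 0.25..0.80

# Hypothetical transducer gains per band (illustrative values only).
ASSUMED_GAINS = dict(zip(FREQS_HZ, [1.10, 1.05, 1.00, 0.95, 0.90, 0.92, 1.15, 1.20]))


def delta_rule_fit(inputs, targets, lr=0.5, epochs=100):
    """Fit one weight w so that w * x approximates the target, using the
    delta rule: w += lr * (target - output) * input."""
    w = 0.0
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            y = w * x               # current output for this input level
            w += lr * (t - y) * x   # delta-rule correction
    return w


# One calibration weight per band, learned from simulated measurements.
weights = {}
for freq, gain in ASSUMED_GAINS.items():
    measured = [gain * x for x in LEVELS]  # simulated target levels
    weights[freq] = delta_rule_fit(LEVELS, measured)

if __name__ == "__main__":
    for freq in FREQS_HZ:
        print(f"{freq} Hz: learned weight {weights[freq]:.4f}")
```

Keeping the inputs scaled below 1 keeps each update a contraction for this learning rate, so the weight settles on the assumed gain without oscillating.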
In a mobile learning system, it is important to adapt to mobile devices. Most mobile learning systems cannot be adapted to mobile devices quickly. To provide adaptive mobile services, an approach to adaptation is proposed in this paper. First, the context of mobile devices and its influence on the mobile learning system are analyzed, and business rules based on this analysis are presented. Then, the mobile learning system is constructed using the approach. The example indicates that this approach can adapt the mobile service to mobile devices flexibly.
AIM: To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine. METHODS: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BN) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension BRL_p. The structure prior has a λ hyperparameter that allows the user to tune the degree to which the prior knowledge is incorporated in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model's predictive performance. Finally, we compared BRL_p to other state-of-the-art classifiers commonly used in biomedicine. RESULTS: We evaluated the degree of incorporation of prior knowledge into BRL_p with simulated data by measuring the Graph Edit Distance between the true data-generating model and the model learned by BRL_p. We specified the true model using informative structure priors. We observed that by increasing the value of λ we were able to increase the influence of the specified structure priors on model learning. A large value of λ caused BRL_p to return the true model. This also led to a gain in predictive performance, measured by the area under the receiver operator characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature [the epidermal growth factor receptor (EGFR) gene]. We again observed that larger values of λ led to an increased incorporation of EGFR into the final BRL_p model. This relevant background knowledge also led to a gain in AUC. CONCLUSION: BRL_p enables tunable structure priors to be incorporated during Bayesian classification rule learning, integrating data and knowledge, as demonstrated using lung cancer biomarker data.
To overcome the limitation that complex data types with nominal attributes cannot be processed by rank learning algorithms, a new rank learning algorithm is designed. In this decision-tree-based learning algorithm, the splitting rule of the decision tree is revised with a new definition of rank impurity. The resulting rank learning algorithm can be explained intuitively, and its theoretical basis is provided. The experimental results show that, in terms of average rank loss, the ranking tree algorithm outperforms the perceptron ranking and ordinal regression algorithms and also converges faster. The rank learning algorithm based on the decision tree is able to process categorical data and select relevant features.
Based on least-squares minimization, a computationally efficient learning algorithm for Principal Component Analysis (PCA) is derived. Dual learning rate parameters are introduced adaptively to give the proposed algorithm fast convergence and high accuracy in extracting all the principal components. It is shown that all the information needed for PCA can be completely represented by the unnormalized weight vector, which is updated based only on the corresponding neuron input-output product. The convergence performance of the proposed algorithm is briefly analyzed. The relation between Oja's rule and the least-squares learning rule is also established. Finally, a simulation example is given to illustrate the effectiveness of this algorithm for PCA.
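For readers unfamiliar with Oja's rule, which the abstract above relates to its least-squares learning rule, here is a minimal sketch of the classical single-neuron form: y = w·x followed by w ← w + η y (x − y w). It is not the paper's dual-learning-rate algorithm. The synthetic data, the dominant direction (0.6, 0.8), the noise level, the learning rate, and the function names are assumptions for illustration.

```python
import math
import random


def make_samples(n=5000, direction=(0.6, 0.8), noise=0.2, seed=0):
    """Zero-mean 2-D samples whose variance is largest along `direction`."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # coordinate along the dominant direction
        samples.append(tuple(a * d + rng.gauss(0.0, noise) for d in direction))
    return samples


def oja_first_component(samples, lr=0.01, w0=(1.0, 0.0)):
    """Classical Oja's rule for the first principal component."""
    w = list(w0)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))   # neuron output
        # Hebbian term y*x with the decay term -y^2*w that keeps |w| near 1.
        w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


w = oja_first_component(make_samples())
norm = math.hypot(*w)
# Cosine of the angle between the learned weight and the assumed direction.
cosine = abs(w[0] * 0.6 + w[1] * 0.8) / norm

if __name__ == "__main__":
    print("weight:", w, "norm:", norm, "cosine:", cosine)
```

The update uses only the neuron's input and output, which is the same locality property the abstract highlights for its unnormalized weight vector.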
Recently, human life worldwide has been endangered by the spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1. To develop effective drugs against Coronavirus, knowledge of protein subcellular localization is a prerequisite. In 2019, a predictor called “pLoc_bal-mEuk” was developed for identifying the subcellular localization of eukaryotic proteins. Its predictions are significantly better than those of its counterparts, particularly for proteins that may simultaneously occur in, or move between, two or more subcellular location sites. However, more effort is needed to further improve its power, since pLoc_bal-mEuk was not trained with deep learning, a very powerful technique developed recently. The present study was devoted to incorporating the deep-learning technique and developing a new predictor called “pLoc_Deep-mEuk”. The global absolute true rate achieved by the new predictor is over 81% and its local accuracy is over 90%. Both are overwhelmingly superior to those of its counterparts. Moreover, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mEuk/, through which most experimental scientists can easily get their desired data.
The recent worldwide spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1, has been endangering human life all around the world. To understand the biological processes at the cellular level and provide useful clues for developing antiviral drugs, information on virus protein subcellular localization is vitally important. In view of this, a CNN-based virus protein subcellular localization predictor called “pLoc_Deep-mVirus” was developed. The predictor is particularly useful in dealing with multi-site systems, in which some proteins may simultaneously occur in two or more different organelles and which are the current focus of the pharmaceutical industry. The global absolute true rate achieved by the new predictor is over 97% and its local accuracy is over 98%. Both significantly exceed those of other existing state-of-the-art predictors. It has not escaped our notice that the deep-learning treatment can be used to deal with many other biological systems as well. To maximize convenience for most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mVirus/.
Recently, human life around the entire world has been endangered by the spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1. To develop effective drugs against Coronavirus, knowledge of protein subcellular localization is indispensable. In 2019, a predictor called “pLoc_bal-mHum” was developed for identifying the subcellular localization of human proteins. Its predictions are significantly better than those of its counterparts, particularly for proteins that may simultaneously occur in, or move between, two or more subcellular location sites. However, more effort is needed to further improve its power, since pLoc_bal-mHum was not trained with deep learning, a very powerful technique developed recently. The present study was devoted to incorporating the deep-learning technique and developing a new predictor called “pLoc_Deep-mHum”. The global absolute true rate achieved by the new predictor is over 81% and its local accuracy is over 90%. Both are overwhelmingly superior to those of its counterparts. Moreover, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mHum/, which will become a very useful tool for fighting the pandemic coronavirus and saving mankind.
The current coronavirus pandemic has endangered human life, and reported cases are increasing exponentially. Information on plant protein subcellular localization can provide useful clues for developing antiviral drugs. To cope with such a catastrophe, a CNN-based plant protein subcellular localization predictor called “pLoc_Deep-mPlant” was developed. The predictor is particularly useful in dealing with multi-site systems, in which some proteins may simultaneously occur in two or more different organelles and which are the current focus of the pharmaceutical industry. The global absolute true rate achieved by the new predictor is over 95% and its local accuracy is about 90%-100%. Both substantially exceed those of other existing state-of-the-art predictors. To maximize convenience for most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mPlant/, through which most experimental scientists can easily obtain their desired data without needing to go through the mathematical details.
The recent worldwide spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1, has been endangering human life all around the world. To understand the biological processes at the cellular level and provide useful clues for developing antiviral drugs, information on Gram-negative bacterial protein subcellular localization is vitally important. In view of this, a CNN-based protein subcellular localization predictor called “pLoc_Deep-mGneg” was developed. The predictor is particularly useful in dealing with multi-site systems, in which some proteins may simultaneously occur in two or more different organelles and which are the current focus of the pharmaceutical industry. The global absolute true rate achieved by the new predictor is over 98% and its local accuracy is around 94%-100%. Both significantly exceed those of other existing state-of-the-art predictors. To maximize convenience for most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mGneg/, which will become a very useful tool for fighting the pandemic coronavirus and saving mankind.
Funding (LVQ-based fuzzy rule oversampling paper): National Science Foundation of China (62006068); Hebei Natural Science Foundation (A2021402008); Natural Science Foundation of Scientific Research Project of Higher Education in Hebei Province (ZD2020185, QN2020188); 333 Talent Supported Project of Hebei Province (C20221026).
Funding (ambulatory audiometric calibration paper): Supported by a grant from the Korean Ministry of Education, Science and Technology (The Regional Core Research Program/Chungbuk BIT Research-Oriented University Consortium).
Funding (Bayesian rule learning paper): Supported by the National Institute of General Medical Sciences of the National Institutes of Health, No. R01GM100387.
Funding (decision-tree rank learning paper): The Planning Program of Science and Technology of Hunan Province (No. 05JT1039).
Funding: Supported by the National Natural Science Foundation of China and the Science Foundation of Guangxi Educational Administration
Abstract: Based on least-squares minimization, a computationally efficient learning algorithm for principal component analysis (PCA) is derived. Dual learning rate parameters are introduced adaptively so that the proposed algorithm converges quickly and extracts all the principal components with high accuracy. It is shown that all the information needed for PCA can be completely represented by the unnormalized weight vector, which is updated based only on the corresponding neuron input-output product. The convergence performance of the proposed algorithm is briefly analyzed. The relation between Oja's rule and the least-squares learning rule is also established. Finally, a simulation example is given to illustrate the effectiveness of this algorithm for PCA.
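For context, the classical Oja's rule that the abstract refers to can be sketched as follows. This is the textbook single-neuron update w ← w + η·y·(x − y·w) with y = w·x, not the dual-learning-rate least-squares algorithm the paper proposes; the toy data generator, the learning rate, and the function names are assumptions chosen for illustration.

```python
import math
import random

def make_samples(n, seed=0):
    """Zero-mean 2-D toy data whose dominant direction is (1, 1)/sqrt(2)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # large spread along (1, 1)
        b = rng.gauss(0.0, 0.2)  # small spread along (1, -1)
        samples.append((a + b, a - b))
    return samples

def oja_first_component(samples, lr=0.01, epochs=5):
    """Estimate the first principal direction with Oja's rule."""
    w = [1.0, 0.0]  # arbitrary non-zero starting weight vector
    for _ in range(epochs):
        for x in samples:
            # Neuron output: projection of the input onto the weights.
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Hebbian term lr*y*x with the decay term -lr*y*y*w that
            # keeps the weight norm close to 1.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

w = oja_first_component(make_samples(2000))
print(w, math.hypot(w[0], w[1]))
```

On this toy data the weight vector should align (up to sign) with the direction (1, 1)/√2 and settle near unit length, since the update depends only on the input-output product and a norm-controlling decay term.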
Abstract: Recently, human life worldwide has been endangered by the spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1. To develop effective drugs against Coronavirus, knowledge of protein subcellular localization is a prerequisite. In 2019, a predictor called “pLoc_bal-mEuk” was developed for identifying the subcellular localization of eukaryotic proteins. Its predicted results are significantly better than those of its counterparts, particularly for proteins that may simultaneously occur in, or move between, two or more subcellular location sites. However, more effort is needed to further improve its power, since pLoc_bal-mEuk was not trained with deep learning, a very powerful technique developed recently. The present study was devoted to incorporating the deep-learning technique and developing a new predictor called “pLoc_Deep-mEuk”. The global absolute true rate achieved by the new predictor is over 81% and its local accuracy is over 90%. Both are far superior to those of its counterparts. Moreover, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mEuk/, by which the majority of experimental scientists can easily get their desired data.
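The "absolute true rate" reported for these multi-label predictors can be illustrated with a short sketch, assuming it follows the usual exact-match definition for multi-label classification: a protein counts as correct only when the predicted set of locations is identical to the true set. The function name and the example location labels are hypothetical, and the abstracts do not spell out the formula.

```python
def absolute_true_rate(true_sets, predicted_sets):
    """Fraction of samples whose predicted label set exactly equals the true set."""
    if len(true_sets) != len(predicted_sets):
        raise ValueError("inputs must have the same length")
    if not true_sets:
        raise ValueError("inputs must not be empty")
    matches = sum(
        1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p)
    )
    return matches / len(true_sets)

true_locations = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}]
predicted = [{"nucleus"}, {"nucleus"}, {"mitochondrion"}]

# The second protein is only partly right, so it does not count as a match.
print(absolute_true_rate(true_locations, predicted))
```

Under this definition a partially correct prediction for a multi-site protein scores zero, which is why the metric is considerably stricter than per-location accuracy.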
Abstract: The recent worldwide spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1, has been endangering human life all around the world. To understand the biological processes at the cellular level and provide useful clues for developing antiviral drugs, information on virus protein subcellular localization is vitally important. In view of this, a CNN-based virus protein subcellular localization predictor called “pLoc_Deep-mVirus” was developed. The predictor is particularly useful in dealing with the multi-site systems, in which some proteins may simultaneously occur in two or more different organelles, that are a current focus of the pharmaceutical industry. The global absolute true rate achieved by the new predictor is over 97% and its local accuracy is over 98%. Both significantly exceed those of other existing state-of-the-art predictors. It has not escaped our notice that the deep-learning treatment can be used to deal with many other biological systems as well. To maximize the convenience for most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mVirus/.
Abstract: Recently, human life around the world has been endangered by the spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1. To develop effective drugs against Coronavirus, knowledge of protein subcellular localization is indispensable. In 2019, a predictor called “pLoc_bal-mHum” was developed for identifying the subcellular localization of human proteins. Its predicted results are significantly better than those of its counterparts, particularly for proteins that may simultaneously occur in, or move between, two or more subcellular location sites. However, more effort is needed to further improve its power, since pLoc_bal-mHum was not trained with deep learning, a very powerful technique developed recently. The present study was devoted to incorporating the deep-learning technique and developing a new predictor called “pLoc_Deep-mHum”. The global absolute true rate achieved by the new predictor is over 81% and its local accuracy is over 90%. Both are far superior to those of its counterparts. Moreover, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mHum/, which will become a very useful tool for fighting the pandemic coronavirus and saving mankind.
Abstract: The current coronavirus pandemic has endangered human life, and the number of reported cases is increasing exponentially. Information on plant protein subcellular localization can provide useful clues for developing antiviral drugs. To cope with this catastrophe, a CNN-based plant protein subcellular localization predictor called “pLoc_Deep-mPlant” was developed. The predictor is particularly useful in dealing with the multi-site systems, in which some proteins may simultaneously occur in two or more different organelles, that are a current focus of the pharmaceutical industry. The global absolute true rate achieved by the new predictor is over 95% and its local accuracy is about 90% to 100%. Both substantially exceed those of other existing state-of-the-art predictors. To maximize the convenience for most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mPlant/, by which the majority of experimental scientists can easily obtain their desired data without the need to go through the mathematical details.
Abstract: The recent worldwide spread of pneumonia-causing viruses, such as Coronavirus, COVID-19, and H1N1, has been endangering human life all around the world. To understand the biological processes at the cellular level and provide useful clues for developing antiviral drugs, information on Gram-negative bacterial protein subcellular localization is vitally important. In view of this, a CNN-based protein subcellular localization predictor called “pLoc_Deep-mGneg” was developed. The predictor is particularly useful in dealing with the multi-site systems, in which some proteins may simultaneously occur in two or more different organelles, that are a current focus of the pharmaceutical industry. The global absolute true rate achieved by the new predictor is over 98% and its local accuracy is around 94% to 100%. Both significantly exceed those of other existing state-of-the-art predictors. To maximize the convenience for most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mGneg/, which will become a very useful tool for fighting the pandemic coronavirus and saving mankind.