摘要
With the explosive growth in the number of pro- tein sequences generated in the postgenomic age, research into identifying cytokines from proteirls and detecting their biochemical mechanisms becomes increasingly important. Unfortunately, the identification of cytokines from proteins is challenging due to a lack of understanding of the struc- ture space provided by the proteins and the fact that only a small number of cytokines exists in massive proteins. In view of fact that a proteins sequence is conceptually similar to a mapping of words to meaning, n-gram, a type of probabilistic language model, is explored to extract features for proteins. The second challenge focused on in this work is genetic algo- rithms, a search heuristic that mimics the process of natural selection, that is utilized to develop a classifier for overcom- ing the protein imbalance problem to generate precise pre- diction of cytokines in proteins. Experiments carded on im- balanced proteins data set show that our methods outperform traditional algorithms in terms of the prediction ability.
With the explosive growth in the number of pro- tein sequences generated in the postgenomic age, research into identifying cytokines from proteirls and detecting their biochemical mechanisms becomes increasingly important. Unfortunately, the identification of cytokines from proteins is challenging due to a lack of understanding of the struc- ture space provided by the proteins and the fact that only a small number of cytokines exists in massive proteins. In view of fact that a proteins sequence is conceptually similar to a mapping of words to meaning, n-gram, a type of probabilistic language model, is explored to extract features for proteins. The second challenge focused on in this work is genetic algo- rithms, a search heuristic that mimics the process of natural selection, that is utilized to develop a classifier for overcom- ing the protein imbalance problem to generate precise pre- diction of cytokines in proteins. Experiments carded on im- balanced proteins data set show that our methods outperform traditional algorithms in terms of the prediction ability.