Abstract: The influenza virus frequently changes its antigenicity through rapid mutation, leading to immune escape and vaccine failure. Rapid determination of influenza antigenicity could help identify antigenic variants in time. Here, we built a stacked auto-encoder (SAE) model for predicting antigenic variants of human influenza A(H3N2) viruses based on hemagglutinin (HA) protein sequences. The model achieved an accuracy of 0.95 in five-fold cross-validation, outperforming a logistic regression model. Further analysis of the model showed that most of the active nodes in the hidden layer reflected the combined contribution of multiple residues to antigenic variation. In addition, some features in the input layer (residues on the HA protein) were observed to take part in multiple active nodes, such as residues 189, 145, and 156, which have also been reported to be major determinants of the antigenic variation of influenza A(H3N2) viruses. Overall, this work is not only useful for rapidly identifying antigenic variants in influenza prevention, but is also an interesting attempt to infer the mechanisms of a biological process through analysis of an SAE model, which may give some insight into the interpretation of deep learning.
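The abstract does not say how HA sequences are turned into model inputs or how hidden nodes are inspected, so the following is only a minimal sketch under stated assumptions: each virus pair is encoded as per-position 0/1 difference flags, five-fold splits are made by simple shuffling, and a hidden node is summarized by the input positions with the largest absolute weights. The function names (`pair_features`, `five_fold_splits`, `top_residues`), the toy sequences, and the weights are all hypothetical; the SAE training itself is not shown.

```python
import random


def pair_features(seq_a, seq_b):
    """Encode a virus pair as 0/1 flags: 1 where the aligned residues differ.

    Assumption: both sequences are already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [int(a != b) for a, b in zip(seq_a, seq_b)]


def five_fold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled five-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]  # five roughly equal folds
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def top_residues(node_weights, k=3):
    """Return the 1-based input positions with the largest |weight| into one hidden node."""
    ranked = sorted(
        range(len(node_weights)),
        key=lambda i: abs(node_weights[i]),
        reverse=True,
    )
    return [i + 1 for i in ranked[:k]]


# Toy aligned fragments (made up for illustration, not real HA sequences).
features = pair_features("KSGSTYPV", "KNGSTYQV")

# Hypothetical weights from five input positions into a single hidden node.
strongest = top_residues([0.1, -0.9, 0.2, 0.05, 0.7], k=2)
```

Positions returned by `top_residues` index the toy input vector only; mapping them to H3 residue numbers such as 189 would depend on the alignment and numbering scheme actually used. A hidden node with several large weights would correspond to the "combined contribution of multiple residues" described above.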