To address the shortcomings of the filter and wrapper feature selection methods, a new method that combines the two is proposed. The method first filters the original features to form a feature subset that meets the required classification accuracy, then applies a wrapper feature selection method to select the optimal feature subset. The genetic algorithm (GA), a successful technique for solving optimization problems, is applied to the optimal feature selection problem. In data simulation and in an experiment on bearing fault feature selection, the composite method reduces computing time severalfold relative to the wrapper method while maintaining classification accuracy. The method therefore has good optimization properties, shortens selection time, and offers both high accuracy and high efficiency.
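The two-stage idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the filter criterion, the classifier, or the GA settings, so the Fisher-style score, the nearest-centroid classifier, the synthetic data, and every function name and parameter below are hypothetical. Fitness is measured on the training data for brevity; a real experiment would likely use held-out data or cross-validation.

```python
import random
import statistics


def fisher_score(column, labels):
    """Filter criterion (assumed): class-mean separation relative to spread."""
    a = [x for x, y in zip(column, labels) if y == 0]
    b = [x for x, y in zip(column, labels) if y == 1]
    spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
    return abs(statistics.fmean(a) - statistics.fmean(b)) / spread


def filter_stage(X, y, keep):
    """Rank all features by the filter score and keep the best `keep` indices."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:keep]


def centroid_accuracy(X, y, features):
    """Wrapper criterion (assumed): nearest-centroid accuracy on `features`."""
    if not features:
        return 0.0
    cents = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        cents[c] = [statistics.fmean(r[j] for r in rows) for j in features]
    correct = 0
    for row, label in zip(X, y):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(features, cents[c]))
                for c in cents}
        correct += min(dist, key=dist.get) == label
    return correct / len(y)


def ga_wrapper(X, y, candidates, rng, pop_size=20, generations=15, p_mut=0.1):
    """Search bit masks over `candidates` with a simple elitist GA."""
    def fitness(mask):
        chosen = [f for f, bit in zip(candidates, mask) if bit]
        # Small size penalty so that smaller subsets win ties.
        return centroid_accuracy(X, y, chosen) - 0.001 * len(chosen)

    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))   # one-point crossover
            child = p1[:cut] + p2[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return [f for f, bit in zip(candidates, best) if bit]


def make_data(rng, n=200, n_features=12):
    """Synthetic data: only features 0 and 1 carry class information."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_data(rng)
candidates = filter_stage(X, y, keep=4)           # cheap filter pass
selected = ga_wrapper(X, y, candidates, rng)      # GA search on the reduced set
```

The time saving claimed in the abstract comes from the same place as in this sketch: the GA evaluates the classifier only on masks over the 4 filtered candidates instead of all 12 original features, so the search space shrinks before the expensive wrapper stage starts.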
We report a bioinformatic analysis of the datasets of sequences of all ten genes from the 2009 H1N1 influenza A pandemic in the state of Wisconsin. The gene with the greatest summed information entropy was found to be the hemagglutinin (HA) gene. Based upon the viral identifier of the HA gene sequence, the sequences of all of the genes were sorted into two subsets, depending upon whether the nucleotide occupying the position of maximum entropy, position 658 of the HA sequence, was A or U. It was found that the information entropy (H) distributions of the subsets differed significantly from each other, from the H distributions of randomly generated subsets, and from the H distributions of the complete datasets of each gene. Mutual information (MI) values facilitated identification of nine nucleotide positions, distributed over seven of the influenza genes, at which the nucleotide subsets were disjoint, or almost disjoint. Nucleotide frequencies at these nine positions were used to compute mutual information values that subsequently served as weighting factors for edges in a graph network. Seven of the nucleotide positions in the graph network are sites of synonymous mutations. Three of these sites of synonymous mutation are within a single gene, the M1 gene, which occupied the position of greatest graph centrality. It is proposed that these bioinformatic and network graph results may reflect alterations in M1-mediated viral packaging and exteriorization, known to be susceptible to synonymous mutations.
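The per-position entropy, the split at the maximum-entropy position, and the mutual information between positions can be illustrated with a short sketch. It assumes aligned sequences of equal length and the standard Shannon definitions in bits; the toy sequences and the function names are hypothetical and are not taken from the Wisconsin dataset or from the authors' code.

```python
import math
from collections import Counter


def column_entropy(seqs, pos):
    """Shannon entropy (bits) of the nucleotides observed at one position."""
    counts = Counter(s[pos] for s in seqs)
    n = len(seqs)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_information(seqs, i, j):
    """Mutual information (bits) between positions i and j."""
    n = len(seqs)
    pi = Counter(s[i] for s in seqs)
    pj = Counter(s[j] for s in seqs)
    pij = Counter((s[i], s[j]) for s in seqs)
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts.
    return sum(c / n * math.log2(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())


# Toy aligned sequences (hypothetical illustration only).
seqs = ["ACGA", "ACGA", "UCGG", "UCGG", "ACAA", "UCAG"]

# Entropy per position; the first position with the largest entropy is used
# to split the sequences into subsets keyed by the nucleotide found there.
entropies = [column_entropy(seqs, p) for p in range(len(seqs[0]))]
max_pos = max(range(len(entropies)), key=entropies.__getitem__)
subsets = {}
for s in seqs:
    subsets.setdefault(s[max_pos], []).append(s)

# Positions whose nucleotides track the split have high MI with max_pos.
mi_with_split = [mutual_information(seqs, max_pos, p)
                 for p in range(len(seqs[0]))]
```

In this toy data the last position is perfectly tied to the first, so the two subsets carry disjoint nucleotides there, which is the kind of position the abstract describes as identified through MI values. Such pairwise MI values could then serve as edge weights in a graph, as the abstract outlines.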
Funding: This project is supported by the Scientific Research Foundation of National Defence of China (No. 41319040202).