This paper first applies the sequential clustering method to establish a classification standard for infectious disease incidence states, because the course of incidence has many uncertain characteristics. It then presents a weighted Markov chain method for predicting future incidence states. Because infectious disease incidence behaves as a dependent stochastic variable, the method uses the standardized autocorrelation coefficients as weights. The paper also analyzes the characteristics of infectious disease incidence with the Markov chain Monte Carlo method, with the aim of optimizing the long-term benefit of decisions. The method is successfully validated on existing infectious disease incidence data from Jiangsu Province. In summary, this paper proposes ways to improve the accuracy of the weighted Markov chain, specifically in infectious disease epidemiology.
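The forecasting step of a weighted Markov chain can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the incidence series has already been mapped to integer states (the cut points below are hypothetical stand-ins for the sequential clustering result), reads the "standardized self-coefficients" as lag-k autocorrelation coefficients normalized to w_k = |r_k| / Σ|r_j|, and estimates each k-step transition matrix by counting. The next-period forecast is the weighted sum of the k-step transition rows that start from the states observed k periods earlier. All function names and the toy series are illustrative, and the MCMC analysis is not covered.

```python
import numpy as np


def autocorrelations(x, max_lag):
    """Lag-1..max_lag autocorrelation coefficients r_k of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[:-k] * d[k:]) / denom for k in range(1, max_lag + 1)])


def k_step_matrix(states, k, n_states):
    """Empirical k-step transition matrix, counting pairs (s_t, s_{t+k})."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-k], states[k:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # States never seen as a starting point fall back to a uniform row.
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.divide(counts, rows, out=uniform, where=rows > 0)


def weighted_markov_forecast(series, states, n_states, max_lag):
    """Hypothetical helper: weighted Markov chain forecast of the next state."""
    r = autocorrelations(series, max_lag)
    w = np.abs(r) / np.abs(r).sum()  # standardized weights, sum to 1
    probs = np.zeros(n_states)
    for k in range(1, max_lag + 1):
        # Row of the k-step matrix starting from the state observed k periods ago.
        probs += w[k - 1] * k_step_matrix(states, k, n_states)[states[-k]]
    return w, probs, int(np.argmax(probs))


# Synthetic toy series (not real incidence data) and hypothetical cut points.
series = np.array([1, 2, 3] * 4, dtype=float)
states = np.digitize(series, [1.5, 2.5])  # -> 0, 1, 2, 0, 1, 2, ...
w, probs, next_state = weighted_markov_forecast(series, states, n_states=3, max_lag=3)
print(next_state)  # 0
```

This sketch estimates each k-step matrix directly from the data; under the Markov assumption, taking the k-th power of the one-step matrix is an alternative.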
This paper presents another necessary condition for the optimal partition of a finite set of samples. From this condition, a corresponding generalized sequential hard k-means (GSHKM) clustering algorithm is built, and many well-known clustering algorithms are found to be special cases of it. Under certain assumptions, MacQueen's SHKM (Sequential Hard K-Means) algorithm, the FSCL (Frequency Sensitive Competitive Learning) algorithm, and the RPCL (Rival Penalized Competitive Learning) algorithm are derived from it. It is shown that FSCL in fact still belongs to the GSHKM family of clustering algorithms and is better suited to producing the means of a K-partition of the sample data, as a numerical experiment illustrates. Some improvements to these algorithms are also given.
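The three special cases named above differ mainly in how the winning prototype is chosen and moved. The Python sketch below illustrates this under stated assumptions and is not the paper's GSHKM derivation: SHKM picks the nearest mean and moves it to the running mean (step 1/n_c); FSCL, in the form assumed here, scales each squared distance by the relative win frequency γ_j = n_j / Σn; RPCL additionally pushes the second-place "rival" away from the sample. The function name, the constant RPCL learning rates `alpha_c` and `alpha_r`, and the single pass over the data are illustrative assumptions.

```python
import numpy as np


def competitive_clustering(X, init_means, mode="shkm", alpha_c=0.05, alpha_r=0.005):
    """One pass of sequential competitive clustering (hypothetical helper).

    mode="shkm": nearest mean wins and moves to the running mean (step 1/n_c).
    mode="fscl": winner minimizes gamma_j * ||x - m_j||^2, gamma_j = n_j / sum(n).
    mode="rpcl": FSCL-style winner moves toward x at rate alpha_c; the
                 second-place rival is pushed away from x at rate alpha_r.
    Expects at least two initial means.
    """
    if mode not in ("shkm", "fscl", "rpcl"):
        raise ValueError(f"unknown mode: {mode}")
    means = np.array(init_means, dtype=float)
    counts = np.ones(len(means))  # each initial mean counts as one sample
    for x in np.asarray(X, dtype=float):
        d2 = np.sum((means - x) ** 2, axis=1)
        score = d2 if mode == "shkm" else (counts / counts.sum()) * d2
        winner, rival = np.argsort(score)[:2]
        counts[winner] += 1
        if mode == "rpcl":
            means[winner] += alpha_c * (x - means[winner])
            means[rival] -= alpha_r * (x - means[rival])  # de-learn the rival
        else:
            means[winner] += (x - means[winner]) / counts[winner]
    return means, counts


# Synthetic data: two well-separated 2-D clusters, samples interleaved.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 0.3, size=(100, 2))
B = rng.normal(5.0, 0.3, size=(100, 2))
X = np.empty((200, 2))
X[0::2] = A
X[1::2] = B
# MacQueen-style start: the first K = 2 samples serve as the initial means.
means, counts = competitive_clustering(X[2:], X[:2], mode="shkm")
```

On this synthetic data every sample should land on its own cluster's prototype, so the SHKM branch ends at the exact sample mean of each cluster and FSCL makes the same assignments, while RPCL's rival push leaves the two prototypes somewhat further apart than the true cluster means.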
Protein sequence motif extraction is an important field of bioinformatics because of its relevance to structural analysis. Existing approaches face two major problems: (1) they search for motifs only within the same protein family; and (2) they assume a window size for the motif search. This work proposes the Hierarchically Clustered Hidden Markov Model (HC-HMM) approach, which represents the behavior and structure of proteins as Hidden Markov Model chains and clusters these chains hierarchically by minimizing the distance between the structure and behavior of two given chains. It is well known that HMMs can be used for clustering; however, methods for clustering the HMMs themselves have rarely been studied. In this paper, we develop a hierarchical clustering algorithm for HMMs that discovers protein sequence motifs transcending family boundaries, with no assumption on motif length. We carefully examine the effectiveness of this approach for motif extraction on 2593 proteins that share no more than 25% sequence identity. Many interesting motifs are generated, and three example motifs produced by the HC-HMM approach are analyzed and visualized with their tertiary structures. We believe the proposed method provides a unique strategy for protein sequence motif extraction, and related data mining fields that use Hidden Markov Models may also benefit from this approach of clustering the HMMs themselves.
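The idea of clustering HMMs themselves, rather than using one HMM to cluster sequences, can be sketched in Python. The abstract does not specify the distance between chains' structure and behavior, so this sketch substitutes a sampled, symmetrized log-likelihood distance: draw sequences from one model, compare their per-symbol log-likelihood under both models with the scaled forward algorithm (an estimate of a KL divergence rate), and average both directions. Average-linkage agglomeration is likewise an assumed choice; every function name, the sequence length `T`, and the toy models are hypothetical, and no motif extraction is performed.

```python
import numpy as np


def sample_hmm(pi, A, B, T, rng):
    """Draw one length-T observation sequence from a discrete HMM (pi, A, B)."""
    obs = np.empty(T, dtype=int)
    s = rng.choice(len(pi), p=pi)
    for t in range(T):
        obs[t] = rng.choice(B.shape[1], p=B[s])
        s = rng.choice(len(pi), p=A[s])
    return obs


def log_likelihood(obs, pi, A, B):
    """log P(obs | model) via the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    ll = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()
        ll += np.log(c)
        alpha = alpha / c
    return ll


def hmm_distance(m1, m2, T=200, n_seq=5, seed=0):
    """Symmetrized per-symbol log-likelihood gap (sampled KL-rate estimate)."""
    rng = np.random.default_rng(seed)

    def one_way(src, other):
        total = 0.0
        for _ in range(n_seq):
            obs = sample_hmm(*src, T, rng)
            total += (log_likelihood(obs, *src) - log_likelihood(obs, *other)) / T
        return total / n_seq

    return 0.5 * (one_way(m1, m2) + one_way(m2, m1))


def distance_matrix(models, **kw):
    n = len(models)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = hmm_distance(models[i], models[j], **kw)
    return D


def average_linkage(D, n_clusters):
    """Agglomerative (average-linkage) clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([D[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def toy_model(b0, b1):
    """Hypothetical 2-state, 3-symbol HMM; only the emission rows vary."""
    pi = np.array([0.5, 0.5])
    A = np.array([[0.9, 0.1], [0.2, 0.8]])
    return pi, A, np.array([b0, b1])


models = [
    toy_model([0.80, 0.10, 0.10], [0.70, 0.20, 0.10]),
    toy_model([0.75, 0.15, 0.10], [0.70, 0.20, 0.10]),
    toy_model([0.10, 0.10, 0.80], [0.10, 0.20, 0.70]),
    toy_model([0.10, 0.15, 0.75], [0.10, 0.20, 0.70]),
]
D = distance_matrix(models)
clusters = average_linkage(D, n_clusters=2)
print(clusters)
```

Swapping in a different model-to-model distance only requires replacing `hmm_distance`; the agglomeration step is unchanged.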
Funding: supported in part by the National S&T Major Project Foundation of China (2009ZX10004-904), the Universities Natural Science Foundation of Jiangsu Province (09KJB330004), National Science Foundation Grant DMS-9971405, and National Institutes of Health Contract N01-HV-28183.