Abstract
Traditional pattern mining methods, when applied to biological sequences, generate a large number of unnecessary, short and useless patterns, which reduces efficiency. To address this problem, a novel mining algorithm called JBioPM (Joined Biology sequence Pattern Mining) is proposed, based on joined (adjacent) frequent pattern segments and the multi-support idea. First, adjacent short frequent pattern segments are generated. Then these short segments are combined to produce new, longer frequent patterns. Experimental analysis shows that JBioPM is more efficient than the BioPM algorithm on sequence databases whose sequences are highly similar. Processing a real protein sequence family database demonstrates that the algorithm can handle biological sequence data effectively.
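The two steps named in the abstract (generate short frequent segments, then combine them into longer patterns) can be illustrated with a minimal Python sketch. This is not the JBioPM algorithm itself: the abstract gives no details, so the function names `frequent_segments` and `join_segments`, the fixed segment length `k`, the single absolute support threshold, and the toy sequences are all assumptions made for illustration. The multi-support idea is not modelled here.

```python
from collections import Counter
from itertools import product


def frequent_segments(sequences, k, min_support):
    """Return contiguous length-k segments found in at least min_support sequences.

    Assumption: support is the number of sequences containing the segment.
    """
    counts = Counter()
    for seq in sequences:
        # Use a set so each segment is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {seg for seg, c in counts.items() if c >= min_support}


def join_segments(sequences, segments, min_support):
    """Concatenate pairs of frequent segments and keep those that stay frequent.

    Assumption: two segments are "adjacent" when one directly follows the other.
    """
    joined = {}
    for a, b in product(segments, repeat=2):
        candidate = a + b
        support = sum(candidate in seq for seq in sequences)
        if support >= min_support:
            joined[candidate] = support
    return joined


# Hypothetical toy data: three highly similar protein-like strings.
sequences = ["MKVLAAGIV", "MKVLAAGLV", "TKVLAAGIV"]
short = frequent_segments(sequences, k=3, min_support=3)
long_patterns = join_segments(sequences, short, min_support=3)
print(sorted(short))
print(long_patterns)
```

On this toy input the short segments shared by all three sequences are `AAG`, `KVL`, `LAA` and `VLA`, and the only joined pattern that remains frequent is `KVLAAG`. Only pairs of already-frequent segments are tested as candidates, so far fewer long patterns are checked than if every possible length-6 string were enumerated.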
Authors
常磊玲
朱春鹤
CHANG Lei-ling, ZHU Chun-he (Information Engineering College, Shanghai Maritime University, Shanghai 200135, China)
Source
《电脑知识与技术》
2010, No. 7, pp. 5140-5142 (3 pages)
Computer Knowledge and Technology
Keywords
joined frequent pattern segment
pattern combination
biological sequence
pattern mining
data mining
bioinformatics