Abstract
Distinguishing pattern mining is an important branch of sequence pattern mining, and distinguishing patterns with a density constraint can help biologists find how special factors are distributed in biological sequences. To this end, this paper proposes an algorithm named MPDG (Mining distinguishing sequence Patterns based on Density and Gap constraint), which uses the Nettree data structure to mine distinguishing patterns that satisfy both density and gap constraints. MPDG computes the supports of all super-patterns of the current pattern in a single scan of the sequence database, which improves mining efficiency. Experimental results on real protein datasets verify the effectiveness of MPDG.
Authors
WEI Qin-shuang (魏芹双)
WU You-xi (武优西)
LIU Jing-yu (刘靖宇)
ZHU Huai-zhong (朱怀忠)
School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China; Hebei Province Key Laboratory of Big Data Calculation, Tianjin 300401, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2018, No. 4, pp. 252-256 (5 pages)
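Funding
National Natural Science Foundation of China (61673159)
Natural Science Foundation of Hebei Province (F2016202145)
Natural Science Foundation of Heilongjiang Province (F2017019)
Hebei Province Science and Technology Program (15210325)
Youth Foundation of the Education Department of Hebei Province (QN2014192)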
Keywords
Pattern mining
Distinguishing pattern
Density constraint
Nettree