
Independent Exact Permutation Testing Algorithm for Distinguishing Sequential Pattern Discovery (cited by: 3)
Abstract: Traditional distinguishing sequential pattern mining algorithms usually return a number of false positive patterns, and the misleading information these patterns carry can interfere with decisions in downstream tasks. To address this problem, an algorithm named IEP-DSP is proposed for filtering out false positive distinguishing sequential patterns. The algorithm uses the SPADE method and the WRAcc contrast measure to find the candidate distinguishing sequential patterns as well as the distinguishing sequential patterns in all permuted data sets. By simulating the permutation process, the independent exact permutation testing method builds an independent exact null distribution for each pattern length, from which the exact p-value of every candidate pattern is computed. The False Discovery Rate (FDR) measure is then used to control the number of false positive patterns of each length at the statistical significance level α. Experimental results on real and simulated data sets show that IEP-DSP filters out a large number of false positive distinguishing sequential patterns while retaining more true distinguishing sequential patterns than methods based on statistical significance testing, which demonstrates the advantage of independent exact permutation testing over standard permutation testing.
Authors: WU Jun; OUYANG Aijia; ZHANG Lin (School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563000, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2021, No. 8, pp. 45-53, 61 (10 pages)
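Funding: National Natural Science Foundation of China (61662090); Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (Qian Jiao He KY Zi [2017]250); Engineering Research Center Project of the Guizhou Provincial Department of Education (Qian Jiao He KY Zi [2016]018); Joint Fund of the Guizhou Provincial Department of Science and Technology (Qian Ke He LH Zi [2017]7069).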
Keywords: data mining; pattern discovery; distinguishing sequential pattern mining; statistical significance testing; independent exact permutation testing
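The pipeline summarized in the abstract (score patterns with WRAcc, build a null distribution per pattern length by permutation, derive p-values, then control the FDR within each length) can be illustrated with a small sketch. This is not the authors' IEP-DSP implementation: it swaps the exact permutation test for a plain Monte Carlo label-permutation test, scores a fixed list of candidate patterns instead of re-mining each permuted data set with SPADE, and uses the Benjamini-Hochberg procedure as one possible FDR control. All function names and parameters (`contains`, `wracc`, `permutation_p_values`, `significant_patterns`, `n_perm`, `seed`) are hypothetical.

```python
import random
from collections import defaultdict


def contains(seq, pattern):
    """True if `pattern` occurs in `seq` as a (possibly gapped) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)


def wracc(covered, labels, positive=1):
    """Weighted relative accuracy: P(cover) * (P(pos | cover) - P(pos))."""
    n = len(labels)
    n_cov = sum(covered)
    if n_cov == 0:
        return 0.0
    n_pos = sum(1 for y in labels if y == positive)
    n_cov_pos = sum(1 for c, y in zip(covered, labels) if c and y == positive)
    return (n_cov / n) * (n_cov_pos / n_cov - n_pos / n)


def permutation_p_values(sequences, labels, patterns, n_perm=1000, seed=0):
    """Monte Carlo p-values from one pooled null distribution per pattern length.

    Assumption: pooling the permuted WRAcc values of all candidates of the
    same length stands in for the per-length null distributions in the abstract.
    """
    rng = random.Random(seed)
    cover = {p: [contains(s, p) for s in sequences] for p in patterns}
    observed = {p: wracc(cover[p], labels) for p in patterns}
    null = defaultdict(list)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between sequences and classes
        for p in patterns:
            null[len(p)].append(wracc(cover[p], shuffled))
    p_values = {}
    for p in patterns:
        dist = null[len(p)]
        exceed = sum(1 for v in dist if v >= observed[p])
        p_values[p] = (exceed + 1) / (len(dist) + 1)  # add-one correction
    return observed, p_values


def significant_patterns(p_values, alpha=0.05):
    """Apply Benjamini-Hochberg separately to the patterns of each length."""
    by_length = defaultdict(list)
    for pattern, p in p_values.items():
        by_length[len(pattern)].append((p, pattern))
    kept = set()
    for group in by_length.values():
        group.sort(key=lambda item: item[0])
        m = len(group)
        cutoff = 0
        for rank, (p, _) in enumerate(group, start=1):
            if p <= alpha * rank / m:
                cutoff = rank  # largest rank passing the BH threshold
        kept.update(pattern for _, pattern in group[:cutoff])
    return kept
```

Patterns are passed as tuples of items, for example `("a", "b")`, and `labels` holds one class label per sequence. Handling each length separately mirrors the abstract's point that patterns of different lengths get their own null distribution and their own FDR control.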
