
Statistically significant sequential patterns mining algorithm under influence degree (cited by: 1)
Abstract: Traditional sequential pattern mining algorithms have two shortcomings: support does not faithfully reflect the interestingness of a sequential pattern, and the quality of the reported patterns is not evaluated. To address them, a statistically significant sequential pattern mining algorithm based on influence degree, called ISSPM (Influence-based Significant Sequential Patterns Mining), is proposed. First, all sequential patterns that satisfy the interestingness constraint are mined recursively. Then, an itemset permutation method is used to construct the permutation-test null distribution for these patterns. Finally, the statistical measure of each evaluated pattern is computed from this null distribution, and all statistically significant sequential patterns are selected from the mined patterns. Experiments on real-world sequential record datasets show that ISSPM reports fewer but more interesting sequential patterns than the PSPM (Prefix-projected Sequential Patterns Mining), SPDL (Sequential Patterns Discovering under Leverage) and PSDSP (Permutation Strategies for Discovering Sequential Patterns) algorithms. Experiments on synthetic sequential record datasets show that false positive patterns account for 3.39% of the patterns reported by ISSPM on average, and that its discovery rate of embedded patterns is never below 66.7%; both results are clearly better than those of the three comparison algorithms. The statistically significant sequential patterns reported by ISSPM therefore reflect more valuable information in sequential record datasets, and further analysis and decisions based on this information are more reliable.
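The permutation-test step described in the abstract can be illustrated with a minimal Python sketch. It is not the ISSPM implementation: the abstract does not define the influence-degree measure or the exact itemset permutation scheme, so the sketch assumes plain support as a stand-in measure and assumes that a permutation shuffles the order of the itemsets inside each sequence. The function names `contains`, `support` and `permutation_p_value` are hypothetical.

```python
import random


def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered subsequence,
    where each pattern itemset must be a subset of a sequence itemset."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(dataset, pattern):
    """Fraction of sequences that contain the pattern.
    ASSUMPTION: stand-in for the paper's influence-degree measure."""
    return sum(contains(seq, pattern) for seq in dataset) / len(dataset)


def permutation_p_value(dataset, pattern, n_perm=1000, seed=0):
    """Empirical p-value of the observed measure under a permutation null.
    ASSUMPTION: each permutation shuffles the itemset order within every
    sequence; the paper's itemset permuting method may differ."""
    rng = random.Random(seed)
    observed = support(dataset, pattern)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(seq, len(seq)) for seq in dataset]
        if support(shuffled, pattern) >= observed:
            hits += 1
    # The +1 terms count the observed dataset as one permutation,
    # which keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy dataset: every sequence is <{a}, {b}>.
a, b = frozenset({"a"}), frozenset({"b"})
dataset = [[a, b] for _ in range(20)]
p_ordered = permutation_p_value(dataset, [a, b], n_perm=200)
p_single = permutation_p_value(dataset, [a], n_perm=50)
```

In this toy dataset the ordered pattern `<{a}, {b}>` rarely survives shuffling in all 20 sequences, so its p-value is small, whereas the single-itemset pattern `<{a}>` is unaffected by reordering and gets a p-value of 1. A pattern would be reported as statistically significant when its p-value falls below a chosen threshold, typically after a multiple-testing correction.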
Authors: WU Jun (吴军); OUYANG Aijia (欧阳艾嘉); ZHANG Lin (张琳) (School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China)
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2022, No. 9, pp. 2713-2721 (9 pages)
Funding: National Natural Science Foundation of China (62066049); Zunyi Joint Fund Project (遵市科合HZ字(2022)123).
Keywords: data mining; sequential pattern mining; interestingness measure; statistically significant pattern; permutation test