Abstract
Traditional distinguishing sequential pattern mining algorithms return a number of false positive patterns, and the erroneous information these carry can interfere with downstream decisions. To address this problem, the IEP-DSP algorithm is proposed for filtering out false positive distinguishing sequential patterns. It uses the SPADE algorithm and the WRAcc contrast measure to find the candidate distinguishing sequential patterns and the distinguishing sequential patterns present in every permuted data set. By simulating the permutation process, the independent exact permutation testing method builds a separate exact null distribution for each pattern length, from which an exact p-value is computed for every candidate pattern. The False Discovery Rate (FDR) measure is then used to control the number of false positive distinguishing sequential patterns of each length at significance level α. Experimental results on real and simulated data sets show that IEP-DSP filters out a large number of false positive distinguishing sequential patterns while retaining more true distinguishing sequential patterns than methods based on statistical significance testing. The results also support the advantage of independent exact permutation testing over standard permutation testing.
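The filtering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' IEP-DSP implementation, and it rests on several assumptions that the abstract does not confirm: the candidate patterns are supplied by hand instead of being mined with SPADE, the same candidates are re-scored under each label permutation instead of re-mining every permuted data set, the null distribution for a given length pools the permuted WRAcc values of all candidates of that length, and FDR control uses the Benjamini-Hochberg procedure. All function names are hypothetical.

```python
import random
from collections import defaultdict


def contains(seq, pattern):
    """True if `pattern` occurs in `seq` as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)


def wracc(covered, labels, target=1):
    """WRAcc = coverage * (precision on covered sequences - prior of the target class)."""
    n = len(labels)
    n_cov = sum(covered)
    if n_cov == 0:
        return 0.0
    n_pos = sum(1 for y in labels if y == target)
    n_cov_pos = sum(1 for c, y in zip(covered, labels) if c and y == target)
    return (n_cov / n) * (n_cov_pos / n_cov - n_pos / n)


def permutation_pvalues(seqs, labels, patterns, n_perm=200, seed=0):
    """Assumed scheme: one null distribution per pattern length, built by label shuffling."""
    rng = random.Random(seed)
    cover = {p: [contains(s, p) for s in seqs] for p in patterns}
    observed = {p: wracc(cover[p], labels) for p in patterns}
    null = defaultdict(list)  # pattern length -> WRAcc values under permuted labels
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for p in patterns:
            null[len(p)].append(wracc(cover[p], shuffled))
    pvals = {}
    for p in patterns:
        dist = null[len(p)]
        at_least = sum(1 for v in dist if v >= observed[p])
        pvals[p] = (at_least + 1) / (len(dist) + 1)  # add-one keeps p > 0
    return observed, pvals


def benjamini_hochberg(pvals, alpha=0.05):
    """Return the keys accepted by the Benjamini-Hochberg step-up procedure."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    return {key for key, _ in ranked[:cutoff]}


def significant_by_length(pvals, alpha=0.05):
    """Apply FDR control separately within each pattern length."""
    groups = defaultdict(dict)
    for p, v in pvals.items():
        groups[len(p)][p] = v
    kept = set()
    for group in groups.values():
        kept |= benjamini_hochberg(group, alpha)
    return kept
```

A caller would pass a list of item sequences, a parallel list of class labels, and a list of candidate patterns as tuples to `permutation_pvalues`, then hand the resulting p-values to `significant_by_length`. Grouping by length before the FDR step mirrors the abstract's point that short and long patterns get separate null distributions.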
Authors
WU Jun (吴军)
OUYANG Aijia (欧阳艾嘉)
ZHANG Lin (张琳)
School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563000, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2021, No. 8, pp. 45-53, 61 (10 pages)
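Funding
National Natural Science Foundation of China (61662090)
Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2017]250)
Engineering Research Center Project of the Guizhou Provincial Department of Education (黔教合KY字[2016]018)
Joint Fund of the Guizhou Provincial Department of Science and Technology (黔科合LH字[2017]7069)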
Keywords
data mining
pattern discovery
distinguishing sequential pattern mining
statistical significance testing
independent exact permutation testing