Abstract
Event log sampling, a recent research focus in process mining, aims to improve the efficiency of process mining tasks such as model discovery, conformance checking, and process prediction. However, existing sampling methods cannot adequately guarantee the quality of the mined model, and their sampling efficiency on large-scale event logs is low. The directly-follows relation between tasks is the basic unit of behavior description in event logs and plays a key role in various process mining tasks. Accordingly, a general event log sampling method oriented toward directly-follows relation rediscoverability is proposed, which guarantees the rediscoverability of directly-follows relations. To verify its effectiveness, the sampling method is applied to improve the efficiency of existing model discovery algorithms, and a model similarity measure based on process trees is proposed to quantitatively evaluate the quality of the mined models. The sampling method has been implemented on the open-source process mining platforms ProM6 and PM4PY. Based on 12 public event log datasets, the proposed method is quantitatively compared with existing sampling methods in terms of the quality of the mined models. The experimental results show that the proposed method greatly improves model discovery efficiency while preserving model quality.
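The idea of a sample that preserves directly-follows relations can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that "rediscoverability" means every directly-follows pair of the full log also occurs in the sample, and it uses a simple greedy cover over trace variants. The function names `directly_follows`, `log_directly_follows`, and `sample_log` are hypothetical, and start and end activities are ignored for brevity.

```python
from collections import Counter


def directly_follows(trace):
    """Return the set of directly-follows pairs (a, b) in one trace."""
    return set(zip(trace, trace[1:]))


def log_directly_follows(log):
    """Return the union of directly-follows pairs over all traces."""
    pairs = set()
    for trace in log:
        pairs |= directly_follows(trace)
    return pairs


def sample_log(log):
    """Greedy sketch (an assumption, not the published method):
    pick trace variants until every directly-follows pair of the
    full log is covered by the sample."""
    variants = Counter(tuple(t) for t in log)  # variant -> frequency
    uncovered = log_directly_follows(variants)
    remaining = set(variants)
    sample = []
    while uncovered:
        # Prefer the variant covering the most uncovered pairs;
        # break ties by frequency, then lexicographically.
        best = max(
            remaining,
            key=lambda v: (len(directly_follows(v) & uncovered), variants[v], v),
        )
        sample.append(list(best))
        uncovered -= directly_follows(best)
        remaining.discard(best)
    return sample


log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
    ["a", "b", "d"],
]
sample = sample_log(log)
```

In this toy log the third variant adds no new directly-follows pair, so the sample can omit it while a discovery algorithm that relies only on these pairs would see the same relations as in the full log.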
Authors
苏轩
刘聪
闻立杰
孟晓亮
李彩虹
曾庆田
SU Xuan; LIU Cong; WEN Lijie; MENG Xiaoliang; LI Caihong; ZENG Qingtian (School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China; College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China; School of Software, Tsinghua University, Beijing 100084, China)
Source
《计算机集成制造系统》
EI
CSCD
PKU Core Journal
2024, No. 8, pp. 2832-2843 (12 pages)
Computer Integrated Manufacturing Systems
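Funding
National Natural Science Foundation of China (62472264)
Taishan Scholars Program of Shandong Province (ts20190936, tsqn201909109)
Excellent Young Scientists Fund of the Shandong Provincial Natural Science Foundation (ZR2021YQ45)
Youth Innovation Science and Technology Program for Higher Education Institutions of Shandong Province, Innovation Team Project (2021KJ031)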
Keywords
event log sampling
directly-follows relation rediscoverability
quality measure
model similarity