Abstract
Information systems record large volumes of business process event logs, and process discovery aims to derive process models from these logs. In highly flexible environments, however, simply applying existing process discovery techniques usually produces incomprehensible process models (so-called spaghetti models). Trace clustering decomposes an event log into sub-logs and can thereby effectively address this problem. Many trace clustering methods exist, such as clustering based on the vector space model, context-aware clustering, and model-based sequence clustering, and different methods yield different clustering results. Many indicators are available for evaluating clustering results, such as the F-Measure based on the quality of the mined models, but existing evaluation indicators are inefficient and complicated to apply. This paper proposes a trace clustering evaluation method based on log similarity, which measures the quality of the clustered logs by comparing the similarity between the clustered sub-logs. Experiments on simulated and real event logs show that the proposed evaluation method provides a good reference standard for trace clustering methods.
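The abstract does not say which similarity measure the paper uses, so the following is only a minimal sketch of the general idea of comparing clustered sub-logs. It assumes each sub-log is summarised by its set of directly-follows activity pairs and that two sub-logs are compared with the Jaccard index. The function names (`directly_follows`, `jaccard`, `mean_pairwise_similarity`) and the toy logs are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def directly_follows(log):
    """Return the set of (a, b) pairs where activity b immediately follows a in some trace."""
    return {(trace[i], trace[i + 1]) for trace in log for i in range(len(trace) - 1)}


def jaccard(x, y):
    """Jaccard index of two sets; two empty sets are treated as identical."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def mean_pairwise_similarity(sub_logs):
    """Average Jaccard similarity over all pairs of sub-logs (assumed measure, not the paper's)."""
    relations = [directly_follows(log) for log in sub_logs]
    pairs = list(combinations(relations, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy clusterings of the same four traces (illustrative data only).
separated = [
    [["a", "b", "c"], ["a", "b", "c"]],
    [["x", "y", "z"], ["x", "y", "z"]],
]
mixed = [
    [["a", "b", "c"], ["x", "y", "z"]],
    [["a", "b", "c"], ["x", "y", "z"]],
]

print(mean_pairwise_similarity(separated))
print(mean_pairwise_similarity(mixed))
```

Under this assumed measure, a clustering whose sub-logs share few behavioural relations scores low, which one might read as better-separated clusters. Whether the paper interprets similarity in this direction is not stated in the abstract.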
Authors
ZHANG Shuaipeng (张帅鹏)
LI Huiling (李会玲)
LI Ting (李婷)
XU Xingrong (徐兴荣)
LIU Cong (刘聪)
Affiliation: College of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China
Source
Journal of Shandong University of Science and Technology (Natural Science) (《山东科技大学学报(自然科学版)》)
Indexing: CAS; Peking University Core Journals (北大核心)
2021, Issue 5, pp. 107-115 (9 pages)
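Funding
National Natural Science Foundation of China (61902222)
Taishan Scholars Program of Shandong Province, special fund (tsqn201909109)
Shandong University of Technology start-up project for introduced high-level talent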
Keywords
trace clustering
process model
log similarity
quality measure