Abstract
Log partition divides raw log data into multiple sub-logs to improve the quality of process model discovery. Existing methods for evaluating a log partition mostly rely on ground truth, but labeled log data is difficult to obtain in practice. Partition entropy is therefore proposed as a measure for evaluating log partitions without ground truth. First, trace variants are defined to describe the distribution of each sub-log. Second, internal entropy and external entropy are proposed to describe, respectively, the cohesion within sub-logs and the divergence among them. Third, a penalty factor is used to penalize partition methods that blindly cater to the evaluation criterion of high cohesion and low coupling. Finally, internal entropy, external entropy, and the penalty factor are combined into the expression for partition entropy. Experimental results show the feasibility of the proposed method.
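The abstract names trace variants and entropy-based measures but does not state their formulas. As a rough illustration only, the sketch below counts trace variants per sub-log and computes the ordinary Shannon entropy of each sub-log's variant distribution. The function names, the toy log, and the size-weighted aggregate are assumptions made for this example. They are not the paper's definitions of internal entropy, external entropy, or the penalty factor, which are left out because the abstract does not define them.

```python
from collections import Counter
from math import log2


def trace_variants(sublog):
    """Count the distinct activity sequences (trace variants) in a sub-log."""
    return Counter(tuple(trace) for trace in sublog)


def variant_entropy(sublog):
    """Shannon entropy, in bits, of the variant distribution of one sub-log.

    Assumption: this is plain Shannon entropy, used here as a stand-in for a
    cohesion measure. It is not the paper's internal entropy formula.
    """
    counts = trace_variants(sublog)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * log2(total / c) for c in counts.values())


def weighted_variant_entropy(partition):
    """Size-weighted mean of per-sub-log entropy (hypothetical aggregate)."""
    n = sum(len(sublog) for sublog in partition)
    if n == 0:
        return 0.0
    return sum(len(s) / n * variant_entropy(s) for s in partition)


# Toy partition of a log into two sub-logs; each trace is a list of activities.
partition = [
    [["a", "b", "c"], ["a", "b", "c"]],  # a single variant: fully homogeneous
    [["a", "c", "b"], ["a", "d"]],       # two equally frequent variants
]

for index, sublog in enumerate(partition):
    print(index, dict(trace_variants(sublog)), variant_entropy(sublog))
print("weighted:", weighted_variant_entropy(partition))
```

Under these stand-in definitions, a sub-log whose traces all follow one variant has zero entropy, and a sub-log with more evenly spread variants scores higher. A measure of this kind is trivially minimized by putting every variant into its own sub-log, which is presumably the sort of degenerate partition the abstract's penalty factor targets.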
Authors
林雷蕾
杨良
闻立杰
周华
王建民
LIN Leilei; YANG Liang; WEN Lijie; ZHOU Hua; WANG Jianmin (School of Software, Tsinghua University, Beijing 100084, China; Inspur General Software Co., Ltd., Jinan 250101, China; School of Big Data and Intelligence Engineering, Southwest Forestry University, Kunming 650224, China)
Source
《计算机集成制造系统》
EI
CSCD
Peking University Core Journals
2020, Issue 6, pp. 1483-1491 (9 pages)
Computer Integrated Manufacturing Systems
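Funding
National Key Research and Development Program of China (2017YFA0700605)
National Natural Science Foundation of China (61472207, 71690231)
Beijing National Research Center for Information Science and Technology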
Keywords
process mining
log partition
information entropy
trace clustering