An Approach to Constructing Adaptive Crowd Groups Based on Incremental Stream Processing

Abstract: Constructing crowd groups is a key technology for efficiently managing crowd intelligence resources and one of the basic issues in crowd intelligence collaboration research. However, in applications with large-scale data streams, concept drift severely constrains the accuracy and adaptivity of crowd grouping. Three main types of concept drift arise in crowd grouping based on data streams: user feature drift, group pattern drift, and group model evolutionary drift. Addressing them requires a crowd grouping model that comprehensively acquires user characteristics, reasonably extracts group patterns, and updates groups in a timely manner. Constructing such an adaptive model faces three challenges: the heterogeneity and complex relevance of user-generated data make it difficult to describe user characteristics; the limited width of the stream processing time window makes it difficult to extract group patterns; and the dynamic change of user characteristics makes it difficult to optimize the group model.

To let crowd grouping adapt dynamically to concept drift, this paper proposes a crowd grouping method based on incremental stream processing. First, a full-lifecycle data model for users is established on linked data for multi-source feature representation. Leveraging the unified semantics of multi-dimensional data, incremental clustering is applied for initial crowd classification. Then, based on the semantics of event sequences, fuzzy mining is applied to extract group behavioral patterns. By comparing and matching behavioral patterns among groups, the crowd grouping model is iteratively optimized. The result is a set of user groups with high semantic and process cohesion that supports efficient collaborative missions and the dynamic aggregation of crowd intelligence.

In summary, the main contribution of this paper is a crowd grouping model that adapts to concept drift and covers four core stages: information modelling, crowd classification, group pattern extraction, and iterative optimization. Built on stream processing engines, the model provides effective user groups and supports data filtering and distribution. Specifically:
(1) A semantic data model covering the full lifecycle of users is designed. It gives unified semantics to system and behavioral data, which makes the model sensitive to user feature drift and provides a simple, comprehensive feature reference for adaptive crowd grouping.
(2) A multi-dimensional, feature-oriented user clustering method is proposed. Using the semantic features of users and their associated entities, an incremental clustering model is constructed following the Single-pass idea, which helps crowd grouping adapt to the drift of user groups.
(3) A behavior pattern extraction method based on event semantics and fuzzy mining is proposed. It integrates similar events and mines frequent patterns, providing an effective way to extract group features for crowd grouping.
(4) An incremental optimization method for the constructed crowd groups is implemented. It drives the dynamic evolution of the crowd grouping model based on the characteristics of group behavior patterns.

The collaborative recommendation of medical diagnosis and treatment processes is used as a typical case study. Experiments on this case demonstrate improved accuracy and adaptability of crowd grouping, indicating that the approach can effectively support crowd-collaborating applications that must adapt to concept drift.
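The Single-pass idea named in contribution (2) can be illustrated with a minimal sketch. The abstract does not give the paper's similarity measure, threshold, or centroid update rule, so everything below is an assumption made for illustration: users are represented as plain numeric feature vectors, similarity is cosine similarity, the threshold is fixed at a hypothetical 0.8, and centroids are running means. The class name `SinglePassClusterer` is likewise hypothetical and is not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SinglePassClusterer:
    """Illustrative Single-pass clustering: each item is seen once, in arrival order."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off, not from the paper
        self.centroids = []         # one running-mean centroid per cluster
        self.counts = []            # number of items absorbed by each cluster

    def add(self, vec):
        """Assign vec to the most similar cluster, or open a new one; return its index."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim

        if best >= 0 and best_sim >= self.threshold:
            # Incremental update: fold the new item into the running mean.
            n = self.counts[best]
            self.centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(self.centroids[best], vec)
            ]
            self.counts[best] = n + 1
            return best

        # No cluster is similar enough: the item seeds a new cluster.
        self.centroids.append(list(vec))
        self.counts.append(1)
        return len(self.centroids) - 1


if __name__ == "__main__":
    clusterer = SinglePassClusterer(threshold=0.8)
    stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print([clusterer.add(v) for v in stream])
```

Because each arriving vector is compared only against the current centroids and never revisited, the clusters can be maintained over a stream without storing past items. The trade-off is that results depend on arrival order and on the chosen threshold.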

Authors: YU Han (于晗); CAI Hong-Ming (蔡鸿明); ZHANG Yi-Fei (张翼飞); JIANG Li-Hong (姜丽红) (School of Software, Shanghai Jiao Tong University, Shanghai 200240)

Affiliation: School of Software, Shanghai Jiao Tong University

Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, No. 12, pp. 2337-2351 (15 pages)

Funding: Supported by the National Natural Science Foundation of China (61972243).

Keywords: crowd portrait; concept drift; data stream clustering; incremental learning; crowd collaboration; linked data

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]