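Abstract
With advances in graph data collection across many scientific fields, graph classification has become an important topic in machine learning and data mining. Many graph classification methods have been proposed: some build the classification model in three steps, others in two. When mining frequent subgraphs or feature subgraphs, these methods consider only the structural information of a subgraph and ignore its embedding information. To address this, a graph classification method based on embedding sets is proposed, built on the L-CCAM subgraph encoding. The method adopts a feature subgraph selection strategy based on category information. It considers the structural information of subgraphs and also makes full use of embedding information (the embedding sets) during frequent subgraph mining, so that feature subgraphs are selected and classification rules are generated directly in a single step. Experimental results show that, when classifying chemical compound data, the method is more accurate than three-step graph classification methods and runs more efficiently than both two-step and three-step methods.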
With the development of highly efficient graph data collection technology in many scientific application fields, classification of graph data has become an important topic in the machine learning and data mining community. Many graph classification approaches have been proposed. Some take three steps: mining frequent subgraphs, selecting feature subgraphs from the mined frequent subgraphs, and constructing a classification model from the feature subgraphs. Others take two steps: mining discriminative subgraphs directly from the graph data and learning a classification model from the discriminative subgraphs. However, when mining frequent or discriminative subgraphs, these approaches exploit only the structural information of a pattern and do not consider its embedding information, even though some efficient subgraph mining algorithms can maintain the embedding information of a pattern. We propose a graph classification approach that employs a novel subgraph encoding with category labels and adopts a feature subgraph selection strategy based on category information. During frequent subgraph mining, we make full use of embedding sets to select the feature subgraphs, so that classification rules are generated in a single step. Experimental results show that the proposed approach is effective and feasible for classifying chemical compounds.
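To make the role of an embedding set concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes an embedding is stored as a pair of a graph identifier and a vertex mapping, and that a rule is kept when the pattern's support and class confidence pass thresholds. The function names, the thresholds `min_support` and `min_confidence`, and the toy data are all hypothetical; the L-CCAM encoding itself is not modeled here.

```python
from collections import Counter


def class_support(embedding_set, labels):
    """Count, per class label, the distinct graphs holding at least one embedding."""
    # Several embeddings in the same graph count once toward support.
    graph_ids = {graph_id for graph_id, _mapping in embedding_set}
    return Counter(labels[g] for g in graph_ids)


def rule_from_embeddings(embedding_set, labels, min_support=2, min_confidence=0.8):
    """Return (class, support, confidence) if the pattern qualifies, else None.

    Assumed selection criterion (illustrative only): the pattern must occur
    in at least `min_support` graphs, and the majority class must cover at
    least `min_confidence` of those graphs.
    """
    support = class_support(embedding_set, labels)
    total = sum(support.values())
    if total < min_support:
        return None
    best_class, best_count = support.most_common(1)[0]
    confidence = best_count / total
    if confidence < min_confidence:
        return None
    return best_class, total, confidence


# Toy data: graph id -> class label (hypothetical compound activity labels).
labels = {0: "active", 1: "active", 2: "inactive", 3: "active"}

# Embedding set of one candidate pattern: (graph id, matched vertex ids).
embeddings = {(0, (1, 2)), (0, (2, 1)), (1, (0, 3)), (3, (4, 5))}
rule = rule_from_embeddings(embeddings, labels)

# A pattern split evenly between the two classes yields no rule.
mixed = {(0, (1, 2)), (2, (0, 1))}
no_rule = rule_from_embeddings(mixed, labels)
```

Because the embedding set already records which graphs contain the pattern, the class distribution can be read off during mining without a separate pass over the data, which is the intuition behind selecting features and generating rules in one step.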
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal (北大核心)
2012, No. 11, pp. 2311-2319 (9 pages)
Journal of Computer Research and Development
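Funding
National Natural Science Foundation of China (61033010, 61070005)
Natural Science Foundation of Guangdong Province (S2011020001182)
Science and Technology Planning Project of Guangdong Province (2009A080207005, 2009B090300450, 2010A040303004)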
Keywords
frequent subgraph pattern
graph classification
graph mining
feature selection
embedding set
data mining