期刊文献+

一种新的基于嵌入集的图分类方法 被引量:5

A Novel Graph Classification Approach Based on Embedding Sets
下载PDF
导出
摘要 随着图数据收集技术在许多科学领域的发展,对图数据分类已成为机器学习和数据挖掘领域的重要课题.目前已经提出许多图分类方法.其中,一些图分类方法采用3步来构筑分类模型;一些图分类方法采用2步来构筑分类模型.这些方法在挖掘频繁子图或特征子图时,只考虑到子图的结构信息,而没有考虑到子图的嵌入信息.为此,在L-CCAM子图编码的基础上,提出了一种基于嵌入集的图分类方法.该方法采用基于类别信息的特征子图选择策略,不但考虑了子图的结构信息,而且在频繁子图挖掘过程中充分利用嵌入信息——嵌入集,通过一步即直接选择特征子图以及生成分类规则.实验结果表明:在对化合物数据分类时,在分类精度上该方法优于采用3步的图分类方法;在运行效率上该方法优于采用2步和3步的图数据分类方法. With the development of highly efficient graph data collection technology in many scientific application fields, classification of graph data becomes an important topic in the machine learning and data mining community. At present, many graph classification approaches have been proposed. Some of the graph classification approaches take three steps, which are mining frequent subgraphs, selecting feature subgraphs from mined frequent suhgraphs, and constructing classification model by frequent subgraphs. Some other graph classification approaches take two steps, which are mining discriminative subgraphs directly from graph data and learning classification model by discriminative subgraphs. However, during mining frequent subgraphs or discriminative subgraphs, these approaches only take advantage of the structural information of the pattern, and do not consider the embedding information. In fact, in some efficient subgraph mining algorithms, the embedding information of a pattern can he maintained. We propose a graph classification approach, in which we employ a novel subgraph encoding approach with category label and adopt a feature subgraph selection strategy based on category information. Meanwhile, during mining frequent subgraphs, we make full use of embedding sets to select the feature subgraphs and by only one step we are able to generate classification rules. Experiment results show that the proposed approach is effective and feasible for classifying chemical compounds.
出处 《计算机研究与发展》 EI CSCD 北大核心 2012年第11期2311-2319,共9页 Journal of Computer Research and Development
基金 国家自然科学基金项目(61033010 61070005) 广东省自然科学基金项目(S2011020001182) 广东省科技计划基金项目(2009A080207005 2009B090300450 2010A040303004)
关键词 频繁子图 图分类 图挖掘 特征选择 嵌入集 数据挖掘 frequent subgraph pattern graph classification graph mining feature selection embedding set data mining
  • 相关文献

参考文献16

  • 1Jin Ning, Young C, Wang Wei. GAIA: Graph classification using evolutionary computation [C] //Proc of the 2010 Int Conf on Management of Data. New York: ACM, 2010: 879 -890.
  • 2Deshpande M, Kuramochi M, Karypis G. Frequent sub- structure-based approaches for classifying chemical compounds [J]. IEEE Trans on Knowledge and Data Engineering, 2005, 17(8) : 1036-1050.
  • 3刘勇,李建中,朱敬华.一种新的基于频繁闭显露模式的图分类方法[J].计算机研究与发展,2007,44(7):1169-1176. 被引量:10
  • 4Inokuchi A, Washio T, Motoda H. An apriori-based algorithm for mining frequent substructures from graph data [G]// LNCS 1910: Principles of Data Mining and Knowledge Discovery(PKDD2000). Berlin: Springer, 2000:13-23.
  • 5Kuramochi M, Karypis G. Frequent subgraph discovery [C] //Proc of the 2001 IEEE Int Conf on Data Mining (ICDM01). Piscataway, NJ: IEEE, 2001:313-320.
  • 6Yan X, Han J. gSpan: Graph based substructure pattern mining [C] //Proc of the 2002 Int Conf on Data Mining (ICDM02). Piseataway, NJ: IEEE, 2002 : 721-724.
  • 7Huan J, Wang W, Prins J. Efficient mining of frequen subgraph in the presence of isomorphism [C] //Proc of the 3rd IEEE Int Conf on Data Mining (ICDM03). Piscataway, NJ: IEEE, 2003: 549-552.
  • 8Yan Xifeng, Han Jiawei. CloseGraph: Mining closed frequent graph patters[C] //Proc of the 9th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2008:286-295.
  • 9刘勇,李建中,高宏.从图数据库中挖掘频繁跳跃模式[J].软件学报,2010,21(10):2477-2493. 被引量:10
  • 10Yan Xifeng, Cheng Hong, Han Jiawei, et al. Mining significant graph patterns by leap search [C] //Proc of the 2008 Int Conf on Management of data(SIGMOD08). New York: ACM, 2008: 433-444.

二级参考文献39

  • 1汪卫,周皓峰,袁晴晴,楼宇波,施伯乐.基于图论的频繁模式挖掘[J].计算机研究与发展,2005,42(2):230-235. 被引量:17
  • 2Inokuchi A,Washio T,Motoda H.An apriori-based algorithm for mining frequent substructures from graph data.In:Cheng M,Yu PS,Liu B,eds.Proc.of the 4th European Conf.on Principles of Data Mining and Knowledge Discovery.Lyon:Springer-Verlag,2000.13-23.
  • 3Kuramochi M,Karypis G.Frequent subgraph discovery.In:Cercone N,Lin TY,Wu X,eds.Proc.of the 1st IEEE Int'l Conf.on Data Mining.San Jose:IEEE Computer Society,2001.313-320.
  • 4Yan X,Han J.gSpan:Graph-Based substructure pattern mining.In:Aggrawal R,Dittrich K,Ngu AH,eds.Proc.of the 2nd IEEE Int'l Conf.on Data Mining.Maebashi:IEEE Computer Society,2002.721-724.
  • 5Borgelt C,Berhold MR.Mining molecular fragments:Finding relevant substructures of molecules.In:Aggrawal R,Dittrich K,Ngu AH,eds.Proc.of the 2nd IEEE Int'l Conf.on Data Mining.Maebashi:IEEE Computer Society,2002.51-58.
  • 6Huan J,Wang W,Prins J.Efficient mining of frequent subgraphs in the presence of isomorphism.In:Wu X,Tuzhilin A,eds.Proc.of the 3rd IEEE Int'l Conf.Data Mining.Melbourne:IEEE Computer Society,2003.549-552.
  • 7Nijssen S,Kok JN.A quickstart in frequent structure mining can make a difference.In:Kim W,Kohavi R,Gehrke J,DuMouchel W,eds.Proc.of the 10th ACM SIGKDD Int'l Conf.on Knowledge Discovery and Data Mining.Seattle:ACM,2004.647-652.
  • 8Yan X,Han J.Closegraph:Mining closed frequent graph patterns.In:Getoor L,Senator TE,Domingos P,Faloutsos C,eds.Proc.of the 9th ACM SIGKDD Int'l Conf.on Knowledge Discovery and Data Mining.Washington:ACM,2003.286-295.
  • 9Huan J,Wang W,Prins J,Yang J.Spin:Mining maximal frequent subgraphs from graph databases.In:Kim W,Kohavi R,Gehrke J,DuMouchel W,eds.Proc.of the 10th ACM SIGKDD Int'l Conf.on Knowledge Discovery and Data Mining.Seattle:ACM,2004.581-586.
  • 10Thomas LT,Valluri SR,Karlapalem K.Margin:Maximal frequent subgraph mining.In:Proc.of the 6th IEEE Int'l Conf.Data Mining.Hong Kong:IEEE Computer Society,2006.1097-1101.http://www.computer.org/portal/web/csdl/doi/10.1109/ ICDM.2006.102.

共引文献18

同被引文献53

  • 1汪卫,周皓峰,袁晴晴,楼宇波,施伯乐.基于图论的频繁模式挖掘[J].计算机研究与发展,2005,42(2):230-235. 被引量:17
  • 2刘勇,李建中,朱敬华.一种新的基于频繁闭显露模式的图分类方法[J].计算机研究与发展,2007,44(7):1169-1176. 被引量:10
  • 3李先通,李建中,高宏.一种高效频繁子图挖掘算法[J].软件学报,2007,18(10):2469-2480. 被引量:35
  • 4Deshpande M, Kuramochi M, Karypis G. Frequent sub- structure based approaches for classifying chemical com- pounds[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8): 1036-1050.
  • 5Horvath T, Garner T, Wrobel S. Cyclic pattern kernels for predictive graph mining[C]// Proceedings of the tenth ACM SIGKDD international conference on Knowl- edge discovery and data mining. Seattle:Association for Computing Machinery, 2004.. 158-167.
  • 6Cheng Hong, Yan Xifeng, Han Jiawei. Discriminative frequent pattern nalysis for effective classification[C]// IEEE 23rd International Conference on Data Enginering. Istanbul: IEEE Computer Society, 2007: 716-725.
  • 7Borgelt C, Berthold M R, Patterson D E. Molecular fragment mining for drug discovery [G] //Symbolic and Quantitative Approaches to Reasoning with Uncertainty. Berlin: Springer, 2005 : 1002-1013.
  • 8Guralnik V, Karypis G. A scalable algorithm for clustering sequential data [C] //Proc of the 1st IEEE Int Conf on Data Mining. Piscataway, NJ: IEEE, 2001:179-186.
  • 9Yan X, Yu P S, Han J. Graph indexing: A frequent structure-based approach [C] //Proc of the 17th ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2004: 335-346.
  • 10Liu Y, Jiang X, Chen H, et al. Mapreduce-based pattern finding algorithm applied in motif detection for prescription compatibility network [G] //Advanced Parallel Processing Technologies. Berlin: Springer, 2009: 341-355.

引证文献5

二级引证文献25

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部