期刊文献+

基于gSpan算法的未知化合物毒性预测 被引量:4

Predicting the toxicity of unknown chemicals based on gSpan algorithm
下载PDF
导出
摘要 gSpan算法是一种基于频繁图的挖掘算法。该算法基于无候选人产生的频繁子图,在图中建立字典序标号,将每个图映射为最小DFS code,再采用深度优先搜索策略挖掘频繁连接子图。与前人算法相比,该算法在生成候选子图时,冗余子图的产生量大大减少;在计算候选子图支持度时避免了大量重复扫描数据库,性能卓越。该文的贡献是将gSpan算法应用在挖掘与已知毒性化合物具有相同子结构的化合物研究工作中,进行未知化合物的毒性预测,对相关领域应用发展具有重要意义。 The graph-based substructure pattern mining algorithm called gSpan, which discovers frequent substructures without candidate generation, is introduced first, gSpan builds a new lexicographic order among graphs and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts the depth-first search strategy to mine frequent con- nected subgraphs efficiently. When it is applied to the chemical compound dataset Chemical_340, gSpan can find the compound set with same substructure as a chemical compound with toxicity, and then the toxicity of unknown chemicals can be predicted. This study is important to the related fields.
出处 《合肥工业大学学报(自然科学版)》 CAS CSCD 北大核心 2007年第10期1278-1280,共3页 Journal of Hefei University of Technology:Natural Science
关键词 频繁子图 毒性预测 化合物 frequent subgraph toxicity prediction chemical compound
  • 相关文献

参考文献8

  • 1Zaki M J. Efficiently mining frequent trees in a forest[C]// Proc 2002 Int'l Conf Knowledge Discovery and Data Mining Edmonton. Alberta, Canada, 2002 : 135- 139.
  • 2Kuramochi M, Karypis G. Frequent subgraph discovery [C]//. Proc 2001 Int'l Conf IEEE International Conference on Data Mining. San Jose, USA, 2001:313-320.
  • 3Agrawal R, Srikant R. Fast algorithms for mining association rules[C]//. Proc 1994 Int'l Conf Very Large Data Bases. Santiago,1994:487 499.
  • 4Cormen T H, Leiserson C E. Introduction to Algorithms [M]. 2nd ed. Massachusettes: MIT Press, 2001 : 12- 14.
  • 5Yan Xifeng, Han Jiawei. gSpan:graph-based substructure pattern mining, Technical Report IUCDCS-R-2002-2296 [R]. Illinois, 2002 : 721 - 722.
  • 6Pei J, Han J. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth[C]//Proc 2001 Int'l Conf International Conference on Data Engineering. Heidelberg,2001 : 215-224.
  • 7Asai T, Abe K. Efficient substructure discovery from large semistructured data[C]//Proc 2002 Int'l Conf SIAM International Conference on Data Mining. Washington, 2002:225 -227.
  • 8Sun L, Zhang X. Efficient frequent pattern mining on Web logs[C]//Proc 2004 Int'l Conf Asia Pacific Web Conference. Hangzhou, 2004: 533- 542.

同被引文献36

引证文献4

二级引证文献23

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部