XML Structural Clustering Based on Cluster-Core (基于簇核心的XML结构聚类方法)

Cited by: 4
Abstract: As XML is applied ever more widely, XML structural clustering plays an important role in both the management and the mining of XML documents. Existing XML structural clustering algorithms are inaccurate, inefficient, and sensitive to input order, and they do not support incremental clustering. This paper addresses these problems by proposing the concept of a cluster-core and shows that incremental clustering can be supported in a dynamic environment if the cluster-cores are maintained correctly. Building on this, it presents COXClustering, an XML structural clustering algorithm that covers both static and incremental clustering. In static clustering, COXClustering extracts sub-trees as features to capture the similarity between XML structures, uses the fast classification that cluster-cores allow to improve clustering efficiency, and exploits the orthogonality of cluster-cores to reduce sensitivity to input order. In incremental clustering, it dynamically adjusts the cluster-cores according to newly added XML documents and adaptively guides clustering through both instant adjustment and batch adjustment. Theoretical analysis and experiments on synthetic and real datasets show that static clustering with COXClustering is efficient, produces high-quality clusters, and is insensitive to input order, and that incremental clustering greatly speeds up clustering while achieving quality close to that of static clustering.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 11, pp. 2161-2176 (16 pages)
Funding: National Natural Science Foundation of China (60172012)
Keywords: XML structural clustering; cluster-core; feature association degree; sensitivity to input order; incremental clustering
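
The abstract does not define the cluster-core, the sub-tree features, or the feature association degree, so the following is only a minimal sketch of the general idea of incremental clustering guided by a per-cluster summary. Everything in it is an assumption made for illustration and is not the COXClustering algorithm: root-to-node tag paths stand in for the paper's sub-tree features, the "core" is taken to be the features shared by at least half of a cluster's members, Jaccard similarity replaces the paper's similarity measure, and the names `path_features`, `Cluster`, `assign`, and the `threshold` parameter are hypothetical. Core orthogonality and batch adjustment are not modeled.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Return the set of root-to-node tag paths of an XML document.

    Assumption: tag paths are a simplified stand-in for sub-tree features.
    """
    root = ET.fromstring(xml_text)
    features = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        features.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return features


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class Cluster:
    """A cluster summarized by per-feature document counts."""

    def __init__(self):
        self.size = 0
        self.counts = Counter()  # feature -> number of members containing it

    def add(self, features):
        self.size += 1
        self.counts.update(features)

    def core(self):
        # Hypothetical core: features present in at least half of the members.
        return {f for f, c in self.counts.items() if 2 * c >= self.size}


def assign(clusters, features, threshold=0.5):
    """Place one document incrementally and return the index of its cluster."""
    best, best_sim = None, 0.0
    for i, cluster in enumerate(clusters):
        sim = jaccard(features, cluster.core())
        if sim > best_sim:
            best, best_sim = i, sim
    if best is None or best_sim < threshold:
        # No existing core is similar enough: open a new cluster.
        clusters.append(Cluster())
        best = len(clusters) - 1
    # Adding the document updates the counts, which adjusts the core.
    clusters[best].add(features)
    return best


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item/><price/></order>",
]
clusters = []
labels = [assign(clusters, path_features(d)) for d in docs]
print(labels)  # [0, 0, 1]
```

Each new document is compared only with the cluster summaries rather than with every earlier document, which is the kind of saving the abstract attributes to classifying against cluster-cores. Note that this simplified version still depends on input order.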


