Study of XML documents ensemble clustering based on Bagging

Cited by: 1

Abstract: Ensemble learning is applied to XML document clustering to address the shortcomings of traditional clustering algorithms. A vector model for XML documents that combines tags and paths is proposed, and the documents are mapped onto it. Based on this model, the original document set is first resampled several times into Bootstrap datasets, K-means is run on each resampled dataset, and hierarchical clustering is then run on the resulting collection of K-means cluster centers. Experiments on synthetic and real datasets show that the algorithm outperforms K-means in recall and precision and improves the robustness of K-means.
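The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the weighting scheme, the linkage criterion, or the final assignment rule, so the sketch assumes raw tag and path counts as features, centroid linkage for the hierarchical step, and assignment of each document to the nearest merged center. All function names (`tag_path_features`, `bagged_cluster`, and so on) are hypothetical.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter

def tag_path_features(xml_text):
    """Count every tag and every root-to-node path in one document."""
    counts = Counter()
    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts["tag:" + node.tag] += 1
        counts["path:" + path] += 1
        for child in node:
            walk(child, path)
    walk(ET.fromstring(xml_text), "")
    return counts

def vectorize(docs):
    """Map documents to count vectors over a shared tag/path vocabulary."""
    feats = [tag_path_features(d) for d in docs]
    vocab = sorted(set().union(*feats))
    return [[float(f[v]) for v in vocab] for f in feats]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, rng, iters=20):
    """Plain K-means; an empty cluster keeps its previous center."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist2(p, centers[j]))].append(p)
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers

def merge_centers(centers, k):
    """Agglomerative clustering of centers down to k groups (centroid linkage, assumed)."""
    groups = [[c] for c in centers]
    while len(groups) > k:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda ij: dist2(mean(groups[ij[0]]), mean(groups[ij[1]])))
        groups[i] += groups.pop(j)
    return [mean(g) for g in groups]

def bagged_cluster(docs, k, n_bags=10, seed=0):
    """Bootstrap the documents, run K-means per bag, then cluster the pooled centers."""
    rng = random.Random(seed)
    points = vectorize(docs)
    all_centers = []
    for _ in range(n_bags):
        sample = [rng.choice(points) for _ in points]  # sampling with replacement
        all_centers += kmeans(sample, k, rng)
    final = merge_centers(all_centers, k)
    # Assumed final step: label each document by its nearest merged center.
    return [min(range(k), key=lambda j: dist2(p, final[j])) for p in points]

docs = [
    "<book><title/><author/><year/><publisher><name/><city/></publisher></book>",
    "<book><title/><author/><author/><year/><publisher><name/><city/></publisher></book>",
    "<book><title/><author/><year/><isbn/><publisher><name/><city/></publisher></book>",
    "<order><item/><price/><customer/><shipping><address/><method/></shipping></order>",
    "<order><item/><item/><price/><customer/><shipping><address/><method/></shipping></order>",
    "<order><item/><price/><customer/><date/><shipping><address/><method/></shipping></order>",
]
labels = bagged_cluster(docs, k=2)
print(labels)
```

Pooling the centers from many Bootstrap runs is what is meant to reduce the sensitivity of a single K-means run to its initialization; the hierarchical step then reconciles centers that different runs found for the same underlying cluster.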

Authors: Zhao Bin, Zhang Yongsheng

Affiliation: School of Information Science and Engineering, Shandong Normal University

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, Issue 14, pp. 138-140 (3 pages)

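Funding: Natural Science Foundation of Shandong Province (No. Y2007G16); Shandong Province Young Scientists Research Award Fund (No. 2006BS01020); Shandong Province Science and Technology Research Program (No. 2005GG4210002)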

Keywords: ensemble learning; Extensible Markup Language (XML); document clustering; Bagging algorithm

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

Citation network

References (7)

1. Zhang K, Shasha D. Simple fast algorithms for the editing distance between trees and related problems[J]. SIAM J Comput, 1989, 18(6): 1245-1262.
2. Lian W, Cheung D W, Mamoulis N, et al. An efficient and scalable algorithm for clustering XML documents by structure[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(1).
3. Doucet A, Myka H A. Naive clustering of a large XML document collection[C]//Proc 1st Annual Workshop of the Initiative for the Evaluation of XML Retrieval (INEX'02), Schloss Dagstuhl, Germany, 2002.
4. Breiman L. Bagging predictors[J]. Machine Learning, 1996, 24(2): 123-140.
5. Freund Y, Schapire R E. Experiments with a new boosting algorithm[C]//Proceedings of the 13th International Conference on Machine Learning, San Francisco, Bari, Italy, 1996: 148-156.
6. George M, Richard B. Introduction to WordNet: An on-line lexical database[J]. International Journal of Lexicography, 1993, 3(4): 235-312.
7. Dudoit S, Fridlyand J. Bagging to improve the accuracy of clustering procedure[J]. Bioinformatics, 2003, 19(9): 1090-1099.

Co-cited documents (4)

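1. Wang Ling, Bo Liefeng, Jiao Licheng. Density-sensitive semi-supervised spectral clustering[J]. Journal of Software (软件学报), 2007, 18(10): 2412-2422. Cited by: 94
2. Gong Dequan. A review of 30 years of research on the seasonal festival culture of Guizhou's long-settled ethnic groups[J]. Journal of Guizhou University for Nationalities (Philosophy and Social Sciences Edition) (贵州民族学院学报（哲学社会科学版）), 2008(3): 116-120. Cited by: 6
3. He Feng, Jiang Shouxu, Wang Hongzhi. Research on a similarity XML join method based on subtree matching[J]. Intelligent Computer and Applications (智能计算机与应用), 2011, 1(4): 1-3. Cited by: 2
4. Yu Yajun, Jiang Ying. An improved tree matching method for XML[J]. Computer Engineering and Applications (计算机工程与应用), 2012, 48(20): 177-181. Cited by: 4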

Citing documents (1)

1. Ren Tingyan, Luo Gang. Application of XML clustering in mining ethnic minority festival culture[J]. Software Guide (软件导刊), 2015, 14(12): 140-141. Cited by: 1

Second-level citing documents (1)

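1. Su Ruizhu, Yan Jingya, Ouyang Jian. A tag system and tacit knowledge mining for traditional festival culture[J]. Library and Information Service (图书情报工作), 2020, 64(2): 124-132. Cited by: 2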

