Hierarchical Categorization for Artifacts of Configuration Management Tool
(Original Chinese title: 一种面向软件配置管理制品的层次分类方法)
Abstract: Configuration management tools (CMTs), as a component of operations automation, are an important supporting technology for DevOps (development and operations). Internet-scale open source communities host a large number of reusable CMT script artifacts, but the lack of effective hierarchical categorization makes these artifacts hard to retrieve quickly and use efficiently. To address this problem, this paper proposes a hierarchical categorization method for CMT artifacts based on analysis of their online unstructured description documents. The method first builds a hierarchical category system from tag co-occurrence relations, and then implements hierarchical classifiers for CMT artifacts based on their descriptive attributes, including name and description. To mitigate the effect of data skew (an unbalanced data set), it uses a hybrid approach to dividing the sample data. Experiments on more than 11,000 training examples and 1,000 test examples show that the improved sample division achieves best precision, recall, and F-measure values of 0.81, 0.88, and 0.85, respectively; compared with the traditional approach, recall improves by 0.15 and F-measure by 0.06. These results verify the effectiveness of the hierarchical categorization method.
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, Issue 6, pp. 1389-1404 (16 pages)
Funding: National Natural Science Foundation of China (61402453); National Key Research and Development Program of China (2016YFB1000803)
Keywords: CMT artifact; hierarchical categorization; open source community; development and operations (DevOps)
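The abstract states that the category hierarchy is built from tag co-occurrence but does not give the rule. As an illustration only, the sketch below uses a common subsumption heuristic, which is an assumption and not necessarily the authors' procedure: tag A becomes the parent of tag B when A appears on at least `threshold` of the artifacts tagged B and A is strictly more frequent overall. The function name `build_tag_hierarchy`, the threshold value, and the sample tags are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_tag_hierarchy(tagged_artifacts, threshold=0.8):
    """Return {child_tag: parent_tag} using a co-occurrence subsumption rule.

    Assumed heuristic (not taken from the paper): `parent` subsumes `child`
    when P(parent | child) >= threshold and parent is strictly more frequent.
    """
    tag_count = Counter()   # number of artifacts carrying each tag
    pair_count = Counter()  # number of artifacts carrying both tags of a pair
    for tags in tagged_artifacts:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))

    parents = {}
    for (a, b), both in pair_count.items():
        # Test the pair in both directions: (a over b) and (b over a).
        for parent, child in ((a, b), (b, a)):
            p_parent_given_child = both / tag_count[child]
            if p_parent_given_child >= threshold and tag_count[parent] > tag_count[child]:
                current = parents.get(child)
                # Prefer the least frequent qualifying parent, i.e. the most
                # specific category that still subsumes the child.
                if current is None or tag_count[parent] < tag_count[current]:
                    parents[child] = parent
    return parents


# Hypothetical tag sets, one list per CMT artifact.
artifacts = [
    ["database", "mysql"],
    ["database", "mysql", "replication"],
    ["database", "postgresql"],
    ["database", "mysql"],
    ["web", "nginx"],
    ["web", "apache"],
    ["web", "nginx"],
    ["database"],
]

hierarchy = build_tag_hierarchy(artifacts)
for child, parent in sorted(hierarchy.items()):
    print(f"{parent} -> {child}")
```

Tags that never receive a parent (here `database` and `web`) act as top-level categories. A real data set would also need a tie-breaking rule for equally frequent tags, which this sketch leaves out.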