Automated debug for common information model defect using natural language processing algorithm

Abstract: The Common Information Model (CIM) is an open industry standard that has been implemented in the products of many companies, and a large number of bugs have been found and fixed in those implementations. To reduce the time and effort needed to locate the root cause of a defect manually, this paper proposes a method based on natural language processing for debugging CIM bugs automatically. The method first segments the textual descriptions of resolved bugs into words using a maximum entropy model. It then applies simHash, over a specifically constructed dictionary, to find the already-fixed bugs that most closely duplicate the new one. Finally, it uses text processing to analyze the trace provided by the customer and identify the problem and its solution. In the experiments the method achieved 87.5% accuracy, which indicates that it is effective.
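The simHash matching step named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the bug descriptions have already been segmented into tokens, uses plain term frequency as the token weight, and takes the first 64 bits of SHA-256 as the per-token hash. The function names and the bug identifiers in the usage note are hypothetical.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Return a `bits`-wide SimHash fingerprint for an iterable of tokens.

    Assumption: term frequency is used as the token weight; the paper's
    dictionary-based weighting is not described in the abstract.
    """
    weights = Counter(tokens)
    acc = [0] * bits  # one signed accumulator per bit position
    for token, weight in weights.items():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # A set bit votes +weight, a clear bit votes -weight.
            acc[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(bits):
        if acc[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def most_similar(query_tokens, fixed_bugs):
    """Return (bug_id, distance) for the fixed bug nearest to the query.

    `fixed_bugs` maps a (hypothetical) bug id to its token list.
    """
    q = simhash(query_tokens)
    return min(
        ((bug_id, hamming_distance(q, simhash(tokens)))
         for bug_id, tokens in fixed_bugs.items()),
        key=lambda pair: pair[1],
    )
```

A call such as `most_similar(["provider", "timeout"], {"BUG-1": [...], "BUG-2": [...]})` returns the stored bug whose fingerprint has the smallest Hamming distance to the query. A small distance suggests a near-duplicate report, and the cutoff for treating two reports as duplicates would have to be tuned on real data.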

Author: XIANG Wei (项炜)

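Affiliations: School of Computer Science, Leshan Normal University; Laboratory of Intelligent Information Processing and Application, Leshan Normal University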

Source: Journal of Computer Applications (《计算机应用》), 2013, No. 5, pp. 1446-1449 (4 pages). Indexed in CSCD and the Peking University Core Journals list.

Funding: Youth Fund Project of the Sichuan Provincial Department of Education (11ZB134)

Keywords: Common Information Model (CIM); natural language processing; maximum entropy model; debugging; text processing

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
