Abstract
The Common Information Model (CIM) is an open industry standard that has been implemented in many products, and a large number of bugs have been found and fixed in these implementations. To reduce the time and effort spent on manually locating root causes, this paper proposes a method based on natural language processing for automatically debugging CIM bugs. The method first segments the textual descriptions of resolved bugs into words using a maximum entropy model, then uses simHash over a purpose-built dictionary to find previously fixed bugs that closely duplicate the reported one, and finally applies text processing to the trace provided by the customer to identify the problem and its solution. Experiments achieved an accuracy of 87.5%, indicating that the method is effective.
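The duplicate-detection step can be illustrated with a minimal simHash sketch. It is not the authors' implementation. It assumes the bug descriptions have already been segmented into tokens (the paper uses a maximum entropy model for that step), and it uses MD5 as the per-token hash and raw token counts as weights, both of which are assumptions. The function names and the sample bug IDs are hypothetical.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width (assumed; the abstract does not state it)


def simhash(tokens):
    """Return a 64-bit simHash fingerprint for a list of tokens."""
    totals = [0] * BITS
    for token, weight in Counter(tokens).items():
        # Assumption: the first 8 bytes of MD5 serve as the 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            # Each token votes +weight or -weight on every bit position.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(BITS):
        if totals[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def most_similar(query_tokens, fixed_bugs):
    """Return the ID of the fixed bug whose fingerprint is closest to the query.

    fixed_bugs maps a (hypothetical) bug ID to its token list.
    """
    query = simhash(query_tokens)
    return min(
        fixed_bugs,
        key=lambda bug_id: hamming_distance(query, simhash(fixed_bugs[bug_id])),
    )


# Hypothetical, already-segmented bug descriptions.
fixed = {
    "BUG-1": ["cim", "provider", "timeout", "on", "enumerate", "instances"],
    "BUG-2": ["indication", "subscription", "lost", "after", "restart"],
}
new_report = ["cim", "provider", "timeout", "on", "enumerate", "instances"]
closest = most_similar(new_report, fixed)
```

A smaller Hamming distance between fingerprints indicates more similar token sets, so a new report can be matched against previously fixed bugs without comparing the full texts pairwise.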
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2013, Issue 5, pp. 1446-1449 (4 pages)
Funding
Youth Fund Project of the Sichuan Provincial Department of Education (11ZB134)
Keywords
Common Information Model (CIM)
natural language processing
maximum entropy model
debug
text processing