Abstract
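Identifying the key entities in software is important for understanding the software and for controlling and reducing maintenance costs. However, existing work focuses almost entirely on identifying key classes; few studies address key packages, methods, or attributes. Existing work has also failed to reveal the relationship between key classes and external software quality attributes. To extend the existing work, this paper proposes a key package identification method based on the weighted PageRank algorithm. The method abstracts a software system at package granularity as a weighted directed software network, proposes a new metric, PR (PackageRank), to measure node importance from a structural perspective, and introduces a weighted PageRank algorithm to compute the metric's value. The experiments use six open-source Java software systems as examples. They analyze the correlation between package PR values and commonly used complex-network centrality metrics (betweenness centrality, closeness centrality, degree centrality, and others); use a weighted SIR (Susceptible-Infectious-Recovered) model to analyze the spreading influence of the key packages identified by PR, comparing the results with other related methods to validate the proposed method; and finally, taking two of the software systems as examples, analyze the relationship between package PR values and package understandability, further validating the proposed method.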
Identifying key entities has important implications for understanding software and for controlling and reducing maintenance costs. However, existing methods focus only on identifying key classes, and little work has been done on identifying key entities at other levels. Furthermore, existing work has failed to reveal the relationships between key classes and external quality attributes. In this paper, we introduce a novel method, IDEEP (IDEntifying kEy Packages using weighted PageRank algorithm), to identify key packages. IDEEP uses a weighted and directed software network to describe packages and their dependencies, proposes a new metric, PR (PackageRank), to quantify package importance, and introduces a weighted PageRank algorithm to compute PR values. Our experiments are carried out on six Java software systems. First, we analyze the correlation between PR values and other centrality metrics such as betweenness, closeness, and degree. Second, we use a weighted version of the susceptible-infectious-recovered (SIR) model to examine the spreading influence of each node. The results show that our method outperforms six other methods. Finally, we reveal the relationships between key packages and their understandability and show that the key packages identified by our method are more meaningful from a software engineering perspective.
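The abstract does not give the PR formula, so the following is only a minimal sketch of a generic weighted PageRank on a weighted directed graph, not the authors' IDEEP implementation. It assumes that an edge u -> v means "package u depends on package v", that a node's rank is split among its successors in proportion to edge weight, and that the damping factor is 0.85. The package names and weights are hypothetical.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Rank the nodes of a weighted directed graph.

    `edges` maps (source, target) -> positive weight. Assumption: an edge
    u -> v means package u depends on package v, so rank flows toward
    the packages that others depend on.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)

    # Total outgoing weight per node, used to normalize each edge's share.
    out_weight = defaultdict(float)
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        # Each node passes its rank along its edges in proportion to weight.
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]

        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical package dependency network; weights stand for coupling strength.
edges = {
    ("ui", "core"): 3.0,
    ("ui", "net"): 1.0,
    ("net", "core"): 2.0,
    ("core", "util"): 4.0,
}
ranks = weighted_pagerank(edges)
for package, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{package}: {score:.4f}")
```

Under these assumptions, packages that many heavily coupled packages depend on receive the highest scores. A different weighting scheme or edge direction would change the ranking, so the choices above should be checked against the full paper before reuse.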
Source
Acta Electronica Sinica (《电子学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2014, No. 11, pp. 2174-2183 (10 pages)
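Funding
National Basic Research Program of China (973 Program) (No. 2014CB340401)
National Natural Science Foundation of China (No. 61202048, No. 61273216, No. 61272111)
Zhejiang Provincial Natural Science Foundation (No. LQ12F02011, No. LY13F020010)
Open Fund of the State Key Laboratory of Software Engineering (No. SKLSE-2012-09-21)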