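Abstract
Software traceability, an important capability of software, aims to capture, link, and trace every important software artifact by establishing traceability links between different artifacts. In recent years, information retrieval, natural language processing, machine learning, and deep learning have been applied to the creation, maintenance, and validation of software traceability links. These techniques greatly reduce the cost of handling traceability links manually and have therefore attracted wide attention from academia and industry. This paper focuses on the automated creation, maintenance, and validation of software traceability links and reviews the research progress of the past ten years. The main content includes: (1) a statistical survey and analysis of automated approaches and techniques for creating, maintaining, and validating software traceability links; (2) a summary of application research on software traceability links; (3) an overview of current evaluation research and tool support for traceability link techniques; (4) the key problems of current automated traceability link techniques, distilled from the technical difficulties and organized into seven parts: the complexity of the traced software, granularity, accuracy, limited link types, validation efficiency, application scale and time, and incomplete tool evaluation, together with possible solutions and future trends for these problems.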
As an important software capability, software traceability aims to capture, link, and trace each crucial software artifact by constructing traceability links between them. The study of software traceability covers many aspects, including traceability modelling, traceability assessment, and traceability implementation. Traceability links interconnect software artifacts, and the resulting network of associations is used to resolve issues with software products and their development processes. Traceability links provide critical support for many software engineering activities, including impact analysis, software verification, test case selection, compliance verification, system security assurance, and defect detection. A traceability link is a specific relationship between a pair of software artifacts, one of which is the source artifact and the other the target artifact. Traceability links record various relationships between software artifacts, such as dependencies, influences, and causal relationships. The direction of a traceability link can be one-way or two-way. Traceability links help software developers understand, develop, and manage systems efficiently and effectively. They also help people involved in all phases of software development to accomplish their tasks. Requirements traceability links, the most widely used type, connect requirements with other software artifacts and support the construction and maintenance of those connections. Traceability links also include links between code and tests, design and code, models and code, defects and code, and so on. In recent years, information retrieval, natural language processing, machine learning, and deep learning have been applied to the creation, maintenance, and validation of traceability links. These techniques reduce the cost of handling traceability links manually and have therefore received extensive attention from academia and industry. Several existing works also review approaches and techniques for software traceability links. In this paper, we focus on automation techniques for the creation, maintenance, and validation of traceability links, and we sort out and summarize the research progress of the past ten years. The main content includes the statistics and analysis of approaches and techniques for the automated creation, maintenance, and validation of traceability links; application research on traceability links; state-of-the-art evaluation research and tool support for traceability links; and the key problems of current traceability link techniques. The problems are summarized from the technical difficulties in seven parts: the complexity of the traced software, the granularity problem, unsatisfactory accuracy, the type limitation, validation efficiency, application scale and time, and the incomplete evaluation of traceability link tools. In addition, several possible solutions and future development trends are elaborated, including the construction of horizontal traceability links between software artifacts, scalable and configurable automation techniques for traceability links, the integration of traditional approaches with artificial intelligence techniques, the creation of multiple types of traceability links using intermediary artifacts, the interactive verification of traceability links, the real-time retrieval of traceability links, and the open sourcing of related source code. This review also reveals that: (1) The creation of traceability links has received a lot of academic attention, but research on the maintenance, verification, and application of traceability links needs more attention; (2) Requirements-to-code links are the type that most concerns researchers, followed by requirements-to-design and design-to-code links, while other traceability links such as code/data-to-model and screenshot-to-defect are also starting to attract researchers' attention; (3) With the continuous development of artificial intelligence (AI), AI-based techniques such as Naive Bayes, SVM, BERT, Doc2Vec, and RNN have been widely used in the creation, maintenance, and verification of traceability links; (4) In the creation of traceability links, it is difficult to achieve good results by relying only on information retrieval and artificial intelligence techniques. Information retrieval, machine learning, or deep learning techniques should be combined with traditional heuristics, model-based methods, and so on, to compensate for the deficiencies of both AI technologies and traditional methods and to further improve the quality of traceability links; (5) Research on traceability link automation techniques in complex, cross-platform, and cross-language environments should be on the agenda in the future.
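As a rough illustration of the information-retrieval style of link creation mentioned in the abstract, the following minimal sketch ranks candidate links between source artifacts (requirements) and target artifacts (code files) by TF-IDF cosine similarity. It is not the method of any surveyed paper; the function names, the threshold value, and the sample artifacts are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_links(sources, targets, threshold=0.1):
    """Rank (source_id, target_id, score) triples whose score >= threshold.

    `sources` and `targets` map artifact ids to their text. The threshold
    is an arbitrary illustrative value, not a recommended setting.
    """
    src_ids, tgt_ids = list(sources), list(targets)
    docs = [tokenize(sources[i]) for i in src_ids]
    docs += [tokenize(targets[i]) for i in tgt_ids]
    vecs = tfidf_vectors(docs)
    src_vecs, tgt_vecs = vecs[:len(src_ids)], vecs[len(src_ids):]
    links = [
        (s, t, cosine(sv, tv))
        for s, sv in zip(src_ids, src_vecs)
        for t, tv in zip(tgt_ids, tgt_vecs)
    ]
    links = [link for link in links if link[2] >= threshold]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical sample artifacts: two requirements and two code files.
requirements = {
    "REQ-1": "The user can reset a forgotten password by email",
    "REQ-2": "The system exports monthly sales reports as PDF",
}
code_files = {
    "PasswordResetService.java": "class PasswordResetService sendResetEmail user password token",
    "ReportExporter.java": "class ReportExporter exportSalesReport pdf monthly",
}

for source_id, target_id, score in candidate_links(requirements, code_files):
    print(f"{source_id} -> {target_id}: {score:.2f}")
```

Each returned triple is a one-way candidate link from a source artifact to a target artifact. In line with finding (4) of the abstract, a purely lexical ranking like this would normally be combined with heuristics, model-based methods, or learned models, and the candidates would still need validation.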
Authors
汪烨
胡坤
姜波
夏鑫
唐贤书
WANG Ye; HU Kun; JIANG Bo; XIA Xin; TANG Xian-Shu (School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou 310018; Software Engineering Application Technology Lab, Huawei, Hangzhou 310007)
Source
《计算机学报》
EI
CAS
CSCD
Peking University Core Journal
2023, Issue 9, pp. 1919-1946 (28 pages)
Chinese Journal of Computers
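Funding
Supported by the Zhejiang Provincial Natural Science Foundation (LY21F020011, LY20F020027, LY19F020003)
and by the project "Research and Application of Key Technologies for Trusted E-Commerce Transactions: Research and Application of Key Technologies for Trusted E-Commerce Transactions Based on Cross-Border Payment Big Data" (2021C01162).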
Keywords
software traceability links
machine learning
artificial intelligence
deep learning
natural language processing