Abstract
This paper proposes a non-functional requirements classification method based on Tri-Training semi-supervised learning, drawing on the strength of the Word2Vec Skip-gram model in capturing subtle semantic differences in complex software requirements documents. The method targets the shortage of labeled samples in software requirements engineering, which degrades non-functional requirements classification performance. Unlike traditional semi-supervised learning algorithms that rely on fully redundant views or a single classifier, Tri-Training initializes three classifiers on three different labeled datasets produced by bootstrap sampling, then generates pseudo-labeled data by majority vote among the three classifiers. This removes the restrictions on the training set and improves the generality and usability of the classification framework. The method is applied to the PROMISE software requirements dataset, which covers multiple industrial domains. The results show good classification performance across datasets with different proportions of labeled data. When labeled data is scarce in particular, the method shows significant advantages in recall and F1 score over supervised learning and other semi-supervised learning algorithms.
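The Tri-Training loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `NearestCentroid`, `tri_training`, and `predict` are hypothetical names, the plain numeric vectors stand in for Word2Vec Skip-gram embeddings of requirement sentences, and the error-rate checks that guard each pseudo-labeling round in the full algorithm are left out for brevity.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in base classifier (an assumption; any classifier could be used)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        # One mean vector per class label.
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )


def tri_training(X_lab, y_lab, X_unlab, rounds=3, seed=0):
    """Simplified Tri-Training: bootstrap three classifiers, then exchange pseudo-labels."""
    rng = random.Random(seed)
    n = len(X_lab)
    models = []
    for _ in range(3):
        # Bootstrap sample: draw n labeled examples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            NearestCentroid().fit([X_lab[i] for i in idx], [y_lab[i] for i in idx])
        )
    for _ in range(rounds):
        new_models = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            X_aug, y_aug = list(X_lab), list(y_lab)
            for x in X_unlab:
                pj, pk = models[j].predict_one(x), models[k].predict_one(x)
                if pj == pk:
                    # The other two classifiers agree, so their shared
                    # prediction becomes a pseudo-label for classifier i.
                    X_aug.append(x)
                    y_aug.append(pj)
            new_models.append(NearestCentroid().fit(X_aug, y_aug))
        models = new_models
    return models


def predict(models, x):
    """Final prediction by majority vote of the three classifiers."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]
```

Calling `tri_training(X_lab, y_lab, X_unlab)` returns the three refined classifiers, and `predict(models, x)` combines them by majority vote. Retraining each classifier on the full labeled set plus the pseudo-labels agreed on by its two peers is what lets the scheme work without redundant feature views.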
Authors
宋百灵
何彦众
张泽贤
曾诚
俞嘉怡
刘进
胡文华
SONG Bailing; HE Yanzhong; ZHANG Zexian; ZENG Cheng; YU Jiayi; LIU Jin; HU Wenhua (School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China; School of Artificial Intelligence, Hubei University, Wuhan 430062, Hubei, China; School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China)
Source
《武汉大学学报(理学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2024, No. 3, pp. 367-375 (9 pages)
Journal of Wuhan University:Natural Science Edition
Funding
National Natural Science Foundation of China (62202350)
Key Research and Development Program of Hubei Province (2021BAA188).