Abstract
With the development of service-oriented architecture (SOA) technology, the number of Web services is growing rapidly. Clustering or classifying Web services correctly and efficiently can improve the quality of service discovery and the efficiency of service composition. However, existing Web service modeling methods, such as the latent Dirichlet allocation (LDA) topic model, have difficulty extracting accurate and effective information from sparse Web service data for Web service clustering. To address this problem, this paper proposes a multi-dimensional information-based Web service representation method (MISR). First, a Gaussian mixture model is combined with the Word2Vec algorithm to generate word-vector representations that capture both the functional topic information and the semantic information of Web service descriptions. Then, the tag-word information, popularity, and co-occurrence information contained in Web services are extracted and combined with the vectors from the previous step to generate Web service representation vectors that carry multi-dimensional information. Finally, the effectiveness of MISR is validated on two tasks: Web service clustering and Web service classification. In Web API service clustering experiments on a real-world dataset, the proposed method improves Micro-F1 by 38.8%, 54.5%, 15.3%, 33.3%, 44.7%, and 9.7% compared with LDA, Word2Vec, Doc2Vec, WT-LDA, HDP-SOM, and GWSC, respectively.
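The first step of MISR combines a Gaussian mixture model with Word2Vec, but the abstract does not give the exact formulation. The following is a minimal Python sketch of one plausible reading, not the authors' method. It assumes pre-trained word vectors (toy 2-dimensional values stand in for Word2Vec output), fits a diagonal-covariance GMM over them with plain EM, appends each word's component posteriors to its vector as soft topic features, and averages a service's word vectors. The function names `fit_diag_gmm`, `topic_enriched_vectors` and `service_vector` are hypothetical. The tag-word, popularity and co-occurrence dimensions of MISR are not modeled.

```python
import math
from typing import Dict, List

Vector = List[float]


def posterior(x: Vector, weights: Vector, means: List[Vector], variances: List[Vector]) -> Vector:
    """Return p(component j | x) for a diagonal-covariance Gaussian mixture."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        lg = math.log(w)
        for xt, mt, vt in zip(x, mu, var):
            lg -= 0.5 * (math.log(2 * math.pi * vt) + (xt - mt) ** 2 / vt)
        logs.append(lg)
    top = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_diag_gmm(points: List[Vector], k: int, iters: int = 50, eps: float = 1e-6):
    """Fit a diagonal Gaussian mixture with plain EM.

    Means start at evenly spaced input points (a deterministic, illustrative choice).
    """
    n, d = len(points), len(points[0])
    step = max(1, n // k)
    means = [list(points[(j * step) % n]) for j in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = [posterior(p, weights, means, variances) for p in points]
        # M-step: re-estimate mixture weights, means and diagonal variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[t] for r, p in zip(resp, points)) / nj for t in range(d)]
            variances[j] = [
                sum(r[j] * (p[t] - means[j][t]) ** 2 for r, p in zip(resp, points)) / nj + eps
                for t in range(d)
            ]
    return weights, means, variances


def topic_enriched_vectors(word_vecs: Dict[str, Vector], k: int) -> Dict[str, Vector]:
    """Concatenate each word vector with its GMM topic posteriors (hypothetical scheme)."""
    words = sorted(word_vecs)
    weights, means, variances = fit_diag_gmm([word_vecs[w] for w in words], k)
    return {w: word_vecs[w] + posterior(word_vecs[w], weights, means, variances) for w in words}


def service_vector(tokens: List[str], enriched: Dict[str, Vector]) -> Vector:
    """Average the enriched vectors of a service's known description tokens."""
    known = [enriched[t] for t in tokens if t in enriched]
    if not known:
        raise ValueError("no known tokens in service description")
    return [sum(col) / len(known) for col in zip(*known)]


# Toy vectors standing in for Word2Vec output: a "mapping" group and a "payment" group.
toy = {
    "map": [1.0, 0.1], "geo": [0.9, 0.2], "location": [1.1, 0.0],
    "pay": [-1.0, 0.9], "billing": [-0.9, 1.0], "invoice": [-1.1, 1.1],
}
enriched = topic_enriched_vectors(toy, k=2)
vec = service_vector(["map", "geo", "unknown"], enriched)  # "unknown" is skipped
print(len(vec))  # 4: 2 embedding dims + 2 topic posteriors
```

Appending the posteriors keeps the semantic embedding intact while adding a soft topic assignment. In practice, a library implementation such as scikit-learn's `GaussianMixture.predict_proba` would replace the hand-written EM loop.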
Authors
ZHANG Xiangping
LIU Jianxun
XIAO Qiaoxiang
CAO Buqing
(Hunan Key Lab for Services Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China; School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China)
Source
Journal of Frontiers of Computer Science and Technology
CSCD
Peking University Core Journal
2022, No. 7, pp. 1561-1569 (9 pages)
Funding
National Key Research and Development Program of China (2020YFB1707602).