
Web Service Semantic Clustering Method Oriented Domain (一种面向领域的Web服务语义聚类方法)

Cited by: 5
Abstract: Most Web services published on the Internet are currently described in natural language, and such unstructured descriptions make automatic analysis and processing by machines very difficult. Improving the efficiency and precision of service discovery has therefore become one of the active research topics in service computing. Service clustering is an important supporting technology for service discovery: clustering and organizing semantically similar services helps improve discovery results. Current service clustering techniques mainly apply models such as LDA (Latent Dirichlet Allocation) and K-means within a single domain, and these methods still have limitations. For example, they do not make full use of the semantic relations between words to reduce dimensionality, so service discovery results fall short. To address this problem, this paper uses a neural network model (the word2vec model) to obtain a synonym table from service descriptions and to generate a domain feature word set, reducing the dimensionality of the service feature vectors as far as possible, and uses a decision tree classifier to classify services by domain. On this basis, an S-LDA (Semantic Latent Dirichlet Allocation) model is proposed to cluster services within the same domain, yielding a domain-oriented Web service clustering framework (Domain Semantic aided Web Service Clustering, DSWSC). Experiments on a service data set published on the ProgrammableWeb website show that, compared with methods such as LDA and K-means, the proposed method achieves clear improvements in entropy, clustering purity and F-measure, which helps improve the accuracy of service search.
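The dimensionality-reduction idea in the abstract (collapsing synonyms so that the service feature vector has fewer dimensions) can be illustrated with a minimal sketch. This is not the paper's procedure: the 2-D vectors are invented toy values standing in for word2vec embeddings, and the `threshold` value, the greedy grouping, and the function names `build_synonym_map` and `reduced_bag_of_words` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_synonym_map(vectors, threshold=0.9):
    """Greedily map each word to the first representative it is close to.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    canonical = {}
    representatives = []
    for word in sorted(vectors):  # sorted for a deterministic result
        for rep in representatives:
            if cosine(vectors[word], vectors[rep]) >= threshold:
                canonical[word] = rep
                break
        else:
            # No existing representative is close enough: start a new group.
            canonical[word] = word
            representatives.append(word)
    return canonical


def reduced_bag_of_words(tokens, canonical):
    """Count tokens after replacing each one by its group representative."""
    return Counter(canonical.get(token, token) for token in tokens)


# Toy vectors, made up for illustration; real ones would come from word2vec.
TOY_VECTORS = {
    "image": (1.0, 0.1),
    "photo": (0.9, 0.2),
    "picture": (0.95, 0.15),
    "payment": (0.1, 1.0),
    "billing": (0.2, 0.9),
}

canonical = build_synonym_map(TOY_VECTORS)
tokens = ["photo", "picture", "image", "billing", "payment", "photo"]
features = reduced_bag_of_words(tokens, canonical)
print(canonical)
print(features)
```

In this toy setting the five-word vocabulary collapses to two feature dimensions, which is the kind of reduction the abstract attributes to the synonym table.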
Authors: ZHAO Yi (赵一); LI Zhao (李昭); CHEN Peng (陈鹏); HE Jing-sha (何泾沙); HE Ke-qing (何克清) (College of Computer and Information, China Three Gorges University, Yichang 443002, China; School of Computer Science, Wuhan University, Wuhan 430072, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 1, pp. 81-88 (8 pages)
Funding: Supported by the National Key Research and Development Program of China (2016YFC0802500, 2016YFB0800403), the National Natural Science Foundation of China (61562073), and the China Three Gorges University Talent Special Fund (8000303)
Keywords: semantic latent Dirichlet allocation; Word2vec; Web services clustering
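The abstract reports results in entropy and clustering purity. This record does not give the paper's exact formulas, so the sketch below uses one common textbook definition of each (size-weighted per-cluster entropy in bits, and the fraction of items falling in their cluster's majority class). The function name `purity_and_entropy` and the sample labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Purity: share of items that belong to the majority label of their cluster.
    Entropy: label entropy of each cluster (base 2), weighted by cluster size.
    """
    if not cluster_ids or len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")

    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)

    total = len(true_labels)
    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        purity += max(counts.values()) / total
        cluster_entropy = -sum(
            (k / size) * math.log2(k / size) for k in counts.values()
        )
        entropy += (size / total) * cluster_entropy
    return purity, entropy


# Hypothetical example: five services, two clusters, two reference domains.
purity, entropy = purity_and_entropy(
    [0, 0, 1, 1, 1],
    ["maps", "maps", "maps", "pay", "pay"],
)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Higher purity and lower entropy both indicate clusters that align better with the reference domains. F-measure is left out here because its clustering variant depends on a matching scheme that this record does not specify.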