Funding: the National Natural Science Foundation of China (No. 61872232).
Abstract: A vast number of techniques and web resources are available for software engineering practice, and this number continues to grow. Discovering semantically similar or related technical terms and web resources offers the opportunity to design appealing services that facilitate information retrieval and information discovery. In this study, we extract technical terms and web resources from community question-and-answer (Q&A) discussions and propose an approach based on a neural language model to learn the semantic representations of technical terms and web resources in a joint low-dimensional vector space. Our approach maps each technical term or web resource to a semantic vector space based only on the technical terms and web resources that surround it in a discussion thread, without the need to mine the text content of the discussion. We apply our approach to the Stack Overflow data dump of March 2018. Through both quantitative and qualitative analyses of clustering, search, and semantic reasoning tasks, we show that the learnt technical-term and web-resource vector representations capture the semantic relatedness of technical terms and web resources, and that they can be exploited to support various search and semantic reasoning tasks by means of simple K-nearest neighbor search and simple algebraic operations on the learnt vector representations in the embedding space.
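The two operations named at the end of the abstract, K-nearest neighbor search and algebraic operations on vectors, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 3-dimensional vectors are invented by hand (they are not learned from Stack Overflow or any other data), the vocabulary is hypothetical, cosine similarity is one common choice of similarity measure rather than a detail confirmed by the abstract, and the helper names `cosine`, `nearest`, and `analogy` are not taken from the paper.

```python
import math

# Toy vectors, invented for illustration only; NOT learned from any data.
# Technical terms and a web resource (a URL) share one vector space.
EMBEDDINGS = {
    "java": [1.0, 0.0, 0.0],
    "junit": [1.0, 1.0, 0.0],
    "python": [0.0, 0.0, 1.0],
    "pytest": [0.0, 1.0, 1.0],
    "https://docs.python.org": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query_vec, k=1, exclude=()):
    """Return the k entries most similar to query_vec, skipping excluded names."""
    scored = [
        (cosine(query_vec, vec), name)
        for name, vec in EMBEDDINGS.items()
        if name not in exclude
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def analogy(a, b, c, k=1):
    """Answer 'a is to b as c is to ?' by searching near the vector b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    ]
    return nearest(target, k=k, exclude={a, b, c})
```

With these hand-picked vectors, `nearest(EMBEDDINGS["python"], exclude={"python"})` returns the documentation URL, and `analogy("java", "junit", "python")` returns `["pytest"]`. The outcome is built into the toy data and says nothing about what a trained model would produce.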