
Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding (cited by: 11)

Abstract: Long-document semantic measurement is important in many applications, such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements, such as the research purpose, methodology, and domain, are included and enriched. We can then obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
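The profile-based comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The profile layout (a dict mapping an element name such as "purpose" to a list of concept terms), the toy 3-dimensional vectors, the best-match averaging rule, and the uniform element weights are all assumptions chosen for illustration.

```python
import math

# Toy word vectors: hypothetical values, not output of any trained model.
VECTORS = {
    "similarity": [0.9, 0.1, 0.0],
    "matching":   [0.8, 0.2, 0.1],
    "embedding":  [0.1, 0.9, 0.2],
    "clustering": [0.2, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def element_similarity(terms_a, terms_b, vectors):
    """Symmetric best-match similarity between two lists of concept terms.

    Assumed rule: for each term on one side, take its best cosine match on
    the other side, average those scores, then average both directions.
    Terms without a vector are ignored.
    """
    def directed(src, dst):
        src = [t for t in src if t in vectors]
        dst = [t for t in dst if t in vectors]
        if not src or not dst:
            return 0.0
        best = [max(cosine(vectors[s], vectors[d]) for d in dst) for s in src]
        return sum(best) / len(best)

    return 0.5 * (directed(terms_a, terms_b) + directed(terms_b, terms_a))


def profile_similarity(profile_a, profile_b, vectors, weights=None):
    """Weighted mean of element similarities over elements both profiles share."""
    elements = sorted(set(profile_a) & set(profile_b))
    if not elements:
        return 0.0
    if weights is None:
        weights = {e: 1.0 for e in elements}  # assumed: uniform weights
    total = sum(weights.get(e, 0.0) for e in elements)
    if total == 0:
        return 0.0
    score = sum(
        weights.get(e, 0.0)
        * element_similarity(profile_a[e], profile_b[e], vectors)
        for e in elements
    )
    return score / total


# Hypothetical profiles for three papers.
paper_a = {"purpose": ["similarity"], "methodology": ["embedding"]}
paper_b = {"purpose": ["matching"], "methodology": ["embedding"]}
paper_c = {"purpose": ["clustering"], "methodology": ["clustering"]}

print(profile_similarity(paper_a, paper_b, VECTORS))
print(profile_similarity(paper_a, paper_c, VECTORS))
```

With these toy vectors, the pair whose purpose and methodology terms lie close together in vector space scores higher than the pair whose terms do not. In practice the vectors would come from a trained embedding model, and the element weights would reflect how much each profile element should count.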
Source: Tsinghua Science and Technology (清华大学学报(自然科学版)(英文版)), indexed in SCIE, EI, CAS, CSCD; 2017, Issue 6, pp. 619-632 (14 pages)
Funding: Supported by the Foundation of the State Key Laboratory of Software Development Environment (No. SKLSDE-2015ZX-04)
Keywords: document semantic similarity; text understanding; semantic enrichment; word embedding; scientific literature analysis
