Word Embeddings and Semantic Spaces in Natural Language Processing (times cited: 1)

Abstract: One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) over the last two decades has been the development of text representation techniques that solve the so-called curse of dimensionality. This problem plagues NLP in general because the initial feature set for learning grows with the vocabulary size of the language in question, which typically runs to hundreds of thousands of terms. As a result, much of the research and development in NLP over the last two decades has focused on finding and optimizing solutions to this problem, which is in effect a feature selection problem.

This paper traces the development of these techniques, which leverage a variety of statistical methods resting on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which holds that words found in similar contexts generally have similar meanings. In this survey we examine some of the most popular of these techniques from both a mathematical and a data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, which are typically referred to as word embeddings. In reviewing algorithms such as Word2Vec, GloVe, ELMo and BERT, we also explore the idea of semantic spaces more generally, beyond their applicability to NLP.
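The distributional hypothesis described in the abstract can be illustrated with a toy count-based vector space model. The sketch below is a minimal illustration under stated assumptions, not the paper's method: it assumes whitespace tokenization, a symmetric context window of two words, raw co-occurrence counts, and an invented five-sentence corpus, and the helper names `cooccurrence_vectors` and `cosine` are hypothetical. Each word's vector has one potential dimension per vocabulary word, which is exactly the dimensionality problem the surveyed techniques aim to compress.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of context words seen within +/- `window` positions.

    Assumption: lowercase whitespace tokenization and raw counts (no weighting).
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    # Counter returns 0 for missing keys, so only shared contexts contribute.
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "cat" and "dog" appear in similar contexts.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell sharply today",
]

vecs = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # prints 0.917
print(cosine(vecs["cat"], vecs["stocks"]))         # prints 0.0
```

With this corpus, "cat" and "dog" share every context word except "milk" and "water", giving a cosine similarity of 11/12 (about 0.917), while "cat" and "stocks" share no contexts and score 0.0. Methods such as Latent Semantic Analysis go a step further and project these sparse, vocabulary-sized vectors into a low-dimensional space.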

Author: Peter J. Worth (Dept. of Computer Science and Electrical Engineering, Florida Atlantic University, Boca Raton, FL, USA)

Source: International Journal of Intelligence Science, 2023, Issue 1, pp. 1-21 (21 pages)

Keywords: Natural Language Processing; Vector Space Models; Semantic Spaces; Word Embeddings; Representation Learning; Text Vectorization; Machine Learning; Deep Learning

Classification number: O17 [Science: Fundamental Mathematics]
