Abstract
Text digitization is a key step in natural language processing (NLP) in the era of artificial intelligence, and word vectors are one of the main ways of digitizing text. Taking the embedding-based word vector model Word2vec as an example, this paper studies the linguistic principles and mathematical models involved in mapping words to word vectors, analyzes the advantages of neural networks in generating word vectors, and explains the properties of Word2vec and the linguistic information it encodes. The ultimate goal is to promote the understanding and application of word vectors in NLP.
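The abstract does not state which Word2vec variant the paper analyzes. As a minimal sketch, assuming the skip-gram variant with a symmetric context window, the code below shows how training pairs of (center word, context word) are extracted from a sentence. This is the step through which the distributional idea, that a word is characterized by the words around it, enters the model. The function name `skipgram_pairs` and the window sizes are hypothetical choices for illustration, not the author's implementation.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for skip-gram training (illustrative sketch).

    Each token is paired with every other token at most `window` positions away,
    so words that share neighbours receive similar training signals.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)              # left edge of the context window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                        # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for center, context in skipgram_pairs(sentence, window=1):
        print(center, "->", context)
```

In a full Word2vec model these pairs would feed a shallow neural network whose learned input weights become the word vectors; that training step is outside the scope of this sketch.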
Author
YANG Quan (Beijing Normal University, Beijing 100875)
Source
Computer & Digital Engineering, 2023, Issue 11, pp. 2602-2607, 2614 (7 pages in total)
Funding
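Supported by the National Social Science Fund of China project "Research on AI-Based Methods for Determining Syntactic Relations in Phrase Structure" (No. 21BYY205).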