Abstract
Current Chinese word embedding models do not adequately capture the semantic information carried by the glyph structure of Chinese characters. To address this, an improved Chinese word embedding model is proposed. The model learns jointly at the granularities of words, characters, and components (Wubi codes), and constructs word embeddings by combining components, characters, and words. This allows it to learn the semantic information contained in the glyph structure of Chinese characters effectively, and it improves the quality of Chinese word embeddings to some extent.
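The composition described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three granularities are combined, so plain averaging is an assumption here, as are the names `VectorTable` and `compose`, the dimension, and the random initialization. The `WUBI` lookup is a toy table for illustration only; real codes should come from a full Wubi code table.

```python
import numpy as np

DIM = 8  # embedding size, chosen arbitrarily for the sketch

# Toy character -> Wubi code lookup (illustrative; check against a real Wubi table).
WUBI = {"你": "wqiy", "好": "vbg"}


class VectorTable:
    """Lookup table that lazily creates a small random vector for each new key."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.rng = rng
        self.vecs = {}

    def __getitem__(self, key):
        if key not in self.vecs:
            self.vecs[key] = self.rng.normal(scale=0.1, size=self.dim)
        return self.vecs[key]


def compose(word, word_tab, char_tab, comp_tab, wubi=WUBI):
    """Average the word vector, the mean character vector and the mean
    component vector (one component per Wubi code letter).

    Averaging is an assumption of this sketch, not the paper's stated method.
    Expects a non-empty word.
    """
    chars = list(word)
    comps = [letter for ch in chars for letter in wubi.get(ch, "")]
    parts = [word_tab[word]]
    parts.append(np.mean([char_tab[ch] for ch in chars], axis=0))
    if comps:  # characters missing from the lookup contribute no components
        parts.append(np.mean([comp_tab[c] for c in comps], axis=0))
    return np.mean(parts, axis=0)


rng = np.random.default_rng(0)
word_tab = VectorTable(DIM, rng)
char_tab = VectorTable(DIM, rng)
comp_tab = VectorTable(DIM, rng)
vec = compose("你好", word_tab, char_tab, comp_tab)
```

In a trained model the three tables would be learned jointly from a corpus rather than drawn at random; the sketch only shows how a single word vector could be assembled from the three granularities.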
Authors
杨雨晴
吴水秀
左家莉
YANG Yuqing; WU Shuixiu; ZUO Jiali (College of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China)
Source
《江西师范大学学报(自然科学版)》
CAS
Peking University Core Journal List
2021, Issue 2, pp. 131-136 (6 pages)
Journal of Jiangxi Normal University (Natural Science Edition)
Funding
Supported by the National Natural Science Foundation of China (60866018).
Keywords
word embedding
language model
natural language processing