Abstract
Data sparsity is the main problem in short-text topic modeling: each short text contains too few words to provide sufficient word co-occurrence information. This paper proposes SWEM (Semantics Word Embedding Modeling), a short-text topic model based on word embeddings. SWEM expands short texts semi-automatically. It processes the words in each short text with the Tongyici Cilin synonym thesaurus, which adds word co-occurrence information to the short-text collection and enriches document content. As a result, the model can infer higher-quality topic structures, and the shortage of word co-occurrence in short texts is addressed. Experiments show that SWEM outperforms traditional models such as LDA and BTM.
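The abstract's central idea is that appending synonyms to short texts raises word co-occurrence counts. The following minimal Python sketch illustrates that effect. It is not the SWEM implementation. The `SYNONYMS` table is a hypothetical toy stand-in for Tongyici Cilin, `DOCS` is an invented toy corpus, and `expand` and `cooccurrence` are illustrative helpers. Co-occurrence is counted as unordered word pairs within one document, roughly the biterms that BTM works with.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy synonym table; a stand-in for a thesaurus such as
# Tongyici Cilin. Entries are symmetric so expansion works both ways.
SYNONYMS = {
    "phone": ["mobile"],
    "mobile": ["phone"],
    "cheap": ["inexpensive"],
    "inexpensive": ["cheap"],
}

# Toy corpus of tokenized short texts (illustrative only).
DOCS = [
    ["cheap", "phone"],
    ["inexpensive", "mobile"],
    ["mobile", "battery"],
]


def expand(doc, synonyms):
    """Return doc with synonyms of its tokens appended (no duplicates, order kept)."""
    out = list(doc)
    seen = set(doc)
    for tok in doc:
        for syn in synonyms.get(tok, []):
            if syn not in seen:
                out.append(syn)
                seen.add(syn)
    return out


def cooccurrence(docs):
    """Count how many documents each unordered word pair co-occurs in."""
    counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    before = cooccurrence(DOCS)
    after = cooccurrence([expand(d, SYNONYMS) for d in DOCS])
    for pair in [("mobile", "phone"), ("cheap", "mobile")]:
        # Prints the pair count before and after synonym expansion.
        print(pair, before[pair], "->", after[pair])
```

Before expansion, `mobile` and `phone` never appear in the same document. After expansion, the pair co-occurs in all three documents. This is the kind of extra co-occurrence signal the abstract describes. A practical system would also need to filter synonym candidates, because blind expansion can add noise.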
Author
Huang Chan (黄婵), Ganzhou Teachers College, Ganzhou, Jiangxi 341000, China
Source
Computer Era (《计算机时代》), 2019, No. 12, pp. 57-60 (4 pages)
Funding
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ151362)