Abstract
Regularization incorporates prior knowledge into a model, which helps prevent overfitting and induces sparse parameters, so that the most representative parameters are selected. Topic features with sparsity can represent text semantics more effectively. Sets of similar words are obtained with WordNet and Word2Vec; the coding vectors of the words in each similar-word set are assigned to the same group, forming semantic constraints over similar words that are expressed as hierarchical structured prior information. On this basis, the paper implements two hierarchical sparse regularization methods and applies them to a topic coding model. Experiments show that sparse coding models with hierarchical regularization improve topic coding and learn topic information with better topic coherence and better classification results.
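As an illustration of the group-style sparsity the abstract describes, below is a minimal Python sketch (not the authors' implementation) of an L2,1 group penalty and its proximal update over a topic-coding matrix, where each group collects the coding vectors of one similar-word set (e.g. taken from WordNet synsets or Word2Vec nearest neighbours). The function names, array shapes, and the plain (non-hierarchical) group penalty are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: group-sparse (L2,1) regularization over word
# coding vectors, with groups given by similar-word sets.
import numpy as np

def group_l21_penalty(W, groups, lam=0.1):
    """Sum of L2 norms of the blocks of W belonging to each similar-word group.

    W      : (vocab_size, n_topics) topic-coding matrix, one row per word.
    groups : list of index arrays, each holding the rows of one word group.
    lam    : regularization strength.
    """
    return lam * sum(np.linalg.norm(W[g, :]) for g in groups)

def prox_group_l21(W, groups, step, lam=0.1):
    """Proximal (block soft-thresholding) update used when minimizing a loss
    plus the group penalty: shrinks each group block toward zero, zeroing out
    whole groups and thus producing group-level sparsity."""
    W = W.copy()
    for g in groups:
        block = W[g, :]
        norm = np.linalg.norm(block)
        scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        W[g, :] = scale * block
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(6, 3))                           # 6 words, 3 topics
    groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]   # two similar-word groups
    print("penalty before:", group_l21_penalty(W, groups))
    W_new = prox_group_l21(W, groups, step=1.0, lam=2.0)
    print("penalty after :", group_l21_penalty(W_new, groups))
```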
Authors
曹中华
夏家莉
李光泉
张志斌
CAO Zhong-hua; XIA Jia-li; LI Guang-quan; ZHANG Zhi-bin (School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330032, China; School of Software, Jiangxi Normal University, Nanchang 330022, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2019, No. 3, pp. 510-514 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (41661083)
Keywords
topic coding
hierarchical regularization
word grouping
sparsity