Research on Classification of Commodity Ultra-Short Text Based on Deep Random Forest (基于深度随机森林的商品类超短文本分类研究)
Cited by: 4
Abstract: In recent years, with the development of mobile communication and information technology, a growing volume of ultra-short text, no more than 20 characters long and carrying no auxiliary tag information, needs to be processed both online and in practical applications. Ultra-short text is inherently polysemous, has extremely sparse features, lacks context, and is semantically hard to disambiguate, so classifying it effectively has become a pressing new problem in text classification. Traditional short-text classification methods such as KNN and decision trees perform poorly on commodity ultra-short texts because so few features are available. To address this, the paper proposes a classification method for commodity ultra-short texts based on deep random forest. The method adopts a "diversion" strategy assisted by an external knowledge base: a commodity name that has an explicit category in the knowledge base is assigned that category directly; for a commodity name whose category cannot be extracted directly, its description in the external knowledge base is vectorized with Word2vec and the resulting vector is classified by a deep random forest. The classifier is optimized continually until the training set size reaches a preset threshold. Experimental results show that, compared with KNN and decision trees, the proposed method improves average accuracy by 22.78% and 17.22%, respectively, and average recall by 22.85% and 15.23%, respectively.
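The "diversion" strategy described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `embed_description` is a toy stand-in for Word2vec, `nearest_centroid` is a toy stand-in for the deep random forest, and the knowledge-base layout (`KBEntry`), the function names, and the `max_train_size` parameter are all hypothetical. The abstract does not say how the training set grows; the sketch assumes that commodity names labeled directly from the knowledge base are added as new training examples until the size threshold is reached.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical knowledge-base entry: (explicit category or None, description text).
KBEntry = Tuple[Optional[str], str]
Vector = List[float]


def embed_description(text: str, dim: int = 4) -> Vector:
    """Toy stand-in for Word2vec: bucket character codes into a normalized vector."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def nearest_centroid(train: List[Tuple[Vector, str]]) -> Callable[[Vector], str]:
    """Toy stand-in for the deep random forest. Requires a non-empty training set."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(vec: Vector) -> str:
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
        )

    return predict


def classify_names(
    names: List[str],
    kb: Dict[str, KBEntry],
    train: List[Tuple[Vector, str]],
    max_train_size: int,
) -> Dict[str, Tuple[str, str]]:
    """Return {name: (category, route)}, where route is 'direct' or 'classifier'."""
    train = list(train)  # work on a copy so the caller's list is not modified
    predict = nearest_centroid(train)
    results: Dict[str, Tuple[str, str]] = {}
    for name in names:
        # Names missing from the knowledge base fall back to the name itself.
        category, description = kb.get(name, (None, name))
        vec = embed_description(description)
        if category is not None:
            # Branch 1: the knowledge base states the category explicitly.
            results[name] = (category, "direct")
            # Assumption: directly labeled items extend the training set
            # until the size threshold is reached, then the model is refit.
            if len(train) < max_train_size:
                train.append((vec, category))
                predict = nearest_centroid(train)
        else:
            # Branch 2: classify the vectorized description.
            results[name] = (predict(vec), "classifier")
    return results


kb = {
    "iPhone 12": ("phone", "smartphone made by Apple"),
    "mystery gadget": (None, "a handheld smartphone device"),
}
seed = [
    (embed_description("smartphone handset"), "phone"),
    (embed_description("cotton shirt"), "clothing"),
]
results = classify_names(["iPhone 12", "mystery gadget"], kb, seed, max_train_size=10)
```

In a real system the two stand-ins would be replaced by a trained Word2vec model and a deep random forest; only the routing between the two branches is meant to mirror the abstract.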
Authors: NIU Zhendong (牛振东); SHI Pengfei (石鹏飞); ZHU Yifan (朱一凡); ZHANG Sifan (张思凡) (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)
Source: Transactions of Beijing Institute of Technology (《北京理工大学学报》), 2021, No. 12, pp. 1277-1285 (9 pages). Indexed in: EI, CAS, CSCD, PKU Core (北大核心).
Funding: National Natural Science Foundation of China (61370137); Ministry of Education-China Mobile Research Fund (2016/27); National "973" Program of China (2012CB720700).
Keywords: ultra-short text classification; commodity names; deep random forest