Research on a headline classification algorithm based on ensemble learning (cited by: 9)

Headlines classification method based on ensemble learning
Abstract: Headline classification assigns a category to a headline-style statement, typically a short text of no more than 20 characters that is concise and highly summarizing. To address the feature sparsity and semantic uncertainty of headline text, this paper proposes a headline classification algorithm that combines random forest with the multinomial Bayes model. The algorithm introduces the multinomial Bayes model into the construction of the random forest's base classifiers and uses the forest's out-of-bag (OOB) data to define a voting mechanism based on a two-dimensional weight distribution. Experiments on real bibliographic data from a library compare its classification performance with a current SVM algorithm based on LDA topic expansion. The results show that, under certain conditions, the method performs stably and achieves better results.
Authors: 高元 (Gao Yuan), 刘柏嵩 (Liu Baisong)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2017, No. 4, pp. 1004-1007 (4 pages)
Funding: National Social Science Fund of China (15FTQ002)
Keywords: natural language processing; headline classification; ensemble learning; improved random forest; OOB two-dimensional weight distribution
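
The abstract describes the method only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that each base classifier is a multinomial naive Bayes model trained on a bootstrap sample and a random feature subset (in place of a decision tree), and that the "two-dimensional" weights are indexed by base classifier and by class, estimated as smoothed precision on the out-of-bag titles. All names (`MultinomialNB`, `train_ensemble`, `predict`, `feature_frac`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes with Laplace smoothing over a fixed feature subset."""

    def __init__(self, features):
        self.features = set(features)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(t for t in tokens if t in self.features)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        active = [t for t in tokens if t in self.features]

        def log_posterior(c):
            denom = self.totals[c] + len(self.features)
            score = math.log(self.prior[c] / self.n_docs)
            return score + sum(math.log((self.counts[c][t] + 1) / denom) for t in active)

        return max(self.classes, key=log_posterior)


def train_ensemble(docs, labels, n_estimators=31, feature_frac=0.7, seed=0):
    """Bagged naive Bayes models, each paired with per-class OOB weights (assumed scheme)."""
    rng = random.Random(seed)
    vocab = sorted({t for d in docs for t in d})
    classes = sorted(set(labels))
    n = len(docs)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))          # out-of-bag rows
        feats = rng.sample(vocab, max(1, int(feature_frac * len(vocab))))
        clf = MultinomialNB(feats).fit([docs[i] for i in idx], [labels[i] for i in idx])
        # Weight for (this classifier, class c): smoothed OOB precision when it predicts c.
        hits, predicted = Counter(), Counter()
        for i in oob:
            p = clf.predict(docs[i])
            predicted[p] += 1
            hits[p] += int(p == labels[i])
        weights = {c: (hits[c] + 1) / (predicted[c] + 2) for c in classes}
        ensemble.append((clf, weights))
    return ensemble


def predict(ensemble, tokens):
    """Weighted vote: each base classifier adds its OOB weight for the class it predicts."""
    votes = defaultdict(float)
    for clf, weights in ensemble:
        c = clf.predict(tokens)
        votes[c] += weights[c]
    return max(votes, key=votes.get)


# Toy, made-up tokenized titles (not the library data used in the paper).
titles = [
    ["python", "programming", "guide"], ["machine", "learning", "python"],
    ["deep", "learning", "algorithms"], ["data", "structures", "algorithms"],
    ["world", "history", "atlas"], ["ancient", "history", "rome"],
    ["modern", "history", "europe"], ["medieval", "europe", "atlas"],
]
labels = ["cs"] * 4 + ["history"] * 4
ensemble = train_ensemble(titles, labels)
print(predict(ensemble, ["python", "learning", "algorithms"]))
print(predict(ensemble, ["history", "europe", "atlas"]))
```

Weighting each vote by the class the base classifier actually predicts lets a model that is reliable for one category but not another contribute unevenly, which is one way to interpret a weight distribution over two dimensions. The paper itself may define the weights differently.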