Research on a Search-Based Short-Text Classification Algorithm. Cited by: 5

English title: Search-based short-text classification
Abstract: Traditional classification algorithms perform poorly on short texts. To address this, the paper proposes a search-based Naive Bayes text classification method. The method jointly takes into account the size of the text dataset, document length, the number of categories, and the class distribution; it improves the naive Bayes algorithm and applies search technology to text classification. The resulting algorithm is better suited to short-text classification tasks such as microblog (Weibo) posts, WeChat messages, SMS messages, and short review phrases. The paper describes the classification algorithm, classifier construction, and evaluation in detail. Experiments show that the search-based text classifier classifies short texts more effectively.
Authors: Kang Wei; Qiu Hongzhe; Jiao Dongdong; Fang Zhiqi; Yu Yinhu (National Computer System Engineering Research Institute of China, Beijing 100083, China; Beijing Aerospace Control Center, Beijing 100094, China)
Source: Application of Electronic Technique (《电子技术应用》), 2018, No. 11, pp. 121-123, 128 (4 pages)
Keywords: text classification; search engine; short text; Naive Bayes
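The abstract says the method improves the naive Bayes algorithm but does not spell out the modifications. For reference only, here is a minimal sketch of the standard multinomial naive Bayes text classifier with Laplace (add-one) smoothing, the kind of baseline the paper starts from. It is not the authors' search-based variant. The function names train_nb and classify_nb and the toy data are hypothetical. Tokenization is assumed to be whitespace splitting, so Chinese short texts would first need a word segmenter.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Train a multinomial naive Bayes model (hypothetical helper, illustrative only)."""
    class_docs = Counter(labels)            # number of training documents per class
    word_counts = defaultdict(Counter)      # per-class token frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()                # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = len(docs)
    # log prior P(c) estimated from class document frequencies
    priors = {c: math.log(n / total_docs) for c, n in class_docs.items()}
    return priors, word_counts, vocab


def classify_nb(model, doc):
    """Return the class with the highest log posterior (up to a shared constant)."""
    priors, word_counts, vocab = model
    v = len(vocab)
    best_class, best_score = None, float("-inf")
    for c, log_prior in priors.items():
        total = sum(word_counts[c].values())
        score = log_prior
        for tok in doc.split():
            if tok not in vocab:
                continue                    # skip words never seen in training
            # Laplace (add-one) smoothed estimate of P(tok | c)
            score += math.log((word_counts[c][tok] + 1) / (total + v))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


if __name__ == "__main__":
    docs = ["cheap flights book now", "win a free prize now",
            "meeting moved to friday", "lunch on friday"]
    labels = ["spam", "spam", "ham", "ham"]
    model = train_nb(docs, labels)
    print(classify_nb(model, "free flights now"))  # spam
    print(classify_nb(model, "friday meeting"))    # ham
```

Short texts give only a few word counts per document, so the smoothing term dominates many estimates. This sparsity is the weakness the abstract attributes to traditional classifiers on short texts.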
