
Semi-supervised Short Text Stream Classification Based on Vector Representation and Label Propagation

Cited by: 2
Abstract: Social network platforms produce massive short text streams that are characterized by high velocity, large volume, concept drift, short text length, and a severe shortage of class labels. To address this, a semi-supervised short text stream classification algorithm based on vector representation and label propagation is proposed, which can effectively classify data sets containing only a small amount of labeled data. In addition, to adapt to concept drift, a cluster-based concept drift detection algorithm is proposed. Experiments on real short text streams show that, compared with classical semi-supervised classification algorithms and semi-supervised data stream classification algorithms, the proposed algorithm not only improves classification accuracy and the macro average but also adapts quickly to concept drift in the data stream.
Authors: WANG Haiyan (王海燕); HU Xuegang (胡学钢); LI Peipei (李培培) (School of Computer and Information, Hefei University of Technology, Hefei 230601; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009)
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2018, No. 7, pp. 634-642 (9 pages)
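Funding: Supported by the National Key Research and Development Program of China (No. 2016YFC0801406) and the National Natural Science Foundation of China (Nos. 61503112, 61673152)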
Keywords: short text stream; semi-supervised classification; label propagation; concept drift
