
Improved Chinese text classification algorithm based on word embedding and kNN (cited by 6)
Abstract: Chinese characters are pictographic, and data volumes are growing rapidly in the big-data era. To improve both the efficiency and the accuracy of Chinese text classification under these conditions, this paper builds a Chinese text classification algorithm based on deep learning. First, the sub-character components of Chinese characters (glyphs, radicals, strokes, etc.) are pictographic, so their shapes carry meaning of their own. Building on this, a two-channel CBOW (continuous bag-of-words) model based on sub-character and context features is constructed to vectorize Chinese text. Second, the traditional kNN (k-nearest neighbor) algorithm classifies slowly on big data. To address this, a fast kNN classification algorithm based on LSC (landmark-based spectral clustering) and multi-objective data screening is proposed. Finally, the fast kNN algorithm is applied to the feature word vectors converted from the text data. Experimental results show that the improved algorithm broadens its range of application, processes Chinese text data more accurately, and handles big-data problems faster, with some improvement in both classification speed and classification performance.
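The speed-up idea behind a clustering-based fast kNN can be sketched as follows: cluster the training vectors once, then for each query compute exact distances only against the members of the clusters whose centroids lie closest to it. This is a minimal illustration under stated assumptions, not the authors' method. Plain k-means with deterministic farthest-first seeding stands in for LSC, the multi-objective data screening step is not reproduced, and the class name `ClusteredKNN`, its parameters `n_clusters` and `n_probe`, and the toy 2-D "document vectors" with labels "sports"/"finance" are all hypothetical.

```python
import math
from collections import Counter


def kmeans(points, n_clusters, iters=20):
    """Plain Lloyd's k-means with farthest-first seeding.
    Hypothetical stand-in for LSC; not the paper's clustering method."""
    # Farthest-first seeding: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [
            min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


class ClusteredKNN:
    """Hypothetical fast-kNN sketch: cluster the training vectors once,
    then search only the clusters nearest to the query."""

    def __init__(self, X, y, n_clusters=2, n_probe=1):
        self.centroids, assign = kmeans(X, n_clusters)
        self.buckets = [[] for _ in range(n_clusters)]
        for x, label, a in zip(X, y, assign):
            self.buckets[a].append((x, label))
        self.n_probe = n_probe

    def predict(self, q, k=3):
        # Rank clusters by the distance from the query to each centroid.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(q, self.centroids[c]))
        # Probe at least n_probe clusters, and keep probing until there
        # are at least k candidate vectors (or no clusters are left).
        candidates = []
        for probed, c in enumerate(order):
            if probed >= self.n_probe and len(candidates) >= k:
                break
            candidates.extend(self.buckets[c])
        # Exact kNN restricted to the candidate set, then a majority vote.
        nearest = sorted(candidates, key=lambda item: math.dist(q, item[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy 2-D vectors standing in for document embeddings (hypothetical data).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = ["sports"] * 4 + ["finance"] * 4
model = ClusteredKNN(X, y, n_clusters=2, n_probe=1)
print(model.predict((0.5, 0.5)))    # sports
print(model.predict((10.2, 10.9)))  # finance
```

The `n_probe` parameter controls the trade-off. Probing fewer clusters cuts the number of distance computations per query. The cost is that the result becomes approximate, because a true nearest neighbor can sit in a cluster that was not probed.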
Authors: DING Zhengsheng (丁正生), MA Chunjie (马春洁) (Xi'an University of Science and Technology, Xi'an 710600, China)
Institution: Xi'an University of Science and Technology
Source: Modern Electronics Technique (《现代电子技术》), 2022, No. 1, pp. 100-103 (4 pages)
Funding: National Natural Science Foundation of China (71473194)
Keywords: Chinese text classification; text vectorization; fast kNN algorithm; word embedding; two-channel CBOW model; feature vector; data classification