5 articles found
1. Parallel naive Bayes algorithm for large-scale Chinese text classification based on Spark (cited by: 21)
Authors: LIU Peng, ZHAO Hui-han, TENG Jia-yu, YANG Yan-yan, LIU Ya-feng, ZHU Zong-wei. Journal of Central South University (SCIE, EI, CAS, CSCD), 2019, No. 1, pp. 1-12 (12 pages).
The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the time needed to classify it. To address this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the resilient distributed dataset (RDD) programming model. For comparison, a PNBA based on Hadoop is also implemented. The test results show that, in the same computing environment and on the same text sets, the Spark PNBA is clearly superior to the Hadoop PNBA on key indicators such as speedup ratio and scalability. Spark-based parallel algorithms can therefore better meet the requirements of large-scale Chinese text data mining.
Keywords: Chinese text classification; naive Bayes; Spark; Hadoop; resilient distributed dataset; parallelization
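The abstract above describes parallelizing naive Bayes training with Spark RDDs. The following is a minimal sketch of that idea in plain Python, with no Spark dependency: each partition is counted independently (the map step) and the partial counts are merged (the reduce step), which is the pattern an RDD aggregation would distribute across workers. The function names, the toy pre-segmented documents, and the add-one smoothing are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from functools import reduce


def count_partition(docs):
    """Map step: count documents per class and (class, term) pairs in one partition."""
    class_counts, term_counts = Counter(), Counter()
    for label, tokens in docs:
        class_counts[label] += 1
        for tok in tokens:
            term_counts[(label, tok)] += 1
    return class_counts, term_counts


def merge(a, b):
    """Reduce step: partial counts from two partitions add element-wise."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions):
    """Aggregate per-partition counts into a multinomial naive Bayes model."""
    class_counts, term_counts = reduce(merge, map(count_partition, partitions))
    vocab = {tok for (_, tok) in term_counts}
    totals = Counter()  # total number of term occurrences per class
    for (label, _), n in term_counts.items():
        totals[label] += n
    return class_counts, term_counts, totals, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior for a tokenized document."""
    class_counts, term_counts, totals, vocab = model
    n_docs = sum(class_counts.values())

    def log_posterior(label):
        score = math.log(class_counts[label] / n_docs)
        for tok in tokens:
            # Add-one (Laplace) smoothing, an assumption of this sketch.
            score += math.log(
                (term_counts[(label, tok)] + 1) / (totals[label] + len(vocab))
            )
        return score

    return max(class_counts, key=log_posterior)


# Hypothetical toy data: two partitions of (label, segmented tokens) pairs.
partitions = [
    [("sports", ["足球", "比赛"]), ("finance", ["股票", "市场"])],
    [("sports", ["篮球", "比赛"]), ("finance", ["银行", "股票"])],
]
model = train(partitions)
```

Calling `predict(model, ["比赛", "足球"])` classifies a new tokenized document. Because the counts are simple sums, the merge step is associative and commutative, which is what makes this kind of training easy to distribute.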
2. Supervised Contrastive Learning with Term Weighting for Improving Chinese Text Classification
Authors: Jiabao Guo, Bo Zhao, Hui Liu, Yifan Liu, Qian Zhong. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 1, pp. 59-68 (10 pages).
With the rapid growth of information retrieval technology, Chinese text classification, which is the basis of information content security, has become a widely discussed topic. Because Chinese differs greatly from English, Chinese text tasks are more complex in terms of semantic information representation. Most existing Chinese text classification approaches treat feature representation and feature selection as the key points but fail to take into account a learning strategy adapted to the task. In addition, these approaches compress a Chinese word into a representation vector without considering the distribution of the term among the categories of interest. To improve Chinese text classification, this paper proposes a unified method called Supervised Contrastive Learning with Term Weighting (SCL-TW). Supervised contrastive learning makes full use of a large amount of unlabeled data to improve model stability. In SCL-TW, a term-weighting score is calculated to optimize the data augmentation of Chinese text. The transformed features are then fed into a temporal convolution network for feature representation. Experiments are conducted on two Chinese benchmark datasets. The results demonstrate that SCL-TW outperforms other advanced Chinese text classification approaches by a wide margin.
Keywords: Chinese text classification; Supervised Contrastive Learning (SCL); Term Weighting (TW); Temporal Convolution Network (TCN)
3. Chinese News Text Classification Based on Convolutional Neural Network (cited by: 1)
Authors: Hanxu Wang, Xin Li. Journal on Big Data, 2022, No. 1, pp. 41-60 (20 pages).
With the explosive growth of Internet text information, the task of text classification has become more important. As a part of text classification, Chinese news text classification also plays an important role. In public security work, public opinion news classification is an important topic. Effective and accurate classification of public opinion news is a necessary prerequisite for relevant departments to grasp the state of public opinion and control its trend in time. This paper introduces a combined-convolutional neural network text classification model based on word2vec and an improved TF-IDF. First, word vectors are trained with the word2vec model. Then the weight of each word is calculated using the improved TF-IDF algorithm based on class frequency variance, and the word vectors and weights are combined to construct the text vector representation. Finally, the combined-convolutional neural network is trained and tested on the THUCNews data set. The results show that the classification performance of this model is better than that of the traditional Text-RNN model, the traditional Text-CNN model, and the word2vec-CNN model. The test accuracy is 97.56%, the accuracy rate is 97%, the recall rate is 97%, and the F1-score is 97%.
Keywords: Chinese news text classification; word2vec model; improved TF-IDF; combined-convolutional neural network; public opinion news
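The abstract above combines word vectors with per-word weights to build the text representation fed to the network. The sketch below shows one plausible reading of that step: each token's embedding is scaled by its weight, giving a weighted matrix with one row per token. It uses plain smoothed TF-IDF rather than the paper's improved variant based on class frequency variance, and the function names, toy corpus, and two-dimensional embeddings are hypothetical stand-ins for a trained word2vec model.

```python
import math
from collections import Counter


def tfidf_weights(doc, corpus):
    """Plain TF-IDF weight for each distinct term in `doc`, with smoothed IDF."""
    tf = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for term, freq in tf.items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = (freq / len(doc)) * idf
    return weights


def weighted_matrix(doc, corpus, embeddings):
    """One row per token: the token's embedding scaled by its TF-IDF weight.

    Tokens without an embedding are skipped.
    """
    weights = tfidf_weights(doc, corpus)
    rows = []
    for term in doc:
        vec = embeddings.get(term)
        if vec is not None:
            rows.append([weights[term] * v for v in vec])
    return rows


# Hypothetical toy corpus of segmented documents and made-up 2-d embeddings.
corpus = [["股票", "上涨"], ["比赛", "上涨"]]
embeddings = {"股票": [1.0, 0.0], "上涨": [0.0, 1.0], "比赛": [1.0, 1.0]}
matrix = weighted_matrix(corpus[0], corpus, embeddings)
```

Here "上涨" appears in both documents, so its row is scaled down relative to "股票", which appears in only one. The resulting matrix has the shape a convolutional text classifier would typically take as input.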
4. Hierarchical Classification of Chinese Documents Based on N-grams (cited by: 1)
Authors: Zhou Shui-geng (1), Guan Ji-hong (2), He Yan-xiang (2). Affiliations: (1) State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China; (2) School of Computer Science, Wuhan University, Wuhan 430072, China. Wuhan University Journal of Natural Sciences (CAS), 2001, No. Z1, pp. 416-422 (7 pages).
We explore techniques for using N-gram information to categorize Chinese text documents hierarchically, so that the classifier can shake off the burden of large dictionaries and complex segmentation processing and thus be domain and time independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchically classifying Chinese text documents based on N-grams achieves satisfactory performance and outperforms other traditional Chinese text classifiers.
Keywords: Chinese text classification; N-grams; feature selection; hierarchical classification
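The abstract above relies on N-gram features so that no dictionary or word segmenter is needed. A minimal sketch of character N-gram extraction follows. The function names and the choice of unigrams plus bigrams are assumptions for illustration; the abstract does not state which N values or feature selection the authors used.

```python
from collections import Counter


def char_ngrams(text, n):
    """All contiguous character n-grams of `text`, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Bag-of-n-grams feature counts over the given n values."""
    counts = Counter()
    for n in n_values:
        counts.update(char_ngrams(text, n))
    return counts


bigrams = char_ngrams("文本分类", 2)
features = ngram_features("文本分类")
```

For the four-character string "文本分类", the bigrams are "文本", "本分", and "分类". Some N-grams such as "本分" cross word boundaries, which is the price paid for skipping segmentation; a feature selection step would normally filter out the uninformative ones.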
5. Multi-Label Chinese Comments Categorization: Comparison of Multi-Label Learning Algorithms (cited by: 4)
Authors: Jiahui He, Chaozhi Wang, Hongyu Wu, Leiming Yan, Christian Lu. Journal of New Media, 2019, No. 2, pp. 51-61 (11 pages).
Multi-label text categorization refers to the problem of categorizing text through a multi-label learning algorithm. Text classification for Asian languages such as Chinese differs from that for languages such as English, which use spaces to separate words. Before classifying text, it is necessary to perform word segmentation to convert the continuous text into a list of separate words and then convert it into a vector of a certain dimension. Generally, multi-label learning algorithms can be divided into two categories: problem transformation methods and adapted algorithms. This work uses customers' comments about some hotels as a training data set, which contains labels for all aspects of the hotel evaluation, aiming to analyze and compare the performance of various multi-label learning algorithms on Chinese text classification. The experiment involves three basic problem transformation methods (Support Vector Machine, Random Forest, and k-Nearest-Neighbor) and one adapted algorithm, a Convolutional Neural Network. The experimental results show that the Support Vector Machine has better performance.
Keywords: multi-label classification; Chinese text classification; problem transformation; adapted algorithms
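The abstract above distinguishes problem transformation methods from adapted algorithms. As an illustration of problem transformation, the sketch below applies binary relevance, a common transformation that splits a multi-label data set into one binary data set per label so that any ordinary classifier (such as an SVM) can be trained on each. The abstract does not say which transformation the authors used, so binary relevance, the function name, and the toy hotel comments are all assumptions.

```python
def binary_relevance(samples):
    """Split a multi-label data set into one binary data set per label.

    `samples` is a list of (features, label_set) pairs. The result maps each
    label to a list of (features, 0 or 1) pairs over the same samples.
    """
    labels = sorted({label for _, label_set in samples for label in label_set})
    return {
        label: [(x, int(label in label_set)) for x, label_set in samples]
        for label in labels
    }


# Hypothetical segmented hotel comments with aspect labels.
samples = [
    (["房间", "干净"], {"room"}),
    (["服务", "差", "房间", "小"], {"service", "room"}),
]
datasets = binary_relevance(samples)
```

Each entry in `datasets` can then be passed to a separate binary classifier, and the per-label predictions are combined to form the multi-label output. This treats labels as independent, which is simple but ignores correlations between aspects.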