Funding: Supported by the NSFC (Grant Nos. 61772281, 61703212, 61602254), the Jiangsu Province Natural Science Foundation (Grant No. BK2160968), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).
Abstract: Multi-label text categorization is the problem of categorizing text with a multi-label learning algorithm. Text classification for Asian languages such as Chinese differs from that for languages such as English, which use spaces to separate words. Before Chinese text can be classified, a word segmentation step must convert the continuous text into a list of separate words, which is then converted into a vector of fixed dimension. Multi-label learning algorithms are generally divided into two categories: problem transformation methods and adapted algorithms. This work uses customer comments about hotels as the training data set, with labels covering all aspects of the hotel evaluation, and aims to analyze and compare the performance of various multi-label learning algorithms on Chinese text classification. The experiments involve three base classifiers used with problem transformation methods (Support Vector Machine, Random Forest, and k-Nearest-Neighbor) and one adapted algorithm, a Convolutional Neural Network. The experimental results show that the Support Vector Machine performs better than the other methods compared.
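The pipeline described in the abstract (segment, vectorize, then apply a problem transformation method) can be illustrated with a minimal sketch. The abstract does not name the segmenter, the vectorizer, or the specific transformation, so everything here is an assumption: binary relevance is used as the problem transformation, overlapping character bigrams stand in for a real Chinese word segmenter, k-Nearest-Neighbor with cosine similarity is the base classifier, and the toy comments and aspect labels are hypothetical rather than taken from the paper's data set.

```python
import math
from collections import Counter


def segment(text):
    """Stand-in for a real Chinese word segmenter (assumption):
    split the text into overlapping character bigrams."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def vectorize(text):
    """Sparse bag-of-tokens vector: token -> count."""
    return Counter(segment(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class BinaryRelevanceKNN:
    """Binary relevance: one independent yes/no decision per label,
    each made by a majority vote among the k nearest training texts."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, texts, label_sets):
        self.vectors = [vectorize(t) for t in texts]
        self.label_sets = [set(s) for s in label_sets]
        self.labels = sorted(set().union(*self.label_sets))
        return self

    def predict(self, text):
        query = vectorize(text)
        ranked = sorted(
            range(len(self.vectors)),
            key=lambda i: cosine(query, self.vectors[i]),
            reverse=True,
        )
        neighbours = ranked[: self.k]
        predicted = set()
        for label in self.labels:
            votes = sum(1 for i in neighbours if label in self.label_sets[i])
            if 2 * votes > len(neighbours):  # strict majority
                predicted.add(label)
        return predicted


# Hypothetical hotel comments with hypothetical aspect labels.
train_texts = [
    "房间很干净",
    "房间干净服务很好",
    "服务很好",
    "服务态度很好",
    "位置方便",
    "房间很大很干净",
]
train_labels = [
    {"room"},
    {"room", "service"},
    {"service"},
    {"service"},
    {"location"},
    {"room"},
]

model = BinaryRelevanceKNN(k=3).fit(train_texts, train_labels)
print(model.predict("房间干净"))
print(model.predict("服务很好"))
```

Binary relevance treats each label independently, so any single-label classifier, such as the Support Vector Machine or Random Forest mentioned in the abstract, could replace the neighbor vote without changing the surrounding code. A dictionary-based segmenter would normally replace the bigram stand-in.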