In this paper,a new method for automatic classification of texts is presented.This system includes two phases;text processing and text categorization.In the first phase,various indexing criteria such as bi...In this paper,a new method for automatic classification of texts is presented.This system includes two phases;text processing and text categorization.In the first phase,various indexing criteria such as bigram,trigram and quad-gram are presented to extract the properties.Then,in the second phase,the W-SMO machine learning algorithm is used to train the system.In order to evaluate and compare the results of the two criteria of accuracy and readability,Macro-F1 and Micro-F1 have been calculated for different indexing methods.The results of experiments performed on 7676 standard text documents of Reuters showed that the best performance is related to w-smo bigram criteria with accuracy of 95.17 micro and 79.86 macro.Also,the results indicated that our proposed method has the best performance compared to the W-j48,Naïve Bayes,K-NN and Decision Tree algorithms.展开更多
文摘In this paper,a new method for automatic classification of texts is presented.This system includes two phases;text processing and text categorization.In the first phase,various indexing criteria such as bigram,trigram and quad-gram are presented to extract the properties.Then,in the second phase,the W-SMO machine learning algorithm is used to train the system.In order to evaluate and compare the results of the two criteria of accuracy and readability,Macro-F1 and Micro-F1 have been calculated for different indexing methods.The results of experiments performed on 7676 standard text documents of Reuters showed that the best performance is related to w-smo bigram criteria with accuracy of 95.17 micro and 79.86 macro.Also,the results indicated that our proposed method has the best performance compared to the W-j48,Naïve Bayes,K-NN and Decision Tree algorithms.