Automatic Arabic Document Classification via kNN

Abstract: Many algorithms have been implemented for the problem of document categorization. Most work in this area has been done for English text, while very few approaches have been introduced for Arabic text. Arabic text differs in nature from English text, and its preprocessing is more challenging, because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to the keyword extraction and reduction problems using the Document Frequency (DF) threshold method. The results indicate that the ability of kNN to deal with Arabic text outperforms that of other existing systems. The proposed system reached a micro-recall score of 0.95 on 850 Arabic texts in 6 different categories.
Source: Computer Aided Drafting, Design and Manufacturing (English edition), 2008, Issue 2, pp. 65-73 (9 pages)
Keywords: Arabic document classification; kNN; vector model; keyword extraction
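
The abstract names two building blocks: a Document Frequency (DF) threshold for keyword reduction and kNN over a vector model. The following is a minimal sketch of how these might fit together, not the authors' implementation. It assumes documents are already tokenized and stemmed, uses raw term counts with cosine similarity and a majority vote, and all function names, the `min_df` and `k` values, and the toy data are hypothetical.

```python
import math
from collections import Counter


def df_filter(docs, min_df):
    """Keep only terms that occur in at least min_df documents (DF threshold)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, n in df.items() if n >= min_df}


def vectorize(tokens, vocab):
    """Sparse term-count vector restricted to the retained vocabulary."""
    return Counter(t for t in tokens if t in vocab)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_tokens, train_docs, train_labels, vocab, k=3):
    """Label the query by majority vote among its k most similar training docs."""
    q = vectorize(query_tokens, vocab)
    scored = sorted(
        ((cosine(q, vectorize(doc, vocab)), label)
         for doc, label in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, hypothetical training data (placeholder tokens, not from the paper).
train_docs = [
    ["goal", "team", "match"],
    ["team", "match", "player"],
    ["bank", "market", "price"],
    ["market", "price", "trade"],
]
train_labels = ["sport", "sport", "economy", "economy"]

vocab = df_filter(train_docs, min_df=2)
predicted = knn_classify(["team", "match", "win"], train_docs, train_labels, vocab, k=3)
```

In this sketch the DF threshold discards terms seen in only one training document, so the similarity computation runs over a smaller vocabulary. The weighting scheme, similarity measure, and value of k used in the paper are not stated in the abstract.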
