Automatic Arabic Document Classification via kNN

Abstract: Many algorithms have been implemented for document categorization. Most work in this area has targeted English text, while very few approaches have been introduced for Arabic text. Arabic text differs in nature from English text, and preprocessing it is more challenging, because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to keyword extraction and reduction using the Document Frequency (DF) threshold method. The results indicate that kNN handles Arabic text better than the other existing systems. The proposed system reached a micro-recall of 0.95 on 850 Arabic texts in 6 categories.
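The two steps named in the abstract, DF-threshold keyword reduction followed by kNN classification, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `min_df` and `k` values, the raw term-frequency weights, and the cosine similarity measure are all assumptions made for illustration, and the English placeholder tokens stand in for Arabic keywords that would first need tokenization and stemming.

```python
import math
from collections import Counter


def select_keywords(tokenized_docs, min_df=2):
    """Keep terms that occur in at least `min_df` documents (DF threshold)."""
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, count in df.items() if count >= min_df}


def vectorize(tokens, keywords):
    """Sparse term-frequency vector restricted to the selected keywords."""
    return Counter(t for t in tokens if t in keywords)


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed measure)."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training vectors most similar to `query`."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: placeholder tokens, two categories.
train_docs = [
    ["match", "team", "goal", "win"],
    ["team", "goal", "league", "player"],
    ["match", "player", "win", "coach"],
    ["market", "bank", "price", "trade"],
    ["bank", "price", "stock", "market"],
    ["trade", "stock", "market", "growth"],
]
train_labels = ["sport", "sport", "sport", "economy", "economy", "economy"]

keywords = select_keywords(train_docs, min_df=2)
train_vectors = [vectorize(doc, keywords) for doc in train_docs]

query = vectorize(["team", "win", "match", "referee"], keywords)
predicted = knn_classify(query, train_vectors, train_labels, k=3)
print(predicted)
```

With `min_df=2`, terms that appear in only one training document (such as `league` or `growth`) are dropped before vectorization, which is the reduction effect the DF threshold is meant to achieve.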

Author: HANI M. O. Iwidat

Affiliation: School of Computer Science and Engineering

Source: Computer Aided Drafting, Design and Manufacturing (English edition), 2008, No. 2, pp. 65-73 (9 pages)

Keywords: Arabic documents classification; kNN; vector model; keywords extraction

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
