Benchmarking Performance of Document Level Classification and Topic Modeling (Cited by: 1)
Authors: Muhammad Shahid Bhatti, Azmat Ullah, Rohaya Latip, Abid Sohail, Anum Riaz, Rohail Hassan. Computers, Materials & Continua (SCIE, EI), 2022, Issue 4, pp. 125-141 (17 pages).
Text classification for low-resource languages is a non-trivial and challenging problem. This paper discusses the process of Urdu news classification and Urdu document similarity. Urdu is one of the best-known spoken languages in Asia. The use of computational methodologies for text classification has increased over time. However, Urdu has seen little experimental research and has no readily available datasets, which is the primary reason that research on Urdu is limited and that the latest methodologies have rarely been applied to it. To overcome these obstacles, a medium-sized dataset with six categories is collected from authentic Pakistani news sources. Urdu is a rich but complex language, and its complex features make text processing more challenging than for other languages. A term frequency-inverse document frequency (TFIDF) based term weighting scheme is used for extracting features, chi-2 for selecting essential features, and linear discriminant analysis (LDA) for dimensionality reduction. The TFIDF matrix and the cosine similarity measure are used to identify similar documents in a collection, and the FastText model is applied to find the semantic meaning of words in a document. A training-test split evaluation methodology is used for the experiments, with 70% of the data for training and 30% for testing. State-of-the-art machine learning and deep dense neural network approaches are used for Urdu news classification. Finally, we trained Multinomial Naïve Bayes, XGBoost, Bagging, and a deep dense neural network. Bagging and the deep dense neural network outperformed the other algorithms. The experimental results show that the deep dense network achieves a 92.0% mean F1 score, and Bagging a 95.0% F1 score.
Keywords: deep neural network; machine learning; natural language processing; TFIDF; sparse matrix; cosine similarity; classification; linear discriminant analysis; gradient boosting
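
The abstract mentions using a TFIDF matrix with cosine similarity to identify similar documents. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the toy English documents are all assumptions made for illustration, and real Urdu text would need proper tokenization and normalization.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs, query_index):
    """Index of the document most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine_similarity(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Hypothetical toy corpus (placeholders, not the paper's dataset).
docs = [
    "cricket team wins the match",
    "team loses cricket match today",
    "stock market falls sharply",
]
print(most_similar(docs, 0))  # prints 1
```

Storing each document as a dictionary keeps the vectors sparse, which matters because a TFIDF matrix over a news corpus is mostly zeros.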