Roman Urdu News Headline Classification Empowered with Machine Learning

Abstract: Roman Urdu has been used for text messaging over the Internet for years, especially in the Indo-Pak subcontinent. People from the subcontinent may speak the same Urdu language but write it in different scripts. Urdu written in Roman characters is now considered the most typical standard of communication on social media in the subcontinent, which makes it a valuable source of information. English text classification is considered a solved problem, but there have been only a few efforts to examine the rich information supply of Roman Urdu. This is due to the numerous complexities involved in processing Roman Urdu data, including the non-availability of a tagged corpus, the lack of a set of rules, and the lack of standardized spellings. A large amount of Roman Urdu news data is available on mainstream news websites and on social media sites such as Facebook and Twitter, but meaningful information can only be extracted if the data is in a structured format. We have developed a Roman Urdu news headline classifier that assigns news to relevant categories, on which further analysis and modeling can be done. The classifier sorts news into five categories (health, business, technology, sports, international). First, we build the news dataset using scraping tools; then, after preprocessing, we compare the results of different machine learning algorithms: Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). After this, we use a phonetic algorithm to control lexical variation and test on news from different websites. The preliminary results suggest that a more accurate classification can be accomplished by monitoring noise inside the data and by classifying the news. After applying the above-mentioned machine learning algorithms, the results show that the Multinomial Naïve Bayes classifier gives the best accuracy, 90.17%, which is due to the noise of lexical variation.
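
The pipeline described in the abstract (normalize spelling variants with a phonetic key, then classify headlines with Multinomial Naïve Bayes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which phonetic algorithm was used, so `phonetic_key` is a hypothetical stand-in; the training headlines are invented toy examples, not the scraped dataset; and only three of the five categories appear.

```python
import math
import re
from collections import Counter, defaultdict


def phonetic_key(word):
    """Hypothetical phonetic key (an assumption, not the paper's algorithm):
    lowercase, collapse doubled letters, then drop vowels after the first
    letter, so that spelling variants such as 'qeemat'/'qimat' collide."""
    word = re.sub(r"[^a-z]", "", word.lower())
    word = re.sub(r"(.)\1+", r"\1", word)  # 'qeemat' -> 'qemat'
    return word[:1] + "".join(c for c in word[1:] if c not in "aeiou")


def tokenize(text):
    """Split a headline on whitespace and map each word to its phonetic key."""
    keys = (phonetic_key(w) for w in text.split())
    return [k for k in keys if k]


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy headlines for illustration only (not the paper's data).
TRAIN = [
    ("cricket match mein Pakistan ki jeet", "sports"),
    ("hockey team ne final jeet liya", "sports"),
    ("dollar ki qeemat mein izafa", "business"),
    ("stock market mein tezi", "business"),
    ("dengue ke mareez barh gaye", "health"),
    ("hospital mein naye doctor", "health"),
]

model = MultinomialNB().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])

# Variant spellings map to the same keys as the training words.
print(model.predict("dollar ki qimat mein izaafa"))  # business
print(model.predict("hocky ka final"))               # sports
```

The key function trades precision for recall: it merges genuine spelling variants, but it can also merge unrelated short words (here `ki` and `ke` both become `k`), which is one reason a real system would need a more careful phonetic scheme.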

Authors: Rizwan Ali Naqvi, Muhammad Adnan Khan, Nauman Malik, Shazia Saqib, Tahir Alyas, Dildar Hussain

Affiliations: Department of Unmanned Vehicle Engineering; Department of Computer Science; School of Computational Sciences

Source: Computers, Materials & Continua (SCIE, EI), 2020, Issue 11, pp. 1221-1236 (16 pages)

Funding: This work is supported by the KIAS (Research Number: CG076601) and in part by the Sejong University Faculty Research Fund.

Classification code: H31 [Language and Linguistics: English]

