Construction of an Automatic Bengali Text Summarizer Using Machine Learning Approaches

Abstract: In our study, we chose Python as the programming platform for building an automatic Bengali document summarizer. English has ample tools for processing documents and producing summaries. However, no such tool is specifically applicable to Bengali, because Bengali has a great deal of ambiguity and differs from English in its grammar. The language nonetheless holds an important place, as it is spoken by 26 crore (260 million) people all over the world. We have therefore adopted a new method to summarize Bengali documents.
The proposed system consists of the following stages: pre-processing of the input document, word tagging, pronoun replacement, sentence ranking, and summary generation. Pronoun replacement is used to reduce the incidence of dangling pronouns in the generated summary. Sentences are ranked based on sentence frequency, the presence of numerical figures, and pronoun replacement. The similarity between pairs of sentences is also checked so that one of two similar sentences can be excluded, which reduces duplication. We took 3,000 data items from newspaper and book documents as input and learned the words in accordance with their syntax. To evaluate the performance of the summarizer, the system was tested on different documents. According to the assessment method, the recall, precision, and F-score were 0.70, 0.82, and 0.74 (70%, 82%, and 74%), respectively. Pronouns were replaced correctly in 72% of cases.
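The ranking and duplicate-exclusion stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the scoring rule (average word frequency plus a fixed bonus for sentences containing digits), the Jaccard similarity measure, and all names and default values (`summarize`, `numeric_bonus`, `sim_threshold`) are assumptions made for illustration. Word tagging and pronoun replacement are omitted because they depend on Bengali linguistic resources that the abstract does not describe.

```python
import re
from collections import Counter

# Punctuation stripped from token edges; includes the Bengali danda.
PUNCT = "।,;:?!\"'()"

def split_sentences(text):
    """Split after a danda (।), '?' or '!' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[।?!])\s+", text.strip()) if s]

def tokenize(sentence):
    """Whitespace tokens with surrounding punctuation removed."""
    tokens = (tok.strip(PUNCT) for tok in sentence.split())
    return [tok for tok in tokens if tok]

def jaccard(a, b):
    """Token-set overlap in [0, 1]; used here as the similarity check."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def summarize(text, max_sentences=2, numeric_bonus=0.5, sim_threshold=0.6):
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)

    def score(i):
        toks = tokenized[i]
        if not toks:
            return 0.0
        # Assumed scoring: mean document-level frequency of the words.
        base = sum(freq[t] for t in toks) / len(toks)
        # str.isdigit() is true for both ASCII and Bengali digits.
        has_number = any(ch.isdigit() for ch in sentences[i])
        return base + (numeric_bonus if has_number else 0.0)

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Skip a sentence that is too similar to one already selected.
        if all(jaccard(tokenized[i], tokenized[j]) < sim_threshold
               for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Tokens are split on whitespace rather than with the regex class `\w`, since in Python's `re` module `\w` does not match Bengali combining vowel signs and would break words apart. The threshold and bonus are arbitrary starting points and would need tuning against reference summaries.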

Authors: Busrat Jahan; Mahfuja Khatun; Zinat Ara Zabu; Afranul Hoque; Sayed Uddin Rayhan (Department of Computer Science & Engineering, Feni University, Feni, Bangladesh; Department of Computer Science & Engineering, United International University, Dhaka, Bangladesh)

Affiliation: Department of Computer Science & Engineering

Source: Journal of Data Analysis and Information Processing, 2022, No. 1, pp. 43-57 (15 pages)

Keywords: Natural Language Processing Formatting Bangla Text Summarizer Bengali Language Processing Word Tagging Pronoun Replacement Sentence Ranking

Classification code: H31 [Language and Linguistics: English]
