To remove handwritten text from an image of a document taken by a smartphone, an intelligent removal method is proposed that combines dewarping with a Fully Convolutional Network with Atrous Convolution and Atrous Spatial Pyramid Pooling (FCN-AC-ASPP). First, the dewarping algorithm transforms the photographed image into a regular image. Second, the FCN-AC-ASPP classifies printed and handwritten text. Finally, the handwritten text is removed by a simple algorithm. Experiments show that the classification accuracy of FCN-AC-ASPP is better than that of FCN, DeepLabV3+, and FCN-AC. In terms of handwritten text removal, combining dewarping with FCN-AC-ASPP is superior to using FCN-AC-ASPP alone.
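The final removal step can be pictured with a minimal sketch. The abstract does not describe the "simple algorithm", so everything here is an assumption: the per-pixel label encoding (`BACKGROUND`, `PRINTED`, `HANDWRITTEN`), the function name `remove_handwriting`, and the choice to paint handwritten pixels with a white fill value.

```python
# Hypothetical label encoding for the segmentation output (not from the paper).
BACKGROUND, PRINTED, HANDWRITTEN = 0, 1, 2


def remove_handwriting(image, labels, fill=255):
    """Return a copy of a grayscale image with handwritten pixels set to `fill`.

    `image` and `labels` are equally sized lists of rows; `labels` holds one
    class id per pixel, as a segmentation network might produce.
    """
    if len(image) != len(labels):
        raise ValueError("image and labels must have the same number of rows")
    cleaned = []
    for pixel_row, label_row in zip(image, labels):
        if len(pixel_row) != len(label_row):
            raise ValueError("image and labels rows must have the same length")
        # Keep background and printed pixels, overwrite handwritten ones.
        cleaned.append([
            fill if label == HANDWRITTEN else pixel
            for pixel, label in zip(pixel_row, label_row)
        ])
    return cleaned
```

A real pipeline would probably inpaint the removed region rather than fill it with a constant, but the masking idea is the same.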
Purpose: A text generation based multidisciplinary problem identification method is proposed, which does not rely on a large amount of data annotation. Design/methodology/approach: The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique; second, it generates abstractive titles for each paper based on the abstract and research objective types using a generative pre-trained language model; third, it extracts problem phrases from the generated titles according to regular expression rules; fourth, it creates problem relation networks and identifies identical problems using a weighted community detection algorithm; finally, it identifies multidisciplinary problems based on the disciplinary labels of papers. Findings: Experiments in the “Carbon Peaking and Carbon Neutrality” field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field. Research limitations: The proposed method needs to be applied to other multidisciplinary fields to validate its effectiveness. Practical implications: Multidisciplinary problem identification helps governments gather multidisciplinary forces to solve complex real-world problems, helps research management authorities fund valuable multidisciplinary problems, and helps researchers borrow ideas from other disciplines. Originality/value: This study proposes a novel multidisciplinary problem identification method based on text generation, which identifies multidisciplinary problems from generated abstractive titles of papers without the data annotation required by standard sequence labeling techniques.
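The third step, rule-based extraction of problem phrases from generated titles, might look like the sketch below. The abstract does not list the actual regular expression rules, so the pattern `PROBLEM_PATTERN`, the cue phrases it looks for, and the function name `extract_problem_phrase` are illustrative assumptions only.

```python
import re

# Hypothetical rule: the problem phrase follows a cue such as "for",
# "to address" or "to solve" and runs up to the next punctuation mark.
PROBLEM_PATTERN = re.compile(
    r"\b(?:for|to address|to solve)\s+(?P<problem>[^,;:.]+)",
    re.IGNORECASE,
)


def extract_problem_phrase(title):
    """Return the phrase after the first cue in `title`, or None if no cue matches."""
    match = PROBLEM_PATTERN.search(title)
    return match.group("problem").strip() if match else None
```

Titles that contain no cue phrase yield `None`, so a real rule set would likely need several complementary patterns.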
Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous driving. Text information in car-mounted videos can assist drivers in making decisions. However, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time detection. We propose a robust Car-mounted Video Text Detector (CVTD), a lightweight text detection model that uses ResNet18 for feature extraction and can detect text of arbitrary shape. The model efficiently extracts global text positions through Coordinate Attention Threshold Activation (CATA) and stacks two Feature Pyramid Enhancement Fusion Modules (FPEFM), which integrate local text features with global position information and strengthen the representation capability of the CVTD model. The enhanced feature maps, when acted upon by Text Activation Maps (TAM), effectively distinguish text foreground from non-text regions. Additionally, we collected and annotated a dataset of 2200 Car-mounted Video Text (CVT) images under various road conditions for training and evaluating our model. We further tested our model on four other challenging public natural scene text detection benchmark datasets, demonstrating its strong generalization ability and real-time detection speed. This model holds potential for practical applications in real-world scenarios.
Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs’ current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Text classification, by automatically categorizing texts, is one of the foundational elements of natural language processing (NLP) applications. This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata (Wikipedia database) knowledge base and BERT-based pre-trained Named Entity Recognition (NER) models. Focusing on a significant challenge in NLP, the research evaluates the potential of using entity and relational information to extract deeper meaning from texts. The adopted methodology includes text preprocessing, entity detection, and the integration of relational information. Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms, such as Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that the integration of entity-relation information can significantly enhance algorithm performance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications. Contributions of this work include the utilization of distantly supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements through the integration of distantly supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. Additionally, it contributes to the development of multilingual text classification systems by adding deeper meaning to text content, thereby providing a valuable addition to current NLP studies and a reference point for future research.
Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to enhance inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices like drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation, including replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating a Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results demonstrate a remarkable 26% improvement in inference speed compared to the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at the intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work represents a significant advancement in scene text detection, striking a balance between speed and accuracy, making it well-suited for performance-constrained environments.
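Why depth-wise convolution lightens a backbone can be seen from a weight count. The sketch below assumes the common depthwise separable form (a k×k depth-wise convolution followed by a 1×1 point-wise convolution) and ignores biases; the abstract does not say exactly how YOLOv5ST arranges these layers, and the layer sizes used in the note that follows are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depth-wise convolution followed by a 1x1 point-wise one."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise
```

For a hypothetical 3×3 layer with 64 input and 128 output channels, the standard form needs 73,728 weights and the separable form 8,768, roughly an 8.4-fold reduction for that layer.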
This experimental study investigated how text difficulty and working memory capacity (WMC) affected Chinese EFL learners’ reading comprehension and their tendency to engage in task-unrelated thoughts, that is, mind wandering (MW), in the course of reading. Sixty first-year university non-English majors participated in the study. A two-factor mixed experimental design of 2 (text difficulty: difficult and simple) × 2 (WMC: high/large and low/small) was employed. Results revealed that 1) the main and interaction effects of WMC and text difficulty on voluntary MW were significant, whereas those on involuntary MW were not; 2) while reading the easy texts, high-WMC individuals experienced involuntary MW less frequently than low-WMC ones, whereas no direct relationship between WMC and involuntary MW was found while reading the difficult texts; and 3) high-WMC individuals had a lower overall rate of MW and better reading performance than low-WMC individuals, but with increasing text difficulty, their rates of overall MW and voluntary MW rose and their reading performance declined. These results lend support to WM theory and have pedagogical implications for the instruction of L2 reading.
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Secondly, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text message-based platforms.
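The reported timings can be turned into a throughput and a rough speed-up figure. The message count, 2.82 years, and 6.4 minutes come from the abstract; how the authors converted manual effort into years is not stated, so the calendar-time conversion `MINUTES_PER_YEAR` below is an assumption, and the function names are illustrative.

```python
MESSAGES = 206_625      # dataset size reported in the abstract
MANUAL_YEARS = 2.82     # reported manual categorization time
SVM_MINUTES = 6.4       # reported SVM categorization time

# Assumption: a year is counted as 365 days of round-the-clock work.
MINUTES_PER_YEAR = 365 * 24 * 60


def svm_messages_per_minute():
    """Messages the trained SVM categorizes per minute."""
    return MESSAGES / SVM_MINUTES


def speedup(minutes_per_year=MINUTES_PER_YEAR):
    """Ratio of manual time to SVM time under the chosen year length."""
    return MANUAL_YEARS * minutes_per_year / SVM_MINUTES
```

Under the calendar-time assumption the SVM handles about 32,285 messages per minute and is on the order of 230,000 times faster; if the 2.82 years count only working hours, passing a smaller `minutes_per_year` gives a proportionally smaller ratio.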
Sophisticated systems for the testing of fibres and advanced materials will be showcased by key members of the British Textile Machinery Association (BTMA) at the forthcoming Techtextil 2024 exhibition, which takes place in Frankfurt, Germany, from April 23-26.
The assessment of translation quality in political texts is primarily based on achieving effective communication. Throughout the translation process, it is essential to not only accurately convey the original content but also effectively transform the structural mechanisms of the source language. In the translation reconstruction of political texts, various textual cohesion methods are often employed, with conjunctions serving as a primary means for semantic coherence within text units.
Data compression plays a key role in optimizing the use of memory storage space and in reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because they are generally combined with lossy compression techniques for images and videos in a mixed approach. To study the performance of lossless compression methods, we first carried out a literature review, which enabled us to select the most relevant methods: arithmetic coding, LZW, Tunstall’s algorithm, RLE, BWT, Huffman coding, and Shannon-Fano. Secondly, we designed a purposive text dataset with a repeating pattern in order to test the behavior and effectiveness of the selected compression techniques. Thirdly, we designed the compression algorithms and developed the programs (scripts) in Matlab in order to test their performance. Finally, the tests conducted on the data that we constructed according to a deliberate model show that the following methods, presented in order of performance, are very satisfactory: LZW, arithmetic coding, the Tunstall algorithm, and BWT + RLE. Likewise, it appears that the performance of certain techniques relative to others is strongly linked, on the one hand, to the sequencing and/or recurrence of the symbols that make up the message and, on the other hand, to the cumulative time of encoding and decoding.
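As an illustration of why repeating patterns favour dictionary methods, here is a minimal sketch of textbook LZW. It is written in Python rather than the Matlab the authors used, it is not their code, and it assumes the input contains only characters with code points below 256.

```python
def lzw_compress(text):
    """Encode a string as a list of integer LZW codes."""
    if any(ord(ch) > 255 for ch in text):
        raise ValueError("this sketch only handles code points below 256")
    dictionary = {chr(i): i for i in range(256)}  # initial single-character entries
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate               # keep extending the known phrase
        else:
            codes.append(dictionary[current]) # emit the longest known phrase
            dictionary[candidate] = next_code # learn the new phrase
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original string from a list of LZW codes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Phrase defined by the encoder one step before the decoder sees it.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code sequence")
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)
```

On a repetitive input such as `"ABABABABAB"` the encoder emits six codes for ten characters, because phrases learned early (`AB`, `ABA`) are reused later in the message.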
We present an approach to automatically classify medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Objective: To discuss how social media data can be put to use for post-marketing drug safety monitoring in China as soon as possible by systematically reviewing text mining applications, and to provide new ideas and methods for pharmacovigilance. Methods: Relevant domestic and foreign literature was used to explore text classification based on machine learning, text mining based on deep learning (neural networks), and adverse drug reaction (ADR) terminology. Results and Conclusion: Text classification based on traditional machine learning mainly includes the support vector machine (SVM) algorithm, the naive Bayesian (NB) classifier, decision trees, the hidden Markov model (HMM), and bidirectional encoder representations from transformers (BERT). The main deep learning neural networks used for text mining are the convolutional neural network (CNN), the recurrent neural network (RNN), and long short-term memory (LSTM). ADR terminology standardization tools mainly include the “Medical Dictionary for Regulatory Activities” (MedDRA), “WHODrug”, and “Systematized Nomenclature of Medicine-Clinical Terms” (SNOMED CT).
The development of science and technology has made it not only possible but very convenient for people living in different parts of the world to communicate with each other, thus bringing forth a new form of communication: computer-mediated communication (CMC). Text-based CMC is one of the most popular forms of CMC, in which people send instant messages to others in different settings. Since this mode of interaction combines features of both written and spoken language (Greenfield & Subrahmanyam, 2003), it is of great interest whether it follows the same sequential rules as telephone conversation. However, compared to telephone conversations, computer-mediated communication has received much less attention, let alone text-based CMC. The existing body of literature mostly focuses on content analysis and linguistic features but neglects the sequential organization of such interaction (Paolillo, 1999; Greenfield & Subrahmanyam, 2003; Herring, 1999). In light of this, this paper examines the opening moves of instant message exchanges among Chinese adults in an attempt to identify the features that characterize the way they open an online chat. The framework chosen for data analysis was the sequential model proposed by Schegloff for American telephone openings.
Two lines of research on eye movements in reading are summarized. One line of research examines how adult readers identify compound words during reading. The other line of research deals with how a specific reading goal influences the way long expository texts are read. Both lines of research are conducted using Finnish as the source language. With respect to the first research question, it is demonstrated that compound words are recognized either holistically or via their components, depending on the length of the compound word. Readers begin to process whatever information is readily available in foveal vision (i.e., either the whole-word form or the initial component). The second line of research demonstrates that (1) a specific reading goal is capable of exerting an early effect on readers’ eye fixation patterns, (2) time course analyses based on eye movement patterns can reveal interesting individual differences, and (3) working memory capacity is linked to the efficiency with which readers strategically allocate attention, encode information into long-term memory, and retrieve it. It is concluded that the eye-tracking technique is an excellent research tool to tap into the workings of the human mind during the comprehension of written texts.
Text classification is an essential task for many applications related to the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic Text Classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them to English using translation tools. The semantic features are then automatically extracted from the translated documents using an existing database of English topics. In addition, the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation. We thus obtain three types of features: semantic, stylistic, and hybrid. Using a different type of feature each time, we performed an in-depth comparison of nine well-known Machine Learning models on a standard Arabic corpus to evaluate our approach. The obtained results show that the Neural Network outperforms the other models and performs well using hybrid features (F1-score=0.88%).
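The stylistic features named in the abstract (word frequencies, character frequencies, punctuation) can be sketched with the Python standard library. The exact feature set and tokenization of the paper are not given, so the whitespace tokenization, the Unicode-category punctuation test, and the function name `stylistic_features` are assumptions made for illustration.

```python
import unicodedata
from collections import Counter


def is_punctuation(ch):
    """True for any Unicode punctuation mark, including Arabic ones such as '،'."""
    return unicodedata.category(ch).startswith("P")


def stylistic_features(text):
    """Count words, characters and punctuation marks in a document."""
    tokens = []
    for raw in text.split():
        # Strip punctuation attached to a word before counting it.
        token = "".join(ch for ch in raw if not is_punctuation(ch)).lower()
        if token:
            tokens.append(token)
    return {
        "word_count": len(tokens),
        "punctuation_count": sum(1 for ch in text if is_punctuation(ch)),
        "word_frequencies": Counter(tokens),
        "char_frequencies": Counter(ch for ch in text if not ch.isspace()),
    }
```

Relying on Unicode categories rather than an ASCII punctuation list matters here, because Arabic text uses marks such as the Arabic comma and question mark that an ASCII-only list would miss.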
Funding (handwritten text removal study): Sponsored by the Scientific Research Project of Zhejiang Provincial Department of Education (Grant No. KYY-ZX-20210329).
Funding (multidisciplinary problem identification study): Supported by the General Projects of ISTIC Innovation Foundation “Problem innovation solution mining based on text generation model” (MS2024-03).
Funding (car-mounted video text detection study): This work is supported in part by the National Natural Science Foundation of China (Grant Number 61971078), which provided domain expertise and computational power that greatly assisted the activity. This work was financially supported by Chongqing Municipal Education Commission Grants for Major Science and Technology Project (KJZD-M202301901) and the Science and Technology Research Project of Jiangxi Department of Education (GJJ2201049).
Funding (YOLOv5ST scene text detection study): The National Natural Science Foundation of PR China (42075130) and Nari Technology Co., Ltd. (4561655965).
文摘This experimental study investigated how text difficulty and different working memory capacity(WMC)affected Chinese EFL learners’reading comprehension and their tendency to engage in task-unrelated thoughts,that is,mind wandering(MW),in the course of reading.Sixty first-year university non-English majors participated in the study.A two-factor mixed experimental design of 2(text difficulty:difficult and simple)×2(WMC:high/large and low/small)was employed.Results revealed that 1)the main and interaction effects of WMC and text difficulty on voluntary MW were significant,whereas those on involuntary MW were not;2)while reading the easy texts,the involuntary MW of high-WMC individuals was less frequent than that of low-WMC ones,whereas while reading the difficult ones,the direct relationship between WMC and involuntary MW was not found;and that 3)high-WMC individuals had a lower overall rate of MW and better reading performance than low-WMC individuals did,but with increasing text difficulty,their rates of overall MW and voluntary MW were getting higher and higher,and the reading performance was getting lower and lower.These results lend support to WM theory and have pedagogical implications for the instruction of L2 reading.
Abstract: To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Secondly, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4%. The automated method is therefore significantly faster, more scalable, and more consistent than the current manual categorization, which makes the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and on similar text-message-based platforms.
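The timing figures in the abstract above can be put on a common scale with a little arithmetic. The sketch below uses only the three reported numbers. The length of a "year" of manual work is not stated in the abstract, so `minutes_per_year` is an explicit assumption left to the caller, and the calendar-year constant is one possible reading, not the authors' definition.

```python
MESSAGES = 206_625      # dataset size reported in the abstract
MODEL_MINUTES = 6.4     # reported time for the trained SVM model
MANUAL_YEARS = 2.82     # reported estimate for the manual process


def model_throughput_per_second(messages=MESSAGES, minutes=MODEL_MINUTES):
    """Messages categorized per second by the automated model."""
    return messages / (minutes * 60)


def speedup(minutes_per_year):
    """Ratio of manual time to model time.

    minutes_per_year is an assumption: the abstract does not say whether
    a year means continuous calendar time or working hours only.
    """
    return MANUAL_YEARS * minutes_per_year / MODEL_MINUTES


# Assumption for illustration only: continuous calendar time.
CALENDAR_YEAR_MINUTES = 365 * 24 * 60
```

Under the calendar-time assumption, `speedup(CALENDAR_YEAR_MINUTES)` is on the order of 10^5; a working-hours definition of a year would give a proportionally smaller ratio.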
Abstract: Sophisticated systems for the testing of fibres and advanced materials will be showcased by key members of the British Textile Machinery Association (BTMA) at the forthcoming Techtextil 2024 exhibition, which takes place in Frankfurt, Germany, from April 23-26.
Funding: This article is an interim result of the 2020 research project “Research on Chinese-Russian Translation of Political Terminology Based on Corpora” (YB2020005) by CNTERM.
Abstract: The assessment of translation quality in political texts is primarily based on achieving effective communication. Throughout the translation process, it is essential not only to convey the original content accurately but also to transform the structural mechanisms of the source language effectively. In the translation reconstruction of political texts, various textual cohesion methods are often employed, with conjunctions serving as a primary means of semantic coherence within text units.
Abstract: Data compression plays a key role in optimizing the use of memory storage space and in reducing latency in data transmission. In this paper, we focus on lossless compression techniques because they are generally combined with lossy techniques in mixed approaches to image and video compression. To study the performance of lossless compression methods, we first carried out a literature review, from which we selected the most relevant methods: arithmetic coding, LZW, Tunstall’s algorithm, RLE, BWT, Huffman coding and Shannon-Fano. Secondly, we designed a purposive text dataset with a repeating pattern in order to test the behavior and effectiveness of the selected compression techniques. Thirdly, we designed the compression algorithms and developed the programs (scripts) in Matlab in order to test their performance. Finally, tests conducted on data that we constructed according to a deliberate model show that the following methods, listed in order of performance, are very satisfactory: LZW, arithmetic coding, Tunstall’s algorithm, and BWT + RLE. It also appears that the performance of certain techniques relative to others is strongly linked, on the one hand, to the sequencing and/or recurrence of the symbols that make up the message and, on the other hand, to the cumulative time of encoding and decoding.
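Two of the methods named in the abstract above, RLE and LZW, are short enough to sketch. This is a textbook-style illustration in Python, not the authors' Matlab scripts. It assumes input characters with code points below 256, and the sample string in the usage note is hypothetical, chosen only to mimic a repeating pattern.

```python
def rle_encode(text):
    """Run-length encode text into a list of (character, run length) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs


def rle_decode(runs):
    """Invert rle_encode."""
    return "".join(ch * n for ch, n in runs)


def lzw_encode(text):
    """LZW-encode text into integer codes (assumes code points < 256)."""
    table = {chr(i): i for i in range(256)}
    current, codes = "", []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate                # keep growing the match
        else:
            codes.append(table[current])       # emit the longest known prefix
            table[candidate] = len(table)      # learn the new phrase
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Invert lzw_encode by rebuilding the phrase table on the fly."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the phrase being defined in this very step.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)
```

On a repeating input such as `"ABABABA"`, LZW emits four codes for seven characters because it learns the phrases `AB`, `BA` and `ABA` as it goes, whereas RLE gains nothing because no character repeats consecutively. This illustrates the abstract's point that relative performance depends on the sequencing and recurrence of symbols.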
Abstract: We present an approach to automatically classify medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Abstract: Objective: To discuss how social media data can be used for post-marketing drug safety monitoring in China as soon as possible by systematically reviewing text mining applications, and to provide new ideas and methods for pharmacovigilance. Methods: Relevant domestic and foreign literature was used to explore text classification based on machine learning, text mining based on deep learning (neural networks) and adverse drug reaction (ADR) terminology. Results and Conclusion: Text classification based on traditional machine learning mainly includes the support vector machine (SVM) algorithm, the naive Bayesian (NB) classifier, decision trees, the hidden Markov model (HMM) and bidirectional encoder representations from transformers (BERT). The main deep learning neural networks for text mining are the convolutional neural network (CNN), recurrent neural network (RNN) and long short-term memory (LSTM). ADR terminology standardization tools mainly include the “Medical Dictionary for Regulatory Activities” (MedDRA), “WHODrug” and “Systematized Nomenclature of Medicine-Clinical Terms” (SNOMED CT).
Abstract: The development of science and technology has made it not only possible but very convenient for people living in different parts of the world to communicate with each other, thus bringing forth a new form of communication: computer-mediated communication (CMC). Text-based CMC is one of the most popular forms of CMC, in which people send instant messages to others in different settings. Since this mode of interaction combines features of both written and spoken language (Greenfield & Subrahmanyam, 2003), it is of great interest whether it follows the same sequential rules as telephone conversation. However, compared to telephone conversations, computer-mediated communication has received much less attention, let alone text-based CMC. The existing body of literature mostly focuses on content analysis and linguistic features but neglects the sequential organization of such interaction (Paolillo, 1999; Greenfield & Subrahmanyam, 2003; Herring, 1999). In light of this, this paper examines the opening moves of instant message exchanges among Chinese adults in an attempt to identify the features that characterize the way they open an online chat. The framework chosen for data analysis was the sequential model proposed by Schegloff for American telephone openings.
Abstract: Two lines of research on eye movements in reading are summarized. One line of research examines how adult readers identify compound words during reading. The other deals with how a specific reading goal influences the way long expository texts are read. Both lines of research are conducted using Finnish as the source language. With respect to the first research question, it is demonstrated that compound words are recognized either holistically or via their components, depending on the length of the compound word. Readers begin to process whatever information is readily available in foveal vision (i.e., either the whole-word form or the initial component). The second line of research demonstrates that (1) a specific reading goal is capable of exerting an early effect on readers’ eye fixation patterns, (2) time course analyses based on eye movement patterns can reveal interesting individual differences, and (3) working memory capacity is linked to the efficiency with which readers strategically allocate attention, encode information into long-term memory and retrieve it from there. It is concluded that the eye-tracking technique is an excellent research tool for tapping into the workings of the human mind during the comprehension of written texts.
Abstract: Text classification is an essential task for many applications in the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic text classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them into English using translation tools. The semantic features are then automatically extracted from the translated documents using an existing database of English topics. In addition, the model automatically extracts from the textual content a set of stylistic features such as word and character frequencies and punctuation. We thus obtain three types of features: semantic, stylistic and hybrid. Using a different type of feature each time, we performed an in-depth comparison of nine well-known machine learning models on a standard Arabic corpus to evaluate our approach. The results show that the neural network outperforms the other models and performs well with hybrid features (F1-score = 0.88).
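The stylistic features mentioned in the abstract above (word and character frequencies, punctuation) can be sketched as follows. The feature names and the function `stylistic_features` are hypothetical and are not the authors' feature set. The sketch relies on Python's `string.punctuation`, which covers ASCII punctuation only, so it suits translated English text and would need extending for Arabic marks such as "،" or "؟".

```python
import string
from collections import Counter


def stylistic_features(text):
    """Compute a few simple stylistic statistics for one document."""
    words = text.split()
    chars = [c for c in text if not c.isspace()]
    punct = [c for c in chars if c in string.punctuation]
    # Word frequencies, ignoring case and surrounding ASCII punctuation.
    word_counts = Counter(w.strip(string.punctuation).lower() for w in words)
    word_counts.pop("", None)   # drop tokens that were punctuation only
    return {
        "n_words": len(words),
        "n_chars": len(chars),
        "punctuation_ratio": len(punct) / len(chars) if chars else 0.0,
        "char_counts": Counter(c.lower() for c in chars),
        "top_words": word_counts.most_common(3),
    }
```

For example, `stylistic_features("Hello, world! Hello.")` reports 3 words, 18 non-space characters, 3 of them punctuation, with `hello` as the most frequent word. A classifier would typically receive such values as a numeric vector alongside the semantic features.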