31,157 articles found
Smart Approaches to Efficient Text Mining for Categorizing Sexual Reproductive Health Short Messages into Key Themes
1
Authors: Tobias Makai, Mayumbo Nyirenda. 《Open Journal of Applied Sciences》 2024, No. 2, pp. 511-532 (22 pages)
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process of these text messages is inefficient and time-consuming, and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time, which are then used to build and train a categorization model. Secondly, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof-of-concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorise this dataset, whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable, and more consistent than the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text-message-based platforms.
Keywords: Knowledge Discovery in Text (KDT); Sexual Reproductive Health (SRH); Text Categorization; Text Classification; Text Extraction; Text Mining; Feature Extraction; Automated Classification; Process Performance; Stemming and Lemmatization; Natural Language Processing (NLP)
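The timing figures quoted in this abstract can be put side by side with a few lines of arithmetic. The sketch below is illustrative only: it assumes that a "year" means a 365-day calendar year of continuous work, which the abstract does not state, and the function names are hypothetical.

```python
# Figures quoted in the abstract above.
MESSAGES = 206_625
MANUAL_YEARS = 2.82
SVM_MINUTES = 6.4

# Assumption: a "year" is a 365-day calendar year of continuous work.
MINUTES_PER_YEAR = 365 * 24 * 60


def speedup(manual_years: float, auto_minutes: float) -> float:
    """Return how many times faster the automated run is than the manual one."""
    return manual_years * MINUTES_PER_YEAR / auto_minutes


def seconds_per_message(total_minutes: float, n_messages: int) -> float:
    """Average processing time per message, in seconds."""
    return total_minutes * 60 / n_messages


if __name__ == "__main__":
    print(f"speedup: {speedup(MANUAL_YEARS, SVM_MINUTES):,.0f}x")
    print(f"SVM: {seconds_per_message(SVM_MINUTES, MESSAGES) * 1000:.2f} ms per message")
```

Under that assumption the automated run comes out roughly five orders of magnitude faster; a different definition of a working year would change the ratio but not the conclusion.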
CVTD: A Robust Car-Mounted Video Text Detector
2
Authors: Di Zhou, Jianxun Zhang, Chao Li, Yifan Guo, Bowen Li. 《Computers, Materials & Continua》 SCIE EI 2024, No. 2, pp. 1821-1842 (22 pages)
Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous driving. Text information in car-mounted videos can assist drivers in making decisions. However, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time detection. We proposed a robust Car-mounted Video Text Detector (CVTD). It is a lightweight text detection model based on ResNet18 for feature extraction, capable of detecting text in arbitrary shapes. Our model efficiently extracted global text positions through Coordinate Attention Threshold Activation (CATA) and stacked two Feature Pyramid Enhancement Fusion Modules (FPEFM) to strengthen feature representation and integrate local text features with global position information. The enhanced feature maps, when acted upon by Text Activation Maps (TAM), effectively distinguished text foreground from non-text regions. Additionally, we collected and annotated a dataset containing 2200 images of Car-mounted Video Text (CVT) under various road conditions for training and evaluating our model’s performance. We further tested our model on four other challenging public natural scene text detection benchmark datasets, demonstrating its strong generalization ability and real-time detection speed. This model holds potential for practical applications in real-world scenarios.
Keywords: Deep learning; text detection; Car-mounted video text detector; intelligent driving assistance; arbitrary shape text detector
Identifying multidisciplinary problems from scientific publications based on a text generation method
3
Authors: Ziyan Xu, Hongqi Han, Linna Li, Junsheng Zhang, Zexu Zhou. 《Journal of Data and Information Science》 CSCD 2024, No. 3, pp. 213-237 (25 pages)
Purpose: A text generation based multidisciplinary problem identification method is proposed, which does not rely on a large amount of data annotation. Design/methodology/approach: The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique; second, it generates abstractive titles for each paper based on abstract and research objective types using a generative pre-trained language model; third, it extracts problem phrases from generated titles according to regular expression rules; fourth, it creates problem relation networks and identifies the same problems by exploiting a weighted community detection algorithm; finally, it identifies multidisciplinary problems based on the disciplinary labels of papers. Findings: Experiments in the “Carbon Peaking and Carbon Neutrality” field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field. Research limitations: It is necessary to use the proposed method in other multidisciplinary fields to validate its effectiveness. Practical implications: Multidisciplinary problem identification helps governments gather multidisciplinary forces to solve complex real-world problems, helps research management authorities fund valuable multidisciplinary problems, and helps researchers borrow ideas from other disciplines. Originality/value: This study proposes a novel multidisciplinary problem identification method based on text generation, which identifies multidisciplinary problems from generated abstractive titles of papers without the data annotation required by standard sequence labeling techniques.
Keywords: Problem identification; Multidisciplinary; Text generation; Text classification
Leveraging Uncertainty for Depth-Aware Hierarchical Text Classification
4
Authors: Zixuan Wu, Ye Wang, Lifeng Shen, Feng Hu, Hong Yu. 《Computers, Materials & Continua》 SCIE EI 2024, No. 9, pp. 4111-4127 (17 pages)
Hierarchical Text Classification (HTC) aims to match text to hierarchical labels. Existing methods overlook two critical issues: first, some texts cannot be fully matched to leaf node labels and need to be classified to the correct parent node instead of treating leaf nodes as the final classification target. Second, error propagation occurs when a misclassification at a parent node propagates down the hierarchy, ultimately leading to inaccurate predictions at the leaf nodes. To address these limitations, we propose an uncertainty-guided HTC depth-aware model called DepthMatch. Specifically, we design an early stopping strategy with uncertainty to identify incomplete matching between text and labels, classifying them into the corresponding parent node labels. This approach allows us to dynamically determine the classification depth by leveraging evidence to quantify and accumulate uncertainty. Experimental results show that the proposed DepthMatch outperforms recent strong baselines on four commonly used public datasets: WOS (Web of Science), RCV1-V2 (Reuters Corpus Volume I), AAPD (Arxiv Academic Paper Dataset), and BGC. Notably, on the BGC dataset, it improves Micro-F1 and Macro-F1 scores by at least 1.09% and 1.74%, respectively.
Keywords: Hierarchical text classification; incomplete text-label matching; uncertainty; depth-aware; early stopping strategy
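The early-stopping idea in this abstract (stop at a parent label when the next step down is too uncertain) can be illustrated with a toy top-down classifier. This is a minimal sketch, not DepthMatch itself: the label tree, the keyword scorer, and the use of `1 - max probability` as the uncertainty are all assumptions made for illustration, whereas the abstract describes an evidence-based uncertainty estimate.

```python
from typing import Callable, Dict, List

# Hypothetical label hierarchy: parent -> children.
TREE: Dict[str, List[str]] = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}


def classify_with_early_stop(
    text: str,
    score_children: Callable[[str, str], Dict[str, float]],
    max_uncertainty: float = 0.4,
) -> List[str]:
    """Walk down TREE, stopping at the parent when the next step is too uncertain."""
    path: List[str] = []
    node = "root"
    while node in TREE:
        probs = score_children(text, node)
        best = max(probs, key=probs.get)
        uncertainty = 1.0 - probs[best]  # stand-in for an evidence-based estimate
        if uncertainty > max_uncertainty:
            break  # keep the current parent label as the final answer
        path.append(best)
        node = best
    return path


def toy_scorer(text: str, node: str) -> Dict[str, float]:
    """Hypothetical scorer: keyword hits turned into normalized probabilities."""
    keywords = {
        "science": ["experiment", "cell", "quantum"],
        "sports": ["match", "goal", "serve"],
        "physics": ["quantum"],
        "biology": ["cell"],
        "football": ["goal"],
        "tennis": ["serve"],
    }
    children = TREE[node]
    hits = [1 + sum(k in text.lower() for k in keywords[c]) for c in children]
    total = sum(hits)
    return {c: h / total for c, h in zip(children, hits)}
```

With the toy scorer, a text that mentions "quantum" is followed down to a leaf, while a text that only mentions "experiment" stops at the parent label because the two children are equally likely.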
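Keyword Extraction Method for Scientific and Technical Text Based on Improved TextRank
5
Authors: 杨冬菊, 胡成富. 《计算机应用》 CSCD PKU Core 2024, No. 6, pp. 1720-1726 (7 pages)
To address the poor extraction of words that occur infrequently but express the main idea of a text well in keyword extraction from scientific and technical text, a keyword extraction method based on improved TextRank is proposed. First, the term frequency-inverse document frequency (TF-IDF) statistical feature and the position feature of words are used to optimize the probability transition matrix between words in the co-occurrence graph, and initial word scores are obtained by iterative computation. Then, the K-Core (K-Core decomposition) algorithm is used to mine the K-Core subgraph and obtain hierarchical features of words, and the average information entropy feature is used to measure how well a word represents topics. Finally, the hierarchical feature and the average information entropy feature are fused with the initial word scores to determine the keywords. Experimental results show that on a public dataset, compared with the TextRank and OTextRank (Optimized TextRank) methods, the mean F1 of the proposed method across experiments extracting different numbers of keywords increases by 6.5 and 3.3 percentage points, respectively; on a science and technology service project dataset, compared with the same two methods, the mean F1 increases by 7.4 and 3.2 percentage points, respectively. The results verify the effectiveness of the proposed method in extracting keywords that occur infrequently but express the main idea of the text well.
Keywords: scientific and technical text; keyword extraction; TextRank; K-Core graph; average information entropy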
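The first step of this abstract (biasing the transition probabilities of the co-occurrence graph with per-word features) can be sketched in plain Python. This is an illustrative simplification, not the authors' implementation: `node_weight` is a hypothetical stand-in for a combined TF-IDF and position score, and the K-Core and entropy features are not modeled.

```python
from collections import defaultdict
from typing import Dict, List


def textrank(
    tokens: List[str],
    node_weight: Dict[str, float],
    window: int = 2,
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Score words on a co-occurrence graph whose transitions favor heavier words."""
    # Build an undirected co-occurrence graph within a sliding window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    words = sorted(neighbors)
    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        new_score = {}
        for w in words:
            rank = 0.0
            for v in neighbors[w]:
                # Probability of moving v -> w, biased by the target word's weight
                # (words missing from node_weight default to weight 1.0).
                out_total = sum(node_weight.get(u, 1.0) for u in neighbors[v])
                rank += score[v] * node_weight.get(w, 1.0) / out_total
            new_score[w] = (1 - damping) / len(words) + damping * rank
        score = new_score
    return score
```

Because each word distributes its full score across its neighbors, the scores keep summing to 1; raising a word's weight pulls more of its neighbors' score toward it, which is how a rare but informative word can be promoted.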
Method to Remove Handwritten Texts Using Smart Phone
6
Authors: Haiquan Fang. 《Journal of Harbin Institute of Technology(New Series)》 CAS 2024, No. 2, pp. 12-21 (10 pages)
To remove handwritten texts from an image of a document taken by smart phone, an intelligent removal method was proposed that combines dewarping with a Fully Convolutional Network with Atrous Convolution and Atrous Spatial Pyramid Pooling (FCN-AC-ASPP). For a picture taken by a smart phone, firstly, the image is transformed into a regular image by the dewarping algorithm. Secondly, the FCN-AC-ASPP is used to classify printed texts and handwritten texts. Lastly, handwritten texts can be removed by a simple algorithm. Experiments show that the classification accuracy of the FCN-AC-ASPP is better than that of FCN, DeeplabV3+, and FCN-AC. In terms of handwritten text removal, the method combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASPP alone.
Keywords: handwritten texts; printed texts; classification; FCN-AC-ASPP; smart phone
A Comparative Study of Artificial Intelligence and Translation Software in Chinese-English Translation: A Focus on Literary and Technical Texts
7
Authors: LIU Yong-shan. 《Journal of Literature and Art Studies》 2024, No. 9, pp. 815-820 (6 pages)
In recent years, the domain of machine translation has experienced remarkable growth, particularly with the emergence of neural machine translation, which has significantly enhanced both the accuracy and fluency of translation. At the same time, AI has also advanced tremendously, with its capabilities now extending to assisting users in a multitude of tasks, including translation, garnering attention across various sectors. In this paper, the author selects representative sentences from both literary and scientific texts and translates them using two translation software packages and two AI tools for comparison. The results show that all four translation tools are very efficient and can help with simple translation tasks. However, the accuracy of terminology needs to be improved, and the tools have difficulty adjusting to the characteristics of the target language. It is worth mentioning that one of the advantages of AI is its interactivity, which allows it to modify the translation according to the translator’s needs.
Keywords: Artificial Intelligence; translation software; literary texts; technical texts
Study on the Textual Coherence Function of Conjunctions in Political Texts and Their Translation Reconstruction
8
Authors: Goya Guli Kader, Jingwen Qiao, Aixia Yang. 《Journal of Contemporary Educational Research》 2024, No. 1, pp. 25-30 (6 pages)
The assessment of translation quality in political texts is primarily based on achieving effective communication. Throughout the translation process, it is essential to not only accurately convey the original content but also effectively transform the structural mechanisms of the source language. In the translation reconstruction of political texts, various textual cohesion methods are often employed, with conjunctions serving as a primary means for semantic coherence within text units.
Keywords: Political texts; Conjunctions; Textual cohesion; Chinese to Russian translation
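Research on Topic Classification of Ancient Book Texts Based on SWPF2vec and DJ-TextRCNN (Cited by: 1)
9
Authors: 武帅, 杨秀璋, 何琳, 公佐权. 《情报学报》 CSSCI CSCD PKU Core 2024, No. 5, pp. 601-615 (15 pages)
Topic classification methods for ancient book texts that rely mainly on catalog classification and rule matching suffer from low efficiency, heavy dependence on expert knowledge, a single basis for classification, and difficulty in automating topic classification. In response, this paper combines the content and character features of ancient book texts and attempts to derive, from the content of ancient books, topics that meet researchers' needs, promoting the transformation of the digital humanities research paradigm. First, following the way the Eastern Han work 《说文解字》 (Shuowen Jiezi) analyzes characters, and building on a previously annotated corpus of ancient books, a new four-dimensional feature dataset is constructed: "pronunciation (说) - original text (文) - structure (解) - glyph (字)". Second, a four-dimensional feature vector extraction model (speaking, word, pattern, and font to vector, SWPF2vec) is designed and combined with a pre-trained model to achieve fine-grained feature representation of ancient book texts. Third, a topic classification model for ancient book texts (dianji-recurrent convolutional neural networks for text classification, DJ-TextRCNN) is built that fuses a convolutional neural network, a recurrent neural network, and a multi-head attention mechanism. Finally, the four-dimensional semantic features are incorporated to achieve multi-dimensional, deep, fine-grained semantic mining of ancient book texts. On the topic classification task, DJ-TextRCNN achieves the best accuracy under every feature dimension, reaching 76.23% accuracy under the four-dimensional "Shuowen Jiezi" features, a preliminary realization of accurate topic classification of ancient book texts.
Keywords: multi-dimensional feature fusion; ancient book texts; topic classification; SWPF2vec; DJ-TextRCNN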
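A Text2SQL Method That Uses Dictionaries to Extend Database Schema Information
10
Authors: 于晓昕, 何东, 叶子铭, 陈黎, 于中华. 《四川大学学报(自然科学版)》 CAS CSCD PKU Core 2024, No. 1, pp. 78-88 (11 pages)
Existing Text2SQL methods rely heavily on explicit mentions of table and column names in the natural language query, and their accuracy drops sharply in real applications where the same thing goes by different names. Moreover, these methods rely solely on the database schema to capture the domain knowledge modeled by the database, yet a schema is structured metadata whose ability to express domain knowledge is very limited: even experienced programmers can hardly grasp the modeled domain knowledge from the schema alone, and must rely on detailed database design documents to write SQL statements that correctly express a given query. To this end, this paper proposes a Text2SQL method that uses dictionaries to extend database schema information. The method parses the words or phrases in table and column names, looks them up in a dictionary to obtain their semantic explanations, treats these explanations as extended content of the corresponding table or column names, and combines them with the table names, column names, and other schema information (primary keys, foreign keys, etc.) as model input, so that the model can learn the application domain knowledge modeled by the database more fully. Experiments on the Spider-syn and Spider datasets demonstrate the effectiveness of the proposed method: even when the table and column names used in the natural language query are completely different from the corresponding names in the database schema, the method produces good SQL translations and clearly outperforms the most recently proposed methods for resisting synonym substitution attacks.
Keywords: database schema; semantic extension; explanation information; Text2SQL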
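The schema-expansion step described in this abstract (split table and column names into words, look each word up, attach the explanations) might look roughly like the sketch below. Everything here is assumed for illustration: the `GLOSSARY` entries, the function names, and the output format are hypothetical, and a real system would query a full dictionary rather than a three-entry table.

```python
import re
from typing import Dict, List

# Hypothetical mini-dictionary; a real system would query a full lexicon.
GLOSSARY: Dict[str, str] = {
    "emp": "employee, a person who works for an organization",
    "dept": "department, a division of an organization",
    "salary": "fixed regular payment made by an employer",
}


def split_identifier(name: str) -> List[str]:
    """Split snake_case or camelCase identifiers into lowercase words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [w.lower() for w in re.split(r"[\s_]+", spaced) if w]


def expand_schema_item(name: str, glossary: Dict[str, str] = GLOSSARY) -> str:
    """Return the identifier followed by any dictionary explanations of its words."""
    notes = [f"{w}: {glossary[w]}" for w in split_identifier(name) if w in glossary]
    return f"{name} ({'; '.join(notes)})" if notes else name
```

The expanded strings would then be concatenated with the rest of the schema (keys, types) before being passed to the model, so that a query mentioning "department" can be linked to a column named `dept_id`.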
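Setting a New Direction for Industry Development: Highlights of Techtextil & Texprocess 2024
11
Authors: 张娜, 王佳月, 赵永霞. 《纺织导报》 CAS 2024, No. 3, pp. 41-50 (10 pages)
The four-day Frankfurt international trade fair for technical textiles and nonwovens and the international trade fair for processing textile and flexible materials (Techtextil & Texprocess 2024) attracted 1,700 leading exhibitors from 53 countries and regions and 38,000 visitors from 102 countries and regions. The scale of the fair reached a new high, demonstrating the textile industry's vitality and sustained capacity for innovation.
Keywords: technical textiles; textile industry; flexible materials; international textiles; exhibition scale; new development direction; text; Frankfurt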
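Research on Topic Recommendation for Ancient Book Texts Using Multi-View Fusion DJ-TextRCNN
12
Authors: 武帅, 杨秀璋, 何琳. 《情报学报》 CSSCI CSCD PKU Core 2024, No. 1, pp. 61-75 (15 pages)
Traditional catalog classification and rule matching methods suffer from low efficiency, excessive dependence on expert knowledge, a lack of deep mining of the semantics of ancient book texts themselves, blurred boundaries between catalog topics, and difficulty in accurately recommending domain topics for ancient book texts. To this end, this paper draws on the features of ancient book corpora to explore how to accurately recommend text topic content that meets researchers' needs, so as to further advance digital humanities research. First, ancient book corpus data previously annotated by the research group are selected for topic category labeling and view classification. Second, a semantic mining model is built that fuses the BERT (bidirectional encoder representation from transformers) pre-trained model, an improved convolutional neural network, a recurrent neural network, and a multi-head attention mechanism. Finally, a multi-view "subject-relation-object" semantic enhancement model is incorporated to build the DJ-TextRCNN (DianJi-recurrent convolutional neural networks for text classification) model, achieving finer-grained, deeper, and more multi-dimensional semantic mining of classical texts. The results show that DJ-TextRCNN achieves the best accuracy on the ancient book topic recommendation task under every view. Under the "subject-relation-object" view, precision reaches 88.54%, a preliminary realization of accurate topic recommendation for ancient book texts, which offers some guidance for deep, fine-grained semantic mining of Chinese culture.
Keywords: digital humanities; ancient book texts; topic recommendation; multi-view fusion; DJ-TextRCNN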
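Research on Classification of Coal Mine Hidden Hazard Texts Based on a TextCNN-Attention-BiLSTM Fusion Model
13
Authors: 罗海平, 曾向阳, 陈勇. 《武汉理工大学学报(信息与管理工程版)》 CAS 2024, No. 2, pp. 299-305 (7 pages)
This study aims to classify large volumes of coal mine hidden hazard texts quickly and accurately, so that the safety situation can be understood and managed in time. First, several coal mine hazard databases from the 安全文库网 website are selected as the experimental data source, and the hazard texts are preprocessed, including noise word removal, word segmentation, and word vector representation. Second, TextCNN applies convolutions to the text to extract feature representations at different scales; a BiLSTM model then performs sequential modeling on the resulting feature vectors, combined with an attention mechanism (Attention), to better focus on key information in the text and capture its global semantics. Finally, a multi-label classifier in a fully connected layer predicts the hazard category of the text. Experimental results show that the TextCNN-Attention-BiLSTM fusion model exceeds 92% in accuracy, precision, recall, and F1, providing a more accurate and effective solution for coal mine hazard text classification, which is significant for optimizing coal mine safety management.
Keywords: coal mine safety; TextCNN; attention mechanism; BiLSTM; text classification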
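Text-to-SQL Model Based on Semantic Enhanced Schema Linking
14
Authors: 吴相岚, 肖洋, 刘梦莹, 刘明铭. 《计算机应用》 CSCD PKU Core 2024, No. 9, pp. 2689-2695 (7 pages)
To improve Text-to-SQL generation based on heterogeneous graph encoders, the SELSQL model is proposed. First, the model adopts an end-to-end learning framework and replaces the Euclidean distance metric with the Poincaré distance metric in hyperbolic space, thereby optimizing the semantically enhanced schema linking graph built from a pre-trained language model using probing techniques. Second, K-head weighted cosine similarity and graph regularization are used to learn a similarity metric graph, so that the initial schema linking graph is iteratively optimized during training. Finally, an improved relational graph attention network (RGAT) graph encoder and a multi-head attention mechanism encode the joint semantic schema linking graph of the two modules, and a grammar-based neural semantic decoder with a predefined structured language decodes the structured query language (SQL) statement. Experimental results on the Spider dataset show that, with the ELECTRA-large pre-trained model, SELSQL improves accuracy by 2.5 percentage points over the best baseline model, with a large improvement for complex SQL statement generation.
Keywords: schema linking; graph structure learning; pre-trained language model; Text-to-SQL; heterogeneous graph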
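For readers unfamiliar with the Poincaré distance mentioned in this abstract, the standard formula on the unit ball is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below computes it in plain Python; how SELSQL applies the metric to schema-linking embeddings is not specified in the abstract and is not modeled here.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))
```

Unlike Euclidean distance, this distance grows rapidly as points approach the boundary of the ball, which is why hyperbolic embeddings are often used for tree-like or hierarchical structures.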
Assessing trends in wildland-urban interface fire research through text mining: a comprehensive analysis of published literature
15
Authors: Hafsae Lamsaf, Asmae Lamsaf, Mounir A. Kerroum, Miguel Almeida. 《Journal of Forestry Research》 SCIE EI CAS CSCD 2024, No. 4, pp. 102-114 (13 pages)
Research on fires at the wildland-urban interface (WUI) has generated significant insights and advancements across various fields of study. Environmental, agriculture, and social sciences have played prominent roles in understanding the impacts of fires on the environment, in protecting communities, and in addressing management challenges. This study aimed to create a database using a text mining technique for global researchers interested in WUI projects, highlighting the interest of countries in this field. Author keyword analysis emphasized the dominance of fire science-related terms, especially those related to WUI, and identified keyword clusters related to the WUI fire risk assessment system (“exposure”, “danger”, and “vulnerability”) within wildfire research. Trends over the past decade showcase shifting research interests with a growing focus on WUI fires, while regional variations highlighted that the “exposure” keyword cluster received greater attention in southern Europe and South America. However, vulnerability keywords have relatively lower representation across all regions. The analysis underscores the interdisciplinary nature of WUI research and emphasizes the need for targeted approaches to address the unique challenges of the wildland-urban interface. Overall, this study provides valuable insights for researchers and serves as a foundation for further collaboration in this field through the understanding of the trends over recent years and in different regions.
Keywords: WUI; Text mining; Wildfires; Fire science; State of the art; Scientific publications
From text to image: challenges in integrating vision into ChatGPT for medical image interpretation
16
Authors: Shunsuke Koga, Wei Du. 《Neural Regeneration Research》 SCIE CAS 2025, No. 2, pp. 487-488 (2 pages)
Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs’ current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Keywords: Image; Diagnosis; Text
Relational Turkish Text Classification Using Distant Supervised Entities and Relations
17
Authors: Halil Ibrahim Okur, Kadir Tohma, Ahmet Sertbas. 《Computers, Materials & Continua》 SCIE EI 2024, No. 5, pp. 2209-2228 (20 pages)
Text classification, by automatically categorizing texts, is one of the foundational elements of natural language processing applications. This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata (Wikipedia database) knowledge base and BERT-based pre-trained Named Entity Recognition (NER) models. Focusing on a significant challenge in the field of natural language processing (NLP), the research evaluates the potential of using entity and relational information to extract deeper meaning from texts. The adopted methodology encompasses a comprehensive approach that includes text preprocessing, entity detection, and the integration of relational information. Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms, such as Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that the integration of entity-relation information can significantly enhance algorithm performance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications. Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. Additionally, it makes significant contributions to the development of multilingual text classification systems by adding deeper meaning to text content, thereby providing a valuable addition to current NLP studies and setting an important reference point for future research.
Keywords: text classification; relation extraction; NER; distant supervision; deep learning; machine learning
YOLOv5ST: A Lightweight and Fast Scene Text Detector
18
Authors: Yiwei Liu, Yingnan Zhao, Yi Chen, Zheng Hu, Min Xia. 《Computers, Materials & Continua》 SCIE EI 2024, No. 4, pp. 909-926 (18 pages)
Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to enhance inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices like drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation, including replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results demonstrate a remarkable 26% improvement in inference speed compared to the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at the intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work represents a significant advancement in scene text detection, striking a balance between speed and accuracy, making it well-suited for performance-constrained environments.
Keywords: Scene text detection; YOLOv5; lightweight; object detection
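To see why swapping standard convolutions for depth-wise ones lightens a backbone, it helps to count weights. The sketch below is a back-of-the-envelope comparison under stated assumptions: the depth-wise layer is followed by a 1×1 point-wise layer (the usual depth-wise separable arrangement), biases are ignored, and the channel sizes are illustrative rather than taken from YOLOv5ST.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depth-wise conv plus a 1 x 1 point-wise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    # Illustrative layer: 128 -> 256 channels with a 3 x 3 kernel.
    std = standard_conv_params(128, 256, 3)
    sep = depthwise_separable_params(128, 256, 3)
    print(f"standard: {std:,} weights, separable: {sep:,} weights, ratio: {std / sep:.1f}x")
```

For this illustrative layer the separable form needs roughly one ninth of the weights, which is the kind of saving that makes faster inference on embedded devices plausible.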
Generating Factual Text via Entailment Recognition Task
19
Authors: Jinqiao Dai, Pengsen Cheng, Jiayong Liu. 《Computers, Materials & Continua》 SCIE EI 2024, No. 7, pp. 547-565 (19 pages)
Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to offer paragraph-level semantic information for enhancing factual correctness. The challenge lies in effectively generating factual text using sentence-level variational autoencoder-based models. In this paper, a novel model called fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network. By training a conditional variational autoencoder network, the model is enabled to generate text based on input facts. Building upon this foundation, the input text is passed to the discriminator along with the generated text. By employing adversarial training, the model is encouraged to generate text that is indistinguishable to the discriminator, thereby enhancing the quality of the generated text. To further improve the factual correctness, inspired by the natural language inference system, the entailment recognition task is introduced to be trained together with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is further proposed to reconstruct the loss of our model, forcing the generator to generate text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model has achieved substantial improvements in both the quality and factual correctness of the text, while sacrificing only a small amount of diversity. Furthermore, when considering a comprehensive evaluation of diversity and quality metrics, our model has also demonstrated the best performance.
Keywords: text generation; entailment recognition task; natural language processing; artificial intelligence
Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification
20
Authors: Oluwaseun Peter Ige, Keng Hoon Gan. 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024, No. 11, pp. 1847-1865 (19 pages)
Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality. This involves eliminating irrelevant, redundant, and noisy features to streamline the classification process. Various methods, from single feature selection techniques to ensemble filter-wrapper methods, have been used in the literature. Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents. Feature selection is inherently multi-objective, balancing the enhancement of feature relevance, accuracy, and the reduction of redundant features. This research presents a two-fold objective for feature selection. The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods: Information Gain (Infogain), Chi-Square (Chi²), and Analysis of Variance (ANOVA). This aims to maximize feature relevance while minimizing redundancy. The second objective involves reducing the number of selected features and increasing accuracy through a hybrid approach combining Artificial Bee Colony (ABC) and Genetic Algorithms (GA). This hybrid method operates in a wrapper framework to identify the most informative subset of text features. Support Vector Machine (SVM) was employed as the performance evaluator for the proposed model, tested on two high-dimensional multiclass datasets. The experimental results demonstrated that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection, offering superior performance compared to other existing feature selection algorithms.
Keywords: Metaheuristic algorithms; text classification; multi-univariate filter feature selection; ensemble filter-wrapper techniques
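The first objective in this abstract (combining three univariate filters into one ranking) can be sketched as a rank aggregation. The abstract does not say how the three filters are combined, so the mean-rank rule below is an assumption, and the function names and score values are hypothetical.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank (1 = highest score); ties break alphabetically."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}


def ensemble_top_k(score_tables: List[Dict[str, float]], k: int) -> List[str]:
    """Pick the k features with the best mean rank across all filters."""
    ranks = [rank_positions(t) for t in score_tables]
    # Only features scored by every filter are considered.
    features = sorted(set.intersection(*(set(t) for t in score_tables)))
    mean_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


if __name__ == "__main__":
    # Hypothetical filter scores for three candidate terms.
    info_gain = {"goal": 0.9, "team": 0.5, "the": 0.1}
    chi2 = {"goal": 12.0, "team": 15.0, "the": 0.3}
    anova = {"goal": 8.0, "team": 3.0, "the": 0.2}
    print(ensemble_top_k([info_gain, chi2, anova], k=2))
```

Working with ranks rather than raw scores sidesteps the fact that the three filters produce values on different scales; the reduced feature set would then be handed to the wrapper stage for further pruning.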