期刊文献+
共找到31,141篇文章
< 1 2 250 >
每页显示 20 50 100
基于改进TextRank的科技文本关键词抽取方法
1
作者 杨冬菊 胡成富 《计算机应用》 CSCD 北大核心 2024年第6期1720-1726,共7页
针对科技文本关键词抽取任务中抽取出现次数少但能较好表达文本主旨的词语效果差的问题,提出一种基于改进TextRank的关键词抽取方法。首先,利用词语的词频-逆文档频率(TF-IDF)统计特征和位置特征优化共现图中词语间的概率转移矩阵,通过... 针对科技文本关键词抽取任务中抽取出现次数少但能较好表达文本主旨的词语效果差的问题,提出一种基于改进TextRank的关键词抽取方法。首先,利用词语的词频-逆文档频率(TF-IDF)统计特征和位置特征优化共现图中词语间的概率转移矩阵,通过迭代计算得到词语的初始得分;然后,利用K-Core(K-Core decomposition)算法挖掘KCore子图得到词语的层级特征,利用平均信息熵特征衡量词语的主题表征能力;最后,在词语初始得分的基础上融合层级特征和平均信息熵特征,从而确定关键词。实验结果表明,在公开数据集上,与TextRank方法和OTextRank(Optimized TextRank)方法相比,所提方法在抽取不同关键词数量的实验中,F1均值分别提高了6.5和3.3个百分点;在科技服务项目数据集上,与TextRank方法和OTextRank方法相比,所提方法在抽取不同关键词数量的实验中,F1均值分别提高了7.4和3.2个百分点。实验结果验证了所提方法抽取出现频率低但较好表达文本主旨关键词的有效性。 展开更多
关键词 科技文本 关键词抽取 textRank K-Core图 平均信息熵
下载PDF
基于BERT和TextCNN的智能制造成熟度评估方法 被引量:1
2
作者 张淦 袁堂晓 +1 位作者 汪惠芬 柳林燕 《计算机集成制造系统》 EI CSCD 北大核心 2024年第3期852-863,共12页
随着智能制造2025目标的临近,企业为了解自身能力水平纷纷加入到智能制造成熟度评估的行列中。然而,由于智能制造成熟度评估标准的复杂性,企业缺乏其对行业水平的了解,导致企业贸然申请,浪费自身时间的同时又占用大量评估资源。鉴于此,... 随着智能制造2025目标的临近,企业为了解自身能力水平纷纷加入到智能制造成熟度评估的行列中。然而,由于智能制造成熟度评估标准的复杂性,企业缺乏其对行业水平的了解,导致企业贸然申请,浪费自身时间的同时又占用大量评估资源。鉴于此,设计了一种新的评估流程,采用文本处理算法对整个评估过程进行了重构,通过利用国标文件中智能制造成熟度评估标准,将其作为训练集,采用基于预训练语言模型与文本神经网络(BERT+TextCNN)相结合的智能评估算法代替人工评估。在真实的企业智能制造数据集上的验证表明,当BERT+TextCNN评估模型在卷积核为[2,3,4]、迭代次数为6次、学习率为3e-5时,对智能制造成熟度进行评估,准确率达到85.32%。这表明所设计的评估方法能够较准确地帮助企业完成智能制造成熟度自评估,有助于企业了解自身智能制造能力水平,制定正确的发展方向。 展开更多
关键词 智能制造成熟度模型 BERT预训练语言模型 文本卷积神经网络 评估过程重构
下载PDF
基于SWPF2vec和DJ-TextRCNN的古籍文本主题分类研究 被引量:1
3
作者 武帅 杨秀璋 +1 位作者 何琳 公佐权 《情报学报》 CSSCI CSCD 北大核心 2024年第5期601-615,共15页
以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人... 以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人文研究范式的转型。首先,参照东汉古籍《说文解字》对文字的分析方式,以前期标注的古籍语料数据集为基础,构建全新的“字音(说)-原文(文)-结构(解)-字形(字)”四维特征数据集。其次,设计四维特征向量提取模型(speaking,word,pattern,and font to vector,SWPF2vec),并结合预训练模型实现对古籍文本细粒度的特征表示。再其次,构建融合卷积神经网络、循环神经网络和多头注意力机制的古籍文本主题分类模型(dianji-recurrent convolutional neural networks for text classification,DJ-TextRCNN)。最后,融入四维语义特征,实现对古籍文本多维度、深层次、细粒度的语义挖掘。在古籍文本主题分类任务上,DJ-TextRCNN模型在不同维度特征下的主题分类准确率均为最优,在“说文解字”四维特征下达到76.23%的准确率,初步实现了对古籍文本的精准主题分类。 展开更多
关键词 多维特征融合 古籍文本 主题分类 SWPF2vec DJ-textRCNN
下载PDF
一种利用词典扩展数据库模式信息的Text2SQL方法
4
作者 于晓昕 何东 +2 位作者 叶子铭 陈黎 于中华 《四川大学学报(自然科学版)》 CAS CSCD 北大核心 2024年第1期78-88,共11页
现有Text2SQL方法严重依赖表名和列名在自然语言查询中的显式提及,在同物异名的实际应用场景中准确率急剧下降.此外,这些方法仅仅依赖数据库模式捕捉数据库建模的领域知识,而数据库模式作为结构化的元数据,其表达领域知识的能力是非常... 现有Text2SQL方法严重依赖表名和列名在自然语言查询中的显式提及,在同物异名的实际应用场景中准确率急剧下降.此外,这些方法仅仅依赖数据库模式捕捉数据库建模的领域知识,而数据库模式作为结构化的元数据,其表达领域知识的能力是非常有限的,即使有经验的程序员也很难仅从数据库模式完全领会该数据库建模的领域知识,因此程序员必须依赖详细的数据库设计文档才能构造SQL语句以正确地表达特定的查询.为此,本文提出一种利用词典扩展数据库模式信息的Text2SQL方法,该方法从数据库表名和列名解析出其中的单词或短语,查询词典获取这些单词或短语的语义解释,将这些解释看成是相应表名或列名的扩展内容,与表名、列名及其他数据库模式信息(主键、外键等)相结合,作为模型的输入,从而使模型能够更全面地学习数据库建模的应用领域知识.在Spider-syn和Spider数据集上进行的实验说明了所提出方法的有效性,即使自然语言查询中使用的表名和列名与数据库模式中对应的表名和列名完全不同,本文方法也能够得到较好的SQL翻译结果,明显优于最新提出的抗同义词替换攻击的方法. 展开更多
关键词 数据库模式 语义扩展 解释信息 text2SQL
下载PDF
基于DAN与FastText的藏文短文本分类研究
5
作者 李果 陈晨 +1 位作者 杨进 群诺 《计算机科学》 CSCD 北大核心 2024年第S01期103-107,共5页
随着藏文信息不断融入社会生活,越来越多的藏文短文本数据存在网络平台上。针对传统分类方法在藏文短文本上分类性能低的问题,文中提出了一种基于DAN-FastText的藏文短文本分类模型。该模型使用FastText网络在较大规模的藏文语料上进行... 随着藏文信息不断融入社会生活,越来越多的藏文短文本数据存在网络平台上。针对传统分类方法在藏文短文本上分类性能低的问题,文中提出了一种基于DAN-FastText的藏文短文本分类模型。该模型使用FastText网络在较大规模的藏文语料上进行无监督训练获得预训练的藏文音节向量集,使用预训练的音节向量集将藏文短文本信息转化为音节向量,把音节向量送入DAN(Deep Averaging Networks)网络并在输出阶段融合经过FastText网络训练的句向量特征,最后通过全连接层和softmax层完成分类。在公开的TNCC(Tibetan News Classification Corpus)新闻标题数据集上所提模型的Macro-F1是64.53%,比目前最好评测结果TiBERT模型的Macro-F1得分高出2.81%,比GCN模型的Macro-F1得分高出6.14%,融合模型具有较好的藏文短文本分类效果。 展开更多
关键词 藏文短文本分类 特征融合 深度平均网络 快速文本
下载PDF
基于改进分层注意网络和TextCNN联合建模的暴力犯罪分级算法
6
作者 张家伟 高冠东 +1 位作者 肖珂 宋胜尊 《计算机应用》 CSCD 北大核心 2024年第2期403-410,共8页
为了科学、智能地对服刑人员的暴力倾向分级,将自然语言处理(NLP)中的文本分类方法引入犯罪心理学领域,提出一种基于改进分层注意网络(HAN)与TextCNN(Text Convolutional Neural Network)两通道联合建模的犯罪语义卷积分层注意网络(CCHA... 为了科学、智能地对服刑人员的暴力倾向分级,将自然语言处理(NLP)中的文本分类方法引入犯罪心理学领域,提出一种基于改进分层注意网络(HAN)与TextCNN(Text Convolutional Neural Network)两通道联合建模的犯罪语义卷积分层注意网络(CCHA-Net),通过分别挖掘犯罪事实与服刑人员基本情况的语义信息,完成暴力犯罪气质分级。首先,采用Focal Loss同时替代两通道中的Cross-Entropy函数,优化样本数量不均衡问题。其次,在两通道输入层中,同时引入位置编码,改进对位置信息的感知能力;改进HAN通道,采用最大池化构建显著向量。最后,输出层都采用全局平均池化替代全连接方法,以避免过拟合。实验结果表明,与AC-BiLSTM(Attention-based Bidirectional Long Short-Term Memory with Convolution layer)、支持向量机(SVM)等17种相关基线模型相比,CCHA-Net各项指标均最优,微平均F1(Micro_F1)为99.57%,宏平均和微平均下的曲线下面积(AUC)分别为99.45%和99.89%,相较于次优的AC-BiLSTM提高了4.08、5.59和0.74个百分点,验证了CCHA-Net能有效胜任暴力犯罪气质分级任务。 展开更多
关键词 深度学习 文本分类 卷积神经网络 分层注意网络 暴力犯罪分级 气质类型
下载PDF
树立行业发展新方向——Techtextil&Texprocess 2024亮点回顾
7
作者 张娜 王佳月 赵永霞 《纺织导报》 CAS 2024年第3期41-50,共10页
为期4天的法兰克福国际产业用纺织品及非织造布展览会及国际纺织品及柔性材料缝制加工展览会(Techtextil&Texprocess 2024)吸引了来自全球53个国家和地区的1700家领先企业参展和来自102个国家和地区的38000名观众,展会规模再创新高... 为期4天的法兰克福国际产业用纺织品及非织造布展览会及国际纺织品及柔性材料缝制加工展览会(Techtextil&Texprocess 2024)吸引了来自全球53个国家和地区的1700家领先企业参展和来自102个国家和地区的38000名观众,展会规模再创新高,充分彰显了纺织行业蓬勃的生命力与持续的创新力。 展开更多
关键词 产业用纺织品 纺织行业 柔性材料 国际纺织品 展会规模 发展新方向 text 法兰克福
下载PDF
多视图融合DJ-TextRCNN的古籍文本主题推荐研究
8
作者 武帅 杨秀璋 何琳 《情报学报》 CSSCI CSCD 北大核心 2024年第1期61-75,共15页
传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需... 传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需求的文本主题内容的方法,以推动数字人文研究的进一步发展。首先,选取本课题组前期标注的古籍语料数据进行主题类别标注和视图分类;其次,构建融合BERT(bidirectional encoder representation from transformers)预训练模型、改进卷积神经网络、循环神经网络和多头注意力机制的语义挖掘模型;最后,融入“主体-关系-客体”多视图的语义增强模型,构建DJ-TextRCNN(DianJi-recurrent convolutional neural networks for text classification)模型实现对典籍文本更细粒度、更深层次、更多维度的语义挖掘。研究结果发现,DJ-TextRCNN模型在不同视图下的古籍主题推荐任务的准确率均为最优。在“主体-关系-客体”视图下,精确率达到88.54%,初步实现了对古籍文本的精准主题推荐,对中华文化深层次、细粒度的语义挖掘具有一定的指导意义。 展开更多
关键词 数字人文 古籍文本 主题推荐 多视图融合 DJ-textRCNN
下载PDF
基于TextCNN-Attention-BiLSTM融合模型的煤矿隐患文本分类研究
9
作者 罗海平 曾向阳 陈勇 《武汉理工大学学报(信息与管理工程版)》 CAS 2024年第2期299-305,共7页
为实现大量煤矿隐患文本的迅速、精确分类,及时了解安全概况并加以管理。首先,选取安全文库网中多个煤矿隐患数据库为实验数据源,对煤矿隐患文本进行预处理,包括去除噪声词、分词和词向量表示等;其次,利用TextCNN对文本进行卷积操作,提... 为实现大量煤矿隐患文本的迅速、精确分类,及时了解安全概况并加以管理。首先,选取安全文库网中多个煤矿隐患数据库为实验数据源,对煤矿隐患文本进行预处理,包括去除噪声词、分词和词向量表示等;其次,利用TextCNN对文本进行卷积操作,提取不同尺寸的特征表示,再利用BiLSTM模型对得到的特征向量进行时序建模,并结合注意力机制(Attention),从而更好地关注文本中关键信息,捕捉文本全局语义信息;最后,利用全连接层的多标签分类器预测文本隐患类别。实验结果表明:TextCNN-Attention-BiLSTM融合模型在准确率、精确率、召回率和F 1值上均达到92%以上,为煤矿隐患文本分类提供了一种更加准确和有效的解决方案,对煤矿安全管理优化具有重要意义。 展开更多
关键词 煤矿安全 textCNN 注意力机制 BiLSTM 文本分类
下载PDF
基于语义增强模式链接的Text-to-SQL模型
10
作者 吴相岚 肖洋 +1 位作者 刘梦莹 刘明铭 《计算机应用》 CSCD 北大核心 2024年第9期2689-2695,共7页
为优化基于异构图编码器的Text-to-SQL生成效果,提出SELSQL模型。首先,模型采用端到端的学习框架,使用双曲空间下的庞加莱距离度量替代欧氏距离度量,以此优化使用探针技术从预训练语言模型中构建的语义增强的模式链接图;其次,利用K头加... 为优化基于异构图编码器的Text-to-SQL生成效果,提出SELSQL模型。首先,模型采用端到端的学习框架,使用双曲空间下的庞加莱距离度量替代欧氏距离度量,以此优化使用探针技术从预训练语言模型中构建的语义增强的模式链接图;其次,利用K头加权的余弦相似度以及图正则化方法学习相似度度量图使得初始模式链接图在训练中迭代优化;最后,使用改良的关系图注意力网络(RGAT)图编码器以及多头注意力机制对两个模块的联合语义模式链接图进行编码,并且使用基于语法的神经语义解码器和预定义的结构化语言进行结构化查询语言(SQL)语句解码。在Spider数据集上的实验结果表明,使用ELECTRA-large预训练模型时,SELSQL模型比最佳基线模型的准确率提升了2.5个百分点,对于复杂SQL语句生成的提升效果很大。 展开更多
关键词 模式链接 图结构学习 预训练语言模型 text-to-SQL 异构图
下载PDF
Text-to-SQL文本信息处理技术研究综述 被引量:1
11
作者 彭钰寒 乔少杰 +5 位作者 薛骐 李江敏 谢添丞 徐康镭 冉黎琼 曾少北 《无线电工程》 2024年第5期1053-1062,共10页
信号与信息处理的需求日益增加,离不开数据处理技术,数据处理需要数据库的支持,然而没有经过训练的使用者会因为不熟悉数据库操作产生诸多问题。文本转结构化查询语言(Text to Structured Query Language,Text-to-SQL)的出现,使用户无... 信号与信息处理的需求日益增加,离不开数据处理技术,数据处理需要数据库的支持,然而没有经过训练的使用者会因为不熟悉数据库操作产生诸多问题。文本转结构化查询语言(Text to Structured Query Language,Text-to-SQL)的出现,使用户无需掌握结构化查询语言(Structured Query Language,SQL)也能够熟练操作数据库。介绍Text-to-SQL的研究背景及面临的挑战;介绍Text-to-SQL关键技术、基准数据集、模型演变及最新研究进展,关键技术包括Transformer等主流技术,用于模型训练的基准数据集包括WikiSQL和Spider;介绍Text-to-SQL不同阶段模型的特点,详细阐述Text-to-SQL最新研究成果的工作原理,包括模型构建、解析器设计及数据集生成;总结Text-to-SQL未来的发展方向及研究重点。 展开更多
关键词 文本转结构化查询语言 解析器 文本信息处理 数据库 深度学习
下载PDF
CVTD: A Robust Car-Mounted Video Text Detector
12
作者 Di Zhou Jianxun Zhang +2 位作者 Chao Li Yifan Guo Bowen Li 《Computers, Materials & Continua》 SCIE EI 2024年第2期1821-1842,共22页
Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted vid... Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted videos can assist drivers in making decisions.However,Car-mounted video text images pose challenges such as complex backgrounds,small fonts,and the need for real-time detection.We proposed a robust Car-mounted Video Text Detector(CVTD).It is a lightweight text detection model based on ResNet18 for feature extraction,capable of detecting text in arbitrary shapes.Our model efficiently extracted global text positions through the Coordinate Attention Threshold Activation(CATA)and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules(FPEFM),strengthening feature representation,and integrating text local features and global position information,reinforcing the representation capability of the CVTD model.The enhanced feature maps,when acted upon by Text Activation Maps(TAM),effectively distinguished text foreground from non-text regions.Additionally,we collected and annotated a dataset containing 2200 images of Car-mounted Video Text(CVT)under various road conditions for training and evaluating our model’s performance.We further tested our model on four other challenging public natural scene text detection benchmark datasets,demonstrating its strong generalization ability and real-time detection speed.This model holds potential for practical applications in real-world scenarios. 展开更多
关键词 Deep learning text detection Car-mounted video text detector intelligent driving assistance arbitrary shape text detector
下载PDF
Identifying multidisciplinary problems from scientific publications based on a text generation method
13
作者 Ziyan Xu Hongqi Han +2 位作者 Linna Li Junsheng Zhang Zexu Zhou 《Journal of Data and Information Science》 CSCD 2024年第3期213-237,共25页
Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the... Purpose:A text generation based multidisciplinary problem identification method is proposed,which does not rely on a large amount of data annotation.Design/methodology/approach:The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique;second,it generates abstractive titles for each paper based on abstract and research objective types using a generative pre-trained language model;third,it extracts problem phrases from generated titles according to regular expression rules;fourth,it creates problem relation networks and identifies the same problems by exploiting a weighted community detection algorithm;finally,it identifies multidisciplinary problems based on the disciplinary labels of papers.Findings:Experiments in the“Carbon Peaking and Carbon Neutrality”field show that the proposed method can effectively identify multidisciplinary research problems.The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field.Research limitations:It is necessary to use the proposed method in other multidisciplinary fields to validate its effectiveness.Practical implications:Multidisciplinary problem identification helps to gather multidisciplinary forces to solve complex real-world problems for the governments,fund valuable multidisciplinary problems for research management authorities,and borrow ideas from other disciplines for researchers.Originality/value:This approach proposes a novel multidisciplinary problem identification method based on text generation,which identifies multidisciplinary problems based on generative abstractive titles of papers without data annotation required by standard sequence labeling techniques. 展开更多
关键词 Problem identification MULTIDISCIPLINARY text generation text classification
下载PDF
Assessing trends in wildland-urban interface fire research through text mining: a comprehensive analysis of published literature
14
作者 Hafsae Lamsaf Asmae Lamsaf +1 位作者 Mounir A.Kerroum Miguel Almeida 《Journal of Forestry Research》 SCIE EI CAS CSCD 2024年第4期102-114,共13页
Research on fires at the wildland-urban inter-face(WUI)has generated significant insights and advance-ments across various fields of study.Environmental,agri-culture,and social sciences have played prominent roles in ... Research on fires at the wildland-urban inter-face(WUI)has generated significant insights and advance-ments across various fields of study.Environmental,agri-culture,and social sciences have played prominent roles in understanding the impacts of fires in the environment,in protecting communities,and addressing management challenges.This study aimed to create a database using a text mining technique for global researchers interested in WUI-projects and highlighting the interest of countries in this field.Author’s-Keywords analysis emphasized the dominance of fire science-related terms,especially related to WUI,and identified keyword clusters related to the WUI fire-risk-assessment-system-“exposure”,“danger”,and“vulnerability”within wildfire research.Trends over the past decade showcase shifting research interests with a growing focus on WUI fires,while regional variations highlighted that the“exposure”keyword cluster received greater atten-tion in the southern Europe and South America.However,vulnerability keywords have relatively a lower representation across all regions.The analysis underscores the interdisci-plinary nature of WUI research and emphasizes the need for targeted approaches to address the unique challenges of the wildland-urban interface.Overall,this study provides valu-able insights for researchers and serves as a foundation for further collaboration in this field through the understanding of the trends over recent years and in different regions. 展开更多
关键词 WUI text mining WILDFIRES Fire science State of the art Scientific publications
下载PDF
From text to image:challenges in integrating vision into ChatGPT for medical image interpretation
15
作者 Shunsuke Koga Wei Du 《Neural Regeneration Research》 SCIE CAS 2025年第2期487-488,共2页
Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive te... Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive text data.Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future(Thirunavukarasu et al.,2023).This article aims to provide an in-depth analysis of LLMs’current and potential impact on clinical practices.Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education(Hirosawa et al.,2023;Koga et al.,2023). 展开更多
关键词 IMAGE DIAGNOSIS text
下载PDF
Relational Turkish Text Classification Using Distant Supervised Entities and Relations
16
作者 Halil Ibrahim Okur Kadir Tohma Ahmet Sertbas 《Computers, Materials & Continua》 SCIE EI 2024年第5期2209-2228,共20页
Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved throu... Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata(Wikipedia database)database and BERTbased pre-trained Named Entity Recognition(NER)models.Focusing on a significant challenge in the field of natural language processing(NLP),the research evaluates the potential of using entity and relational information to extract deeper meaning from texts.The adopted methodology encompasses a comprehensive approach that includes text preprocessing,entity detection,and the integration of relational information.Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms,such as Support Vector Machine,Logistic Regression,Deep Neural Network,and Convolutional Neural Network.The results indicate that the integration of entity-relation information can significantly enhance algorithmperformance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications.Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification,the development of a Turkish relational text classification approach,and the creation of a relational database.By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification,this research aims to support the effectiveness of text-based artificial intelligence(AI)tools.Additionally,it makes significant contributions to the development ofmultilingual text classification systems by adding deeper meaning to text content,thereby providing a valuable addition to current NLP studies and setting an important reference point for future research. 展开更多
关键词 text classification relation extraction NER distant supervision deep learning machine learning
下载PDF
YOLOv5ST:A Lightweight and Fast Scene Text Detector
17
作者 Yiwei Liu Yingnan Zhao +2 位作者 Yi Chen Zheng Hu Min Xia 《Computers, Materials & Continua》 SCIE EI 2024年第4期909-926,共18页
Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal ... Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal is to enhance inference speed without sacrificing significant detection accuracy,thereby enabling robust performance on resource-constrained devices like drones,closed-circuit television cameras,and other embedded systems.To achieve this,we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation,including replacing standard convolution with depth-wise convolution,adopting the C2 sequence module in place of C3,employing Spatial Pyramid Pooling Global(SPPG)instead of Spatial Pyramid Pooling Fast(SPPF)and integrating Bi-directional Feature Pyramid Network(BiFPN)into the neck.Experimental results demonstrate a remarkable 26%improvement in inference speed compared to the baseline,with only marginal reductions of 1.6%and 4.2%in mean average precision(mAP)at the intersection over union(IoU)thresholds of 0.5 and 0.5:0.95,respectively.Our work represents a significant advancement in scene text detection,striking a balance between speed and accuracy,making it well-suited for performance-constrained environments. 展开更多
关键词 Scene text detection YOLOv5 LIGHTWEIGHT object detection
下载PDF
Leveraging Uncertainty for Depth-Aware Hierarchical Text Classification
18
作者 Zixuan Wu Ye Wang +2 位作者 Lifeng Shen Feng Hu Hong Yu 《Computers, Materials & Continua》 SCIE EI 2024年第9期4111-4127,共17页
Hierarchical Text Classification(HTC)aims to match text to hierarchical labels.Existing methods overlook two critical issues:first,some texts cannot be fully matched to leaf node labels and need to be classified to th... Hierarchical Text Classification(HTC)aims to match text to hierarchical labels.Existing methods overlook two critical issues:first,some texts cannot be fully matched to leaf node labels and need to be classified to the correct parent node instead of treating leaf nodes as the final classification target.Second,error propagation occurs when a misclassification at a parent node propagates down the hierarchy,ultimately leading to inaccurate predictions at the leaf nodes.To address these limitations,we propose an uncertainty-guided HTC depth-aware model called DepthMatch.Specifically,we design an early stopping strategy with uncertainty to identify incomplete matching between text and labels,classifying them into the corresponding parent node labels.This approach allows us to dynamically determine the classification depth by leveraging evidence to quantify and accumulate uncertainty.Experimental results show that the proposed DepthMatch outperforms recent strong baselines on four commonly used public datasets:WOS(Web of Science),RCV1-V2(Reuters Corpus Volume I),AAPD(Arxiv Academic Paper Dataset),and BGC.Notably,on the BGC dataset,it improvesMicro-F1 andMacro-F1 scores by at least 1.09%and 1.74%,respectively. 展开更多
关键词 Hierarchical text classification incomplete text-label matching UNCERTAINTY depth-aware early stopping strategy
下载PDF
Generating Factual Text via Entailment Recognition Task
19
作者 Jinqiao Dai Pengsen Cheng Jiayong Liu 《Computers, Materials & Continua》 SCIE EI 2024年第7期547-565,共19页
Generating diverse and factual text is challenging and is receiving increasing attention.By sampling from the latent space,variational autoencoder-based models have recently enhanced the diversity of generated text.Ho... Generating diverse and factual text is challenging and is receiving increasing attention.By sampling from the latent space,variational autoencoder-based models have recently enhanced the diversity of generated text.However,existing research predominantly depends on summarizationmodels to offer paragraph-level semantic information for enhancing factual correctness.The challenge lies in effectively generating factual text using sentence-level variational autoencoder-based models.In this paper,a novel model called fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text.Specifically,our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network.By training a conditional variational autoencoder network,the model is enabled to generate text based on input facts.Building upon this foundation,the input text is passed to the discriminator along with the generated text.By employing adversarial training,the model is encouraged to generate text that is indistinguishable to the discriminator,thereby enhancing the quality of the generated text.To further improve the factual correctness,inspired by the natural language inference system,the entailment recognition task is introduced to be trained together with the discriminator via multi-task learning.Moreover,based on the entailment recognition results,a penalty term is further proposed to reconstruct the loss of our model,forcing the generator to generate text consistent with the facts.Experimental results demonstrate that compared with competitivemodels,ourmodel has achieved substantial improvements in both the quality and factual correctness of the text,despite only sacrificing a small amount of diversity.Furthermore,when considering a comprehensive evaluation of diversity and quality metrics,our model has also demonstrated the best performance. 展开更多
关键词 text generation entailment recognition task natural language processing artificial intelligence
下载PDF
Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification
20
作者 Oluwaseun Peter Ige Keng Hoon Gan 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第11期1847-1865,共19页
Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality.This involves elim... Feature selection is a crucial technique in text classification for improving the efficiency and effectiveness of classifiers or machine learning techniques by reducing the dataset’s dimensionality.This involves eliminating irrelevant,redundant,and noisy features to streamline the classification process.Various methods,from single feature selection techniques to ensemble filter-wrapper methods,have been used in the literature.Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents.Feature selection is inherently multi-objective,balancing the enhancement of feature relevance,accuracy,and the reduction of redundant features.This research presents a two-fold objective for feature selection.The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods:Information Gain(Infogain),Chi-Square(Chi^(2)),and Analysis of Variance(ANOVA).This aims to maximize feature relevance while minimizing redundancy.The second objective involves reducing the number of selected features and increasing accuracy through a hybrid approach combining Artificial Bee Colony(ABC)and Genetic Algorithms(GA).This hybrid method operates in a wrapper framework to identify the most informative subset of text features.Support Vector Machine(SVM)was employed as the performance evaluator for the proposed model,tested on two high-dimensional multiclass datasets.The experimental results demonstrated that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection,offering superior performance compared to other existing feature selection algorithms. 展开更多
关键词 Metaheuristic algorithms text classification multi-univariate filter feature selection ensemble filter-wrapper techniques
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部