A Study on Short Text Matching Method Based on KS-BERT Algorithm
1
Authors: YANG Hao-wen, SUN Mei-feng. 《印刷与数字媒体技术研究》 (CAS, PKU Core), 2024, No. 5, pp. 164-173 (10 pages)
To improve the accuracy of short text matching, a short text matching method with knowledge and structure enhancement for BERT (KS-BERT) was proposed in this study. This method first introduced external knowledge to the input text, and then sent the expanded text to both the context encoder BERT and the structure encoder GAT to capture the contextual relationship features and structural features of the input text. Finally, the match was determined based on the fusion result of the two features. Experimental results on the public datasets BQ_corpus and LCQMC showed that KS-BERT outperforms advanced models such as ERNIE 2.0. This study showed that knowledge enhancement and structure enhancement are two effective ways to improve BERT in short text matching. In BQ_corpus, ACC was improved by 0.2% and 0.3%, respectively, while in LCQMC, ACC was improved by 0.4% and 0.9%, respectively.
Keywords: Deep learning; Short text matching; Graph attention network; Knowledge enhancement
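Tibetan Short Text Classification Based on DAN and FastText (基于DAN与FastText的藏文短文本分类研究)
2
Authors: 李果, 陈晨, 杨进, 群诺. 《计算机科学》 (CSCD, PKU Core), 2024, No. S01, pp. 103-107 (5 pages)
As Tibetan-language information becomes increasingly integrated into social life, a growing volume of Tibetan short text data is appearing on online platforms. To address the poor performance of traditional classification methods on Tibetan short texts, this paper proposes a Tibetan short text classification model based on DAN-FastText. The model uses a FastText network for unsupervised training on a relatively large Tibetan corpus to obtain a set of pre-trained Tibetan syllable vectors, and uses that set to convert Tibetan short texts into syllable vectors. The syllable vectors are fed into a DAN (Deep Averaging Networks) network, sentence-vector features trained by the FastText network are fused in at the output stage, and classification is completed by a fully connected layer and a softmax layer. On the public TNCC (Tibetan News Classification Corpus) news title dataset, the proposed model achieves a Macro-F1 of 64.53%, which is 2.81% higher than the Macro-F1 of the TiBERT model, the best previously reported result, and 6.14% higher than that of the GCN model. The fused model therefore classifies Tibetan short texts well.
Keywords: Tibetan short text classification; Feature fusion; Deep averaging network; FastText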
Convolutional Deep Belief Network Based Short Text Classification on Arabic Corpus
3
Authors: Abdelwahed Motwakel, Badriyya B. Al-onazi, Jaber S. Alzahrani, Radwa Marzouk, Amira Sayed A. Aziz, Abu Sarwar Zamani, Ishfaq Yaseen, Amgad Atta Abdelmageed. 《Computer Systems Science & Engineering》 (SCIE, EI), 2023, No. 6, pp. 3097-3113 (17 pages)
With a population of 440 million, Arabic language users form a rapidly growing language group on the web in terms of the number of Internet users. 11 million monthly Twitter users were active and posted nearly 27.4 million tweets every day. Developing a classification system for the Arabic language requires understanding the syntactic framework of the words, and then manipulating and representing the words so that their classification is effective. In this view, this article introduces a Dolphin Swarm Optimization with Convolutional Deep Belief Network for Short Text Classification (DSOCDBN-STC) model on Arabic Corpus. The presented DSOCDBN-STC model mainly aims to classify Arabic short text in social media. The model encompasses preprocessing and word2vec word embedding at the preliminary stage. Besides, the DSOCDBN-STC model involves a CDBN-based classification model for Arabic short text. At last, the DSO technique is exploited for optimal modification of the hyperparameters related to the CDBN method. To establish the enhanced performance of the DSOCDBN-STC model, a wide range of simulations have been performed. The simulation results confirmed the supremacy of the DSOCDBN-STC model over existing models, with an improved accuracy of 99.26%.
Keywords: Arabic text; Short text classification; Dolphin swarm optimization; Deep learning
Short-Term Memory Capacity across Time and Language Estimated from Ancient and Modern Literary Texts. Study-Case: New Testament Translations
4
Author: Emilio Matricciani. 《Open Journal of Statistics》, 2023, No. 3, pp. 379-403 (25 pages)
We study the short-term memory capacity of ancient readers of the original New Testament written in Greek, and of its translations to Latin and to modern languages. To model it, we consider the number of words between any two contiguous interpunctions, I_P, because this parameter can model how the human mind memorizes "chunks" of information. Since I_P can be calculated for any alphabetical text, we can perform experiments (otherwise impossible) with ancient readers by studying the literary works they used to read. The "experiments" compare the I_P of texts of a language/translation to those of another language/translation by measuring the minimum average probability of finding joint readers (those who can read both texts because of similar short-term memory capacity) and by defining an "overlap index". We also define the population of universal readers, people who can read any New Testament text in any language. Future work is vast, with many research tracks, because alphabetical literatures are very large and allow many experiments, such as comparing authors, translations or even texts written by artificial intelligence tools.
Keywords: Alphabetical Languages; Artificial Intelligence Writing; Greek; Latin; New Testament; Readers Overlap Probability; Short-Term Memory Capacity; Texts; Translation; Words Interval
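The abstract above defines I_P as the number of words between two contiguous interpunctions. A minimal sketch of how that quantity could be averaged over a text follows. The punctuation set and the helper name `words_per_interpunction` are assumptions made for illustration, since the abstract does not list the marks the author counts.

```python
import re

# Assumed set of interpunction marks; the paper's exact list is not given in the abstract.
INTERPUNCTIONS = ".,;:!?"


def words_per_interpunction(text: str) -> float:
    """Return the average number of words between contiguous interpunctions."""
    # Split the text at every punctuation mark, then count the words in each chunk.
    chunks = re.split("[" + re.escape(INTERPUNCTIONS) + "]", text)
    counts = [len(chunk.split()) for chunk in chunks if chunk.strip()]
    if not counts:
        return 0.0
    return sum(counts) / len(counts)


sample = "In the beginning was the Word, and the Word was with God, and the Word was God."
print(words_per_interpunction(sample))  # three chunks of 6, 6 and 5 words
```

Averaging per chunk matches total words divided by total interpunctions only when the text ends with a punctuation mark, so a real study would need to settle that convention first.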
Research of Collaborative Filtering Recommendation Algorithm for Short Text (cited by: 2)
5
Authors: Chunxu Chao, Shouning Qu, Tao Du. 《Journal of Computer and Communications》, 2014, No. 14, pp. 59-66 (8 pages)
Short text, based on the platform of Web 2.0, has developed rapidly in a relatively short time. Recommendation systems that analyze users' interests from short texts are becoming more and more important. Collaborative filtering is one of the most promising recommendation technologies. However, existing collaborative filtering methods do not consider the drift of a user's interest, which often leads to a large gap between the recommendation result and the user's real demands. In this paper, building on the traditional collaborative filtering algorithm, a new personalized recommendation algorithm is proposed. It traces a user's interest by using the Ebbinghaus Forgetting Curve. Experiments demonstrated that the new algorithm could indeed help discard a user's outdated interests and discover their real-time interests for more accurate recommendation.
Keywords: Short text; Personalized recommendation; Time weight function
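The abstract above describes tracing user interest with the Ebbinghaus Forgetting Curve, and the keywords mention a time weight function. One way such a weight could work is sketched below, assuming the common exponential form R = exp(-t / S). The decay constant `strength` and both function names are hypothetical, and the paper's actual weight function may differ.

```python
import math


def retention(elapsed_days: float, strength: float = 30.0) -> float:
    """Ebbinghaus-style retention R = exp(-t / S); S is a hypothetical decay constant."""
    return math.exp(-elapsed_days / strength)


def time_weighted_score(ratings):
    """Average (rating, age_in_days) pairs, weighting each rating by its retention."""
    weights = [retention(age) for _, age in ratings]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(r * w for (r, _), w in zip(ratings, weights)) / total


# A fresh 5-star interaction dominates a 1-star interaction from 300 days ago.
history = [(5.0, 0.0), (1.0, 300.0)]
score = time_weighted_score(history)
print(round(score, 3))
```

In a collaborative filtering pipeline the same weights would typically be applied before computing user-to-user or item-to-item similarity, so that outdated interests contribute little.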
Effective short text classification via the fusion of hybrid features for IoT social data (cited by: 3)
6
Authors: Xiong Luo, Zhijian Yu, Zhigang Zhao, Wenbing Zhao, Jenq-Haur Wang. 《Digital Communications and Networks》 (SCIE, CSCD), 2022, No. 6, pp. 942-954 (13 pages)
Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various information may help to improve the classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature together by using the weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, a Convolutional Neural Network (CNN) and a Bidirectional Gate Recurrent Unit (BiGRU). Then, the fusion feature is accordingly obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset, and the results show that our proposed method effectively improves the classification performance.
Keywords: Information fusion; Short text classification; BERT; Bidirectional encoder representations from transformers; Deep learning; Social data
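The statistical feature in the abstract above comes from TF-IDF weighting. A minimal standard-library sketch of that weighting follows, assuming the textbook form tf × log(N / df) on pre-tokenized documents. The paper may use a smoothed or normalized variant, and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy corpus, invented for illustration only.
docs = [["5g", "iot", "network"], ["iot", "sensor"], ["short", "text", "iot"]]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document ("iot" here) gets weight zero, which is why the abstract pairs this statistical signal with semantic features rather than relying on it alone.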
Short Text Classification Based on Improved ITC (cited by: 1)
7
Authors: Liangliang Li, Shouning Qu. 《Journal of Computer and Communications》, 2013, No. 4, pp. 22-27 (6 pages)
Long text classification has made great progress, but short text classification still needs improvement. In this paper, we first explain why we select the ITC feature selection algorithm rather than the conventional TFIDF, and describe the advantages of ITC over TFIDF. We then summarize the flaws of the conventional ITC algorithm and present an improved ITC feature selection algorithm based on the characteristics of short text classification, combining the concepts of Documents Distribution Entropy and Position Distribution Weight. The improved ITC algorithm conforms to the actual situation of short text classification. The experimental results show that performance with the new algorithm was much better than with the traditional TFIDF and ITC.
Keywords: ITC; Text classification; Short text
Sentiment Analysis of Short Texts Based on Parallel DenseNet (cited by: 1)
8
Authors: Luqi Yan, Jin Han, Yishi Yue, Liu Zhang, Yannan Qian. 《Computers, Materials & Continua》 (SCIE, EI), 2021, No. 10, pp. 51-65 (15 pages)
Text sentiment analysis is a common problem in the field of natural language processing that is often resolved by using convolutional neural networks (CNNs). However, most of these CNN models focus only on learning local features while ignoring global features. In this paper, based on traditional densely connected convolutional networks (DenseNet), a parallel DenseNet is proposed to realize sentiment analysis of short texts. First, this paper proposes two novel feature extraction blocks that are based on DenseNet and a multiscale convolutional neural network. Second, this paper solves the problem of ignoring global features in traditional CNN models by combining the original features with features extracted by the parallel feature extraction block, and then sending the combined features into the final classifier. Last, a model based on parallel DenseNet is proposed that can learn both local and global features of short texts simultaneously and that performs better than other basic models on six different databases.
Keywords: Sentiment analysis; Short texts; Parallel DenseNet
Falcon: A Novel Chinese Short Text Classification Method
9
Authors: Haiming Li, Haining Huang, Xiang Cao, Jingu Qian. 《Journal of Computer and Communications》, 2018, No. 11, pp. 216-226 (11 pages)
In natural language processing, short text classification is still a hot research topic, with obvious problems of feature sparsity, high-dimensional text data and feature representation. In order to express text directly, a simple but new variation which employs low-dimensional one-hot encoding was proposed. In this paper, a DenseNet-based model was proposed for short text classification. Furthermore, feature diversity and reuse were implemented by the concat and average shuffle operation between ResNet and DenseNet to enlarge short text feature selection. Finally, some benchmarks were introduced to evaluate Falcon. In our experimental results, the Falcon method obtained significant improvements over state-of-the-art models on most of the benchmarks, especially in error rate in the first experiment. To sum up, Falcon is an efficient and economical model that requires less computation to achieve high performance.
Keywords: Short text classification; Word vector representation; One-hot; DenseNet networks; Convolutional neural networks
An Unsupervised Method for Short-Text Sentiment Analysis Based on Analysis of Massive Data
10
Authors: Zhenhua Huang, Zhenrong Zhao, Qiong Liu, Zhenyu Wang. 《国际计算机前沿大会会议论文集》, 2015, No. 1, pp. 49-50 (2 pages)
Common forms of short text are microblogs, Twitter posts, short product reviews, short movie reviews and instant messages. Sentiment analysis of them has been a hot topic. A highly accurate model is proposed in this paper for short-text sentiment analysis. The research targets microblogs, product reviews and movie reviews. Based on massive user data, words, symbols or sentences with emotional tendencies are shown to be important indicators in short-text sentiment analysis, and using these features is an effective way to predict the emotional tendencies of short text. The model takes into account the polysemy of single-character emotional words in Chinese and treats single-character and multi-character emotional words separately. The idea of the model can be used to deal with various kinds of short-text data. Experiments show that this model performs well in most cases.
Keywords: Sentiment analysis; Short text; Emotional words; Massive data
Text Extraction with Optimal Bi-LSTM
11
Authors: Bahera H. Nayef, Siti Norul Huda Sheikh Abdullah, Rossilawati Sulaiman, Ashwaq Mukred Saeed. 《Computers, Materials & Continua》 (SCIE, EI), 2023, No. 9, pp. 3549-3567 (19 pages)
Text extraction from images using the traditional techniques of image collecting and pattern recognition using machine learning consumes time due to the amount of extracted features from the images. Deep Neural Networks introduce effective solutions to extract text features from images using a few techniques and the ability to train large datasets of images with significant results. This study proposes using Dual Maxpooling and concatenating convolution Neural Networks (CNN) layers with the activation functions Relu and the Optimized Leaky Relu (OLRelu). The proposed method works by dividing the word image into slices that contain characters, then passing them to deep learning layers to extract feature maps and reform the predicted words. Bidirectional Long Short-Term Memory (BiLSTM) layers extract more compelling features and link the time sequence from forward and backward directions during the training phase. The Connectionist Temporal Classification (CTC) function calculates the training and validation loss rates, in addition to decoding the extracted features to reform characters and linking them according to their time sequence. The proposed model performance is evaluated using training and validation loss errors on the Mjsynth and Integrated Argument Mining Tasks (IAM) datasets. The result of IAM was 2.09% for the average loss errors with the proposed dual Maxpooling and OLRelu. In the Mjsynth dataset, the best validation loss rate shrunk to 2.2% by applying concatenating CNN layers and Relu.
Keywords: Deep neural network; Text features; Dual max-pooling; Concatenating convolution neural networks; Bidirectional long short memory; Text connector characteristics
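Short Text Classification for Civil Aviation Smart Supervision Based on Character-Word Vector Fusion (基于字词向量融合的民航智慧监管短文本分类) (cited by: 1)
12
Authors: 王欣, 干镞锐, 许雅玺, 史珂, 郑涛. 《中国安全科学学报》 (CAS, CSCD, PKU Core), 2024, No. 2, pp. 37-44 (8 pages)
To address the low efficiency of classifying and analyzing civil aviation regulatory inspection records purely by hand, a short text classification model is proposed that combines data augmentation with dual-channel feature extraction over fused character and word vectors. The model classifies civil aviation regulatory items into issues related to personnel, equipment/facilities/environment, systems and procedures, and institutional responsibilities. To handle class imbalance, a data augmentation algorithm transforms the original texts to generate new samples, so that sample counts are more balanced across classes. Character vectors and word vectors are fused by concatenation at the character level, yielding character vectors that carry word feature information. The fused vectors are fed separately into a text convolutional neural network (TextCNN) and a bidirectional long short-term memory (BiLSTM) model to extract features along different dimensions, from a local and a global perspective respectively, and experiments are run on a dataset of civil aviation regulatory inspection records. The results show that the model reaches an accuracy of 0.9837 and an F1 score of 0.9836. Compared with several character-embedding and word-embedding models, accuracy improves by 0.4%; compared with several commonly used single-channel models, accuracy improves by 3%, which verifies that the features extracted by the dual-channel model are comprehensive and effective.
Keywords: Character-word vector fusion; Civil aviation supervision; Short text; Text convolutional neural network (TextCNN); Bidirectional long short-term memory (BiLSTM)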
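The abstract above fuses character vectors and word vectors by concatenating them character by character. A minimal sketch of one reading of that step follows: each character's vector is extended with the vector of the word that contains it. The function name `fuse_char_word` and the toy 2-dimensional embeddings are assumptions for illustration and do not come from the paper.

```python
def fuse_char_word(words, char_vecs, word_vecs):
    """Concatenate each character's vector with the vector of the word containing it."""
    fused = []
    for word in words:
        for ch in word:
            # List concatenation: [char dims..., word dims...]
            fused.append(char_vecs[ch] + word_vecs[word])
    return fused


# Toy 2-dimensional embeddings, invented for illustration only.
char_vecs = {"民": [0.1, 0.2], "航": [0.3, 0.4], "监": [0.5, 0.6], "管": [0.7, 0.8]}
word_vecs = {"民航": [1.0, 1.1], "监管": [2.0, 2.1]}

fused = fuse_char_word(["民航", "监管"], char_vecs, word_vecs)
print(len(fused), len(fused[0]))
```

The output keeps one vector per character, so the sequence length seen by downstream TextCNN and BiLSTM channels is unchanged while each position also carries word-level information.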
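The Impact of Supply Chain Finance on Firm Value under Fintech Empowerment (金融科技赋能下供应链金融对企业价值的影响) (cited by: 2)
13
Authors: 成程, 杨胜刚, 田轩. 《管理科学学报》 (CSSCI, CSCD, PKU Core), 2024, No. 2, pp. 95-119 (25 pages)
Developing supply chain finance is strategically important for deepening supply-side structural reform of the financial sector and strengthening the way finance serves the real economy. This paper collects and organizes the text of all 3.208 million announcements issued by China's A-share listed companies from 2007 to 2019 and empirically tests how listed companies' development of supply chain finance business affects firm value. The results show that developing supply chain finance raises a firm's stock price in the short term and improves firm value in the long term. The more supply chain finance businesses a listed company adopts, and the more frequently supply chain finance terms appear in its announcements, the more pronounced the improvement. Supply chain finance business can raise firm value through a signaling effect, a risk-taking effect, and a systematic management effect. The higher the frequency of fintech terms in announcements related to supply chain finance, the stronger the gain in firm value. Fintech has a stronger enabling effect for firms with lower stock liquidity and weaker risk-taking when they develop supply chain finance business. These conclusions still hold after a series of robustness tests. The paper provides large-sample empirical evidence, based on public market data, that developing supply chain finance business raises the value of Chinese firms, and it offers practical implications for national and local government support of supply chain finance.
Keywords: Supply chain finance; Firm value; Text analysis; Short-term market reaction; Fintech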
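Word Segmentation of Livestock and Poultry Disease Texts Based on the BERT-BiLSTM-CRF Model (基于BERT-BiLSTM-CRF模型的畜禽疫病文本分词研究) (cited by: 2)
14
Authors: 余礼根, 郭晓利, 赵红涛, 杨淦, 张俊, 李奇峰. 《农业机械学报》 (EI, CAS, CSCD, PKU Core), 2024, No. 2, pp. 287-294 (8 pages)
To address the scarcity of text corpora on livestock and poultry diseases and the large number of out-of-vocabulary words, such as disease names and phrases, in those texts, a BERT-BiLSTM-CRF word segmentation model combined with dictionary matching is proposed for livestock and poultry disease texts. Taking sheep diseases as the research object, a text dataset of common diseases was built and combined with the general-purpose PKU corpus. The BERT (Bidirectional encoder representation from transformers) pre-trained language model produces vector representations of the text; a bidirectional long short-term memory network (BiLSTM) captures contextual semantic features; and a conditional random field (CRF) outputs the globally optimal label sequence. On this basis, a domain dictionary of livestock and poultry diseases is added after the CRF layer to correct the segmentation by matching, which reduces ambiguous segmentation caused by disease names and phrases and further improves segmentation accuracy. Experimental results show that the BERT-BiLSTM-CRF model combined with dictionary matching reaches an F1 score of 96.38% on the common sheep disease text dataset, which is 11.01, 10.62, 8.3, and 0.72 percentage points higher than the jieba segmenter, the BiLSTM-Softmax model, the BiLSTM-CRF model, and the proposed model without dictionary matching, respectively, verifying the effectiveness of the method. Compared with a single corpus, the mixed corpus that combines the general-purpose PKU corpus with the common sheep disease text dataset accurately segments both technical disease terms and common words in disease texts, with F1 scores above 95% on both the general corpus and the disease text dataset, indicating good generalization. The method can be used for word segmentation of livestock and poultry disease texts.
Keywords: Livestock and poultry diseases; Text word segmentation; Pre-trained language model; Bidirectional long short-term memory network; Conditional random field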
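The abstract above adds a domain dictionary after the CRF layer to correct the segmentation by matching. The paper's exact correction rule is not stated, so the sketch below shows one plausible form: a greedy, longest-first merge of adjacent tokens whose concatenation is a dictionary term. The function name, the `max_span` limit, and the two-entry dictionary are assumptions made for illustration.

```python
def merge_with_dictionary(tokens, dictionary, max_span=4):
    """Greedily merge adjacent tokens whose concatenation is a dictionary term."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so that longer terms win.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in dictionary:
                merged.append(candidate)
                i += span
                break
        else:
            # No dictionary term starts here; keep the model's token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Illustrative domain dictionary and an over-segmented model output.
domain_terms = {"羊痘", "小反刍兽疫"}
tokens = ["羊", "痘", "是", "小", "反刍", "兽疫"]
print(merge_with_dictionary(tokens, domain_terms))
```

This variant only merges tokens and never splits them, so it can repair over-segmented disease names but cannot fix a token that wrongly spans a term boundary.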
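Enhanced Context Neural Topic Model for Short Texts (面向短文本的增强上下文神经主题模型)
15
Authors: 刘刚, 王同礼, 唐宏伟, 战凯, 杨雯莉. 《计算机工程与应用》 (CSCD, PKU Core), 2024, No. 1, pp. 154-164 (11 pages)
Most current topic models are built on word co-occurrence information within the text itself and do not introduce sparsity constraints on topics to improve topic extraction. In addition, short texts suffer from sparse word co-occurrence, which seriously degrades the accuracy of short text topic modeling. To address these problems, an enhanced context neural topic model (ECNTM) is proposed. ECNTM applies sparsity constraints to topics through a topic controller, filtering out irrelevant topics. The model input is the concatenation of a BOW vector and an SBERT sentence embedding. In the Gaussian decoder, the topic distribution over words is treated in the embedding space as a multivariate Gaussian distribution or a Gaussian mixture distribution, which explicitly enriches the limited context of short texts and resolves the sparsity of their word co-occurrence features. Experiments on four public datasets, WS, Reuters, KOS, and 20 NewsGroups, show that the model clearly improves on baseline models in perplexity, topic coherence, and text classification accuracy, demonstrating the effectiveness of bringing topic sparsity constraints and rich contextual information into short text topic modeling.
Keywords: Neural topic model; Short text; Sparsity constraint; Variational autoencoder; Topic modeling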
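A Thangka Question Classification Model Fusing Multi-Scale CNN and Bidirectional LSTM (融合多尺度CNN与双向LSTM的唐卡问句分类模型)
16
Authors: 王铁君, 闫悦, 郭晓然, 王铠杰, 饶强. 《科学技术与工程》 (PKU Core), 2024, No. 22, pp. 9490-9497 (8 pages)
The rise of large language models has given researchers in natural language processing, search engines, life sciences, and other fields new ideas, but large language models consume substantial resources, infer slowly, and are hard to apply in industrial settings, especially in vertical domains. To address this, a Thangka question classification model is proposed that fuses a multi-scale convolutional neural network (CNN) with a bidirectional long short-term memory (LSTM) network. The model fuses global and local features of the data to perform the Thangka question classification task: global features reflect the essential characteristics of the data, local features attend to parts of the data that are easily overlooked, and the two are fused by concatenation to enrich the sentence representation. Experiments on the Thangka dataset and the THUCNews dataset show that the model is slightly better than the Bert model in precision, while shortening training time by 1/20 and inference time by 1/3. Experiments on public datasets show that the model is also applicable and effective for text classification tasks.
Keywords: Text classification; Long short-term memory; Multi-scale convolutional neural network; Thangka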
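Curved and Dense Commodity Text Detection Based on Link Relationship Prediction (基于链接关系预测的弯曲密集型商品文本检测)
17
Authors: 耿磊, 李嘉琛, 刘彦北, 李月龙, 李晓捷. 《天津工业大学学报》 (CAS, PKU Core), 2024, No. 4, pp. 50-59, 74 (11 pages)
To address the false and missed detections caused by curved, dense text in commodity packaging text detection, a text detection framework based on link relationship prediction and composed of two sub-networks is proposed (text detection network based on relational prediction, RPTNet). In the text component detection network, downsampling uses a dual-branch structure that runs a convolutional neural network and self-attention in parallel to extract local and global features, and adds a dilated feature enhancement module (DFM) to reduce the information lost from deep feature maps during dimensionality reduction. Upsampling combines a feature pyramid with a multi-level attention fusion module (MAFM) for multi-level feature fusion to strengthen the latent relationships among text features, and a text detector detects text components from the feature maps output by upsampling. In the link relationship prediction network, a relational reasoning framework based on a graph convolutional network predicts the deep similarity between text components, and a bidirectional long short-term memory network aggregates text components into text instances. To verify the detection performance of RPTNet, a text detection dataset composed of commodity packaging images (CPTD1500) was constructed. Experimental results show that RPTNet not only performs very well on the public text datasets CTW-1500 and Total-Text, but also reaches a recall of 85.4% and an F-measure of 87.5% on the CPTD1500 dataset, both better than current mainstream algorithms.
Keywords: Text detection; Convolutional neural network; Self-attention; Feature fusion; Graph convolutional network; Bidirectional long short-term memory network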
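Short Text Classification Method Based on a Word-Topic-Document Heterogeneous Network (基于词-主题-文本异质网络的短文本分类方法)
18
Authors: 徐涛, 赵星甲, 卢敏. 《计算机应用与软件》 (PKU Core), 2024, No. 1, pp. 146-152, 182 (8 pages)
Existing classification methods do not consider the semantic relatedness of long-distance words or the sharing of latent topics across texts. To address this, a short text classification method based on a word-topic-document heterogeneous network (WTDHN) is proposed. Word2vec is used to train contextual semantic vectors for words. A word relatedness matrix is constructed so that ample word co-occurrence information strengthens short text semantics at every level. A heterogeneous network with words, topics, and documents as nodes is then built, and graph convolution learns high-order neighborhood information between nodes to enrich short text semantics. Compared with baseline classification models, the method improves classification accuracy by an average of 1.56% on five public short text datasets.
Keywords: Word-topic-document heterogeneous network; Word co-occurrence; Document-topic distribution; Short text classification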
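Short Text Classification Model Based on Multiple Semantic Features and Graph Convolutional Neural Network (基于多元语义特征和图卷积神经网络的短文本分类模型)
19
Authors: 鲁富宇, 冷泳林, 崔洪霞. 《河南科学》, 2024, No. 5, pp. 625-630 (6 pages)
With the rapid development of the Internet and social media, large volumes of short text data have appeared online, and these data show very high economic and academic value in natural language processing areas such as public opinion monitoring, sentiment analysis, and news classification. However, inherent characteristics of short text data, such as sparsity and the lack of rich contextual semantics, pose considerable challenges for short text classification. To address these problems, a short text classification model that combines multiple semantic features with a graph convolutional network (GCN) is proposed. The model obtains multiple semantic features of short texts through the Language Technology Platform of Harbin Institute of Technology, builds a multi-element heterogeneous graph from these semantic features together with the short texts, and uses the graph as input to the GCN, which learns deeper features of the short texts. Finally, the Softmax function produces a probability distribution over classes, from which the short text is classified. Experimental results show that the model improves the F1 score of short text classification by 4% over traditional single models.
Keywords: Short text; Multi-element heterogeneous graph; Semantic features; Graph convolutional neural network; Classification model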
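The final step in the abstract above turns the GCN output into a probability distribution over classes with the Softmax function. A minimal sketch of that step follows. The three logits are made-up values standing in for GCN outputs, not results from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three classes, standing in for the output of a GCN layer.
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))
print(predicted, [round(p, 3) for p in probs])
```

The predicted class is simply the index of the largest probability; subtracting the maximum before exponentiating leaves the result unchanged and avoids overflow for large scores.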
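Chinese Short Text Semantic Similarity Computation Based on Roberta (基于Roberta的中文短文本语义相似度计算研究)
20
Authors: 张小艳, 李薇. 《计算机应用与软件》 (PKU Core), 2024, No. 8, pp. 275-281, 366 (8 pages)
To address the insufficient feature extraction capability of traditional Siamese-network-based models for computing text semantic similarity, SRoberta-SelfAtt, a model that fuses a Siamese network with the Roberta pre-trained model, is proposed. Within the Siamese architecture, the Roberta pre-trained model encodes each text of the original pair into character-level vectors, and a self-attention mechanism captures the relationships between different characters within a text. A pooling strategy then produces sentence vectors for the text pair, and the resulting representations are made to interact and are fused. The loss is computed at the fully connected layer to evaluate the semantic similarity of the text pair. Experiments on three datasets across two types of tasks show improvements over other models, providing an effective basis for further improving the accuracy of text semantic similarity computation.
Keywords: Siamese neural network; Roberta; Self-attention mechanism; Chinese short text; Semantic similarity computation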
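The abstract above pools character-level vectors into sentence vectors and then compares the two sides of the Siamese network. A minimal sketch of that idea follows, assuming mean pooling and with cosine similarity standing in for the paper's fully connected scoring layer, whose details the abstract does not give. The 2-dimensional vectors are toy values, not Roberta outputs.

```python
import math


def mean_pool(token_vecs):
    """Average token vectors into a single sentence vector."""
    dim = len(token_vecs[0])
    return [sum(vec[d] for vec in token_vecs) / len(token_vecs) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy character-level vectors for the two texts of a pair.
left = mean_pool([[1.0, 0.0], [0.0, 1.0]])
right = mean_pool([[2.0, 2.0]])
sim = cosine(left, right)
print(left, round(sim, 6))
```

Both texts pass through the same pooling function, which mirrors the weight sharing of a Siamese architecture: only the inputs differ between the two branches.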