This paper describes a fast, texture-based text location scheme that operates directly in the Discrete Wavelet Transform (DWT) domain. Using the distinguishing texture characteristics encoded in the wavelet transform domain, text is detected quickly in complex-background images stored in a compressed format such as JPEG2000, without full decompression. Compared with some traditional character location methods, the proposed scheme offers low computational cost, robustness to character size and font, and high accuracy. Preliminary experimental results show that the proposed scheme is efficient and effective.
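The idea of reading texture from wavelet detail coefficients can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's features or thresholds, so the one-level Haar transform, the summed absolute detail energy, and the function names `haar_dwt2`, `detail_energy`, and `text_candidates` are all hypothetical. JPEG2000 itself uses the 5/3 and 9/7 wavelets, and a compressed-domain scheme would read coefficients from the codestream rather than recompute them from pixels as done here.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an image with even height and width.

    Returns four subbands (approx, horiz, vert, diag) as nested lists,
    each half the size of the input in both directions.
    """
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(img), 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_a.append((a + b + d + e) / 4.0)  # local average
            row_h.append((a + b - d - e) / 4.0)  # top row minus bottom row
            row_v.append((a - b + d - e) / 4.0)  # left column minus right column
            row_d.append((a - b - d + e) / 4.0)  # diagonal difference
        approx.append(row_a)
        horiz.append(row_h)
        vert.append(row_v)
        diag.append(row_d)
    return approx, horiz, vert, diag


def detail_energy(horiz, vert, diag):
    """Sum of absolute detail coefficients at each subband position."""
    return [
        [abs(h) + abs(v) + abs(d) for h, v, d in zip(rh, rv, rd)]
        for rh, rv, rd in zip(horiz, vert, diag)
    ]


def text_candidates(img, threshold):
    """Mark positions whose detail energy exceeds an assumed threshold."""
    _, horiz, vert, diag = haar_dwt2(img)
    energy = detail_energy(horiz, vert, diag)
    return [[value > threshold for value in row] for row in energy]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]               # smooth background
    striped = [[0, 255, 0, 255] for _ in range(4)]    # stroke-like texture
    print(text_candidates(flat, 50.0))
    print(text_candidates(striped, 50.0))
```

Smooth regions produce near-zero detail coefficients, while stroke-like regions produce large ones, which is why a threshold on detail energy can serve as a cheap first filter before any finer analysis.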
Reading and writing are the main methods of interacting with web content. Text simplification tools help people with cognitive impairments, new language learners, and children, who may have difficulty understanding complex web content. Text simplification is the process of changing complex text into more readable and understandable text. Recent approaches to text simplification adopt the machine translation concept to learn simplification rules from a parallel corpus of complex and simple sentences. In this paper, we propose two models based on the transformer, an encoder-decoder structure that achieves state-of-the-art (SOTA) results in machine translation. The training process for our model includes three steps: preprocessing the data with a subword tokenizer, training the model and optimizing it with the Adam optimizer, and then using the model to decode the output. The first model uses the transformer only; the second model integrates Bidirectional Encoder Representations from Transformers (BERT) as the encoder to improve training time and results. The first model was evaluated with the Bilingual Evaluation Understudy (BLEU) score and reached 53.78 on the WikiSmall dataset. In the experiment on the second model, which is integrated with BERT, the validation loss decreased much faster than for the model without BERT. However, its BLEU score was lower (44.54), which could be due to the size of the dataset: the model overfitted and was unable to generalize well. Future work on the second model could therefore involve experimenting with a larger dataset such as WikiLarge. In addition, further analysis was carried out on the model's results and on the dataset, using different evaluation metrics to understand their performance.
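Since the abstract reports its results as BLEU scores, a minimal sketch of how BLEU is computed for one sentence may help. It assumes a single reference, pre-tokenized input, uniform weights over 1- to 4-grams, and no smoothing; the function names are hypothetical. Published figures such as 53.78 are normally corpus-level scores on a 0 to 100 scale, so this sentence-level version is not expected to reproduce them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU in [0, 1] with a brevity penalty and no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    print(bleu(reference, reference))
    print(bleu("the cat sat on a mat".split(), reference, max_n=2))
```

The clipping step is what keeps a degenerate output that repeats one frequent word from scoring well, and the brevity penalty does the same for outputs that are too short.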
Emotion Recognition in Conversations (ERC) is fundamental to creating emotionally intelligent machines. Graph-Based Network (GBN) models have gained popularity for detecting conversational context in ERC tasks. However, their limited ability to collect and acquire contextual information hinders their effectiveness. To address this, we propose a Text Augmentation-based computational model for recognizing emotions using transformers (TA-MERT). The proposed model uses the Multimodal Emotion Lines Dataset (MELD), which ensures a balanced representation for recognizing human emotions. The model uses text augmentation techniques to produce more training data, improving its accuracy. Transformer encoders are used to train the deep neural network (DNN) model, in particular Bidirectional Encoder (BE) representations that capture both forward and backward contextual information. This integration improves the accuracy and robustness of the proposed model. Furthermore, we present a method for balancing the training dataset by creating enhanced samples from the original dataset. Balancing the dataset across all emotion categories lessens the adverse effects of data imbalance on the accuracy of the proposed model. Experimental results on the MELD dataset show that TA-MERT outperforms earlier methods, achieving a weighted F1 score of 62.60% and an accuracy of 64.36%. Overall, the proposed TA-MERT model addresses the weaknesses of GBN models in obtaining contextual data for ERC. TA-MERT recognizes human emotions more accurately by employing text augmentation and transformer-based encoding, and the balanced dataset and additional training samples also enhance its resilience. These findings highlight the significance of transformer-based approaches for emotion recognition in conversations.
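The weighted F1 score quoted above averages the per-class F1 scores with each class weighted by its share of the true labels, which matters when emotion classes are imbalanced. The sketch below shows that standard definition; the function name and the toy labels are illustrative and are not taken from the paper or from MELD.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is at least n > 0 here
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += f1 * n / total
    return score


if __name__ == "__main__":
    y_true = ["joy", "joy", "anger", "neutral"]
    y_pred = ["joy", "anger", "anger", "neutral"]
    print(weighted_f1(y_true, y_pred))
```

Because the weights follow class frequency, a model can reach a high weighted F1 while doing poorly on rare emotions, which is one reason dataset balancing is reported alongside the metric.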
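Models can generate summaries that match user preferences. Earlier summarization models focused on controlling a single attribute rather than a combination of attributes. When required to satisfy several control attributes at once, conventional Seq2Seq multi-attribute controllable summarization models cannot integrate all of the control attributes, cannot accurately reproduce key information from the text, and cannot handle out-of-vocabulary words. To address this, this paper proposes a model based on an extended Transformer and a pointer generator network (PGN). The extended Transformer expands the single-encoder, single-decoder form of the Transformer into a dual encoder that extracts two kinds of text semantic information and a single decoder that can fuse guidance-signal features. The pointer generator network then chooses between copying words from the source text and generating new summary content from the vocabulary, which addresses the out-of-vocabulary (OOV) problem common in summarization tasks. In addition, to encode position information efficiently, the model uses relative position representations in the attention layers to introduce the sequential information of the text. The model can control many important attributes of a summary, including length, topic, and specificity. Experiments on the public MACSum dataset show that, compared with previous methods, the proposed model better satisfies user-specified attribute requirements while maintaining summary quality.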
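The choice between copying and generating can be sketched with the commonly used pointer-generator mixture, in which a gate `p_gen` blends the vocabulary distribution with the attention weights over source tokens. The abstract does not state the paper's exact formulation, so the mixture form, the function name, and the toy numbers below are assumptions for illustration.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Blend generation and copying into one distribution over words.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary words to probabilities
    attention     -- attention weights over source positions (sums to 1)
    source_tokens -- source words, aligned with `attention`
    """
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention mass to each source word,
    # creating an entry when the word is outside the vocabulary.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    dist = final_distribution(
        p_gen=0.5,
        vocab_dist={"the": 0.5, "model": 0.5},
        attention=[0.75, 0.25],
        source_tokens=["MACSum", "model"],  # "MACSum" is out of vocabulary
    )
    print(dist)
```

An out-of-vocabulary source word receives probability only through the copy term, which is how this kind of mixture lets a summarizer reproduce rare names it could never generate from a fixed vocabulary.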
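This article studies Chinese text generation based on the Transformer model, focusing on the encoder-decoder structure of the Transformer and how it works. After a detailed analysis of the encoder and decoder mechanisms, the article runs Chinese text generation experiments using open-source models from Hugging Face Transformers. The results show that the method performs well on a self-built dataset, with accuracy, precision, and recall reaching 92.5%, 91.8%, and 90.6%, respectively. The study not only extends the theoretical foundation of Chinese natural language processing but also provides efficient technical support for practical applications.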
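Traditional text summarization methods, such as summary generation models built on recurrent neural networks and the Encoder-Decoder framework, suffer from insufficient parallelism or poor handling of long-term dependencies, as well as problems with the accuracy and fluency of the generated summaries. To address this, a summary generation method with dynamic word embeddings is proposed. The method is based on an improved Transformer model and introduces prior knowledge in the text preprocessing stage: ELMo (Embeddings from Language Models) dynamic word vectors serve as the word representations of the training text, and each word vector is concatenated with the sentence vector of the sentence containing that word to form the input text matrix. The text matrix is fed to the Encoder to produce a fixed-length text vector representation, which the Decoder then decodes into the target summary. The experiments use ROUGE scores as the evaluation metric, and comparisons with other methods show that the summaries generated by the proposed method are more accurate and more fluent.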
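The input construction described above, where each word vector is joined with the vector of its sentence, can be sketched as a simple concatenation. The abstract does not say how the sentence vector is obtained, so the mean of the word vectors is used here purely as a placeholder assumption, and the toy two-dimensional vectors stand in for real ELMo embeddings; both function names are hypothetical.

```python
def mean_vector(vectors):
    """Placeholder sentence vector: element-wise mean of the word vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_input_matrix(word_vectors, sentence_vector):
    """Concatenate each word vector with its sentence vector, one row per word."""
    return [list(word) + list(sentence_vector) for word in word_vectors]


if __name__ == "__main__":
    words = [[1.0, 2.0], [3.0, 4.0]]     # toy stand-ins for ELMo word vectors
    sentence = mean_vector(words)         # assumed way to get a sentence vector
    for row in build_input_matrix(words, sentence):
        print(row)
```

Each row then carries both the word's own context-dependent representation and a summary of the sentence it belongs to, at the cost of doubling the input width in this toy setup.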
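Target-Based Sentiment Analysis is one of the most challenging topics in sentiment analysis; it requires solving two subtasks at once, target extraction and target-specific sentiment analysis. Existing work still has two problems: first, models cannot make full use of target boundary and sentiment information; second, they generally use long short-term memory networks for feature extraction, which cannot capture the internal relationships of the input sentence. To solve these problems, this paper introduces a direction-aware Transformer and proposes DNTSA (Dual-assist Network based model for Target Sentiment Analysis). Its core idea is to use the direction-aware Transformer as a feature extractor that effectively aligns the intrinsic connections among multiple target words and sentiment words, and to further strengthen the model's sentiment recognition and target extraction abilities through the dual-assist network. On three public datasets, Laptop, Restaurant, and Twitter, the proposed method improves F1 over the baseline E2E-TBSA by 2.3%, 1.8%, and 3.9%, respectively.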
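Conventional attention-based Encoder-Decoder models suffer from redundant wording, inconsistent statements, and out-of-vocabulary (OOV) words in summary generation, which leads to poor summary accuracy. To address this, the Transformer model, which can embed text position information, is improved. A pointer network is introduced to assist decoding, and its strengths in text generation are used to generate summaries. The model's effectiveness is verified on the LCSTS Chinese short-text summarization dataset. The results show that the improved Transformer model scores on average 2 points higher on ROUGE than the baseline model and that, while keeping the summary consistent with the input text, the salience of the generated content and the fluency of the language improve markedly.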
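The ROUGE comparison above rests on n-gram overlap between a generated summary and a reference. A minimal sketch of ROUGE-N follows, assuming a single reference and input that is already split into units (for Chinese, often individual characters); the function name is hypothetical, and the abstract does not say which ROUGE variants or tokenization the paper used.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap with one reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    candidate = list("abcd")   # stand-in for a character-split summary
    reference = list("abxd")
    print(rouge_n(candidate, reference, n=1))
    print(rouge_n(candidate, reference, n=2))
```

Reported ROUGE scores are usually these values multiplied by 100 and averaged over a test set, so a gain of 2 points corresponds to a change of 0.02 on the scale used here.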
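Data-to-text generation is a natural language processing method that generates coherent text from structured data. In recent years, thanks to end-to-end trained deep neural networks, data-to-text generation methods have shown great potential. Such methods can process large amounts of data and automatically generate coherent text, and are commonly used in scenarios such as news writing and report generation. However, existing work has significant shortcomings in reasoning over data such as specific numerical values and times, cannot make full use of the structural information among data to provide reasonable generation guidance, and is prone to semantics and syntax being trained separately during generation. This paper therefore proposes a data-to-text generation method that combines the Transformer model with a deep neural network, along with a Transformer Text Planning (TTP) algorithm for content planning, to address these problems effectively. The method is validated on the public Rotowire dataset, and the experimental results show that it outperforms existing data-to-text generation models, can be applied directly to generating coherent text from structured data, and has practical application value.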
Funding: Supported by the National Natural Science Foundation of China (No. 60402036) and the Natural Science Foundation of Beijing (No. 4042008).