随着计算机算力的提升和智能设备的普及,社会逐步进入智慧化时代。高校图书馆作为高校的文献信息中心,进行智慧化转型提升服务质量是时代所需。因此,文章借助智能问答技术,设计了基于自然语言处理(Natural Language Processing,NLP)的...随着计算机算力的提升和智能设备的普及,社会逐步进入智慧化时代。高校图书馆作为高校的文献信息中心,进行智慧化转型提升服务质量是时代所需。因此,文章借助智能问答技术,设计了基于自然语言处理(Natural Language Processing,NLP)的图书馆智能问答系统,创新图书馆参考咨询服务模式,提高图书馆服务水平和效率。展开更多
As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects in...As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.展开更多
在数字化时代,智能语音质检成为企业提升工作效率的重要工具,其中自然语言处理(Natural Language Processing,NLP)技术的应用为智能语音质检提供了技术支持。NLP技术通过情感分析、语义分析等手段,使得质检过程更加高效、准确,并降低了...在数字化时代,智能语音质检成为企业提升工作效率的重要工具,其中自然语言处理(Natural Language Processing,NLP)技术的应用为智能语音质检提供了技术支持。NLP技术通过情感分析、语义分析等手段,使得质检过程更加高效、准确,并降低了质检成本。基于此,探讨了NLP技术在智能语音质检中的应用优势和具体实现方式。展开更多
随着人工智能技术的快速发展,自然语言处理(Natural Language Processing,NLP)技术在各个领域得到了广泛应用。文章提出一种基于NLP技术的智能培训系统中知识点与题库关联方法,该方法利用NLP技术对培训材料进行文本分析,自动提取知识点...随着人工智能技术的快速发展,自然语言处理(Natural Language Processing,NLP)技术在各个领域得到了广泛应用。文章提出一种基于NLP技术的智能培训系统中知识点与题库关联方法,该方法利用NLP技术对培训材料进行文本分析,自动提取知识点,并基于知识点和题库之间建立关联模型,实现试卷题目的自动分配。该方法能够有效提高培训系统的智能化水平,提高培训效率和质量。展开更多
Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis envir...Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on Python language, and built a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of SESD corpus by means of word cloud, keyword extraction and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency""Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis""Qi stagnation", etc. Core syndrome elements were closely related. Conclusions Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndromes differentiation could help reveal the internal relationship between syndromes differentiation and provide basis for artificial intelligence to learn syndromes differentiation.展开更多
随着自然语言处理(Natural Language Processing,NLP)技术的发展,其对各行各业的发展注入了新的动力,同时在网络教育快速发展的背景下,二者的有机融合也便成为热点。本文提出基于浏览器/服务器(Browser/Server,B/S)模式,通过建立录题模...随着自然语言处理(Natural Language Processing,NLP)技术的发展,其对各行各业的发展注入了新的动力,同时在网络教育快速发展的背景下,二者的有机融合也便成为热点。本文提出基于浏览器/服务器(Browser/Server,B/S)模式,通过建立录题模板实现对试题的分割和录入,借助Textrank4zh和word2vec模块,建立以TextRank算法为基础的隐马尔可夫模型完成组卷功能,完成以Vue.js框架为前端和Flask框架为后端的题库考试系统的设计与实现。该项目在减轻教师工作量的同时可更好地考察学生知识掌握的程度。展开更多
Background:In this investigation,we explore the literature regarding neuroregeneration from the 1700s to the present.The regeneration of central nervous system neurons or the regeneration of axons from cell bodies and...Background:In this investigation,we explore the literature regarding neuroregeneration from the 1700s to the present.The regeneration of central nervous system neurons or the regeneration of axons from cell bodies and their reconnection with other neurons remains a major hurdle.Injuries relating to war and accidents attracted medical professionals throughout early history to regenerate and reconnect nerves.Early literature till 1990 lacked specific molecular details and is likely provide some clues to conditions that promoted neuron and/or axon regeneration.This is an avenue for the application of natural language processing(NLP)to gain actionable intelligence.Post 1990 period saw an explosion of all molecular details.With the advent of genomic,transcriptomics,proteomics,and other omics-there is an emergence of big data sets and is another rich area for application of NLP.How the neuron and/or axon regeneration related keywords have changed over the years is a first step towards this endeavor.Methods:Specifically,this article curates over 600 published works in the field of neuroregeneration.We then apply a dynamic topic modeling algorithm based on the Latent Dirichlet allocation(LDA)algorithm to assess how topics cluster based on topics.Results:Based on how documents are assigned to topics,we then build a recommendation engine to assist researchers to access domain-specific literature based on how their search text matches to recommended document topics.The interface further includes interactive topic visualizations for researchers to understand how topics grow closer and further apart,and how intra-topic composition changes over time.Conclusions:We present a recommendation engine and interactive interface that enables dynamic topic modeling for neuronal regeneration.展开更多
网络流量识别是网络管理和安全服务的基础.随着互联网的不断扩展及其复杂性的增加,传统基于规则的识别方法或流行为特征的方法正在面临着巨大挑战.受自然语言处理(Nature Language Processing, NLP)启发,本文提出了一种多特征融合的加...网络流量识别是网络管理和安全服务的基础.随着互联网的不断扩展及其复杂性的增加,传统基于规则的识别方法或流行为特征的方法正在面临着巨大挑战.受自然语言处理(Nature Language Processing, NLP)启发,本文提出了一种多特征融合的加密流量快速分类方法 .该方法通过融合数据包和字节序列特征来完成网络流的特征表示,采用双元字节编码将所选特征扩展为双字节序列,增加了字节的上下文语义特征;通过与数据包特征处理相适应的池化方法来最大限度保留数据包的特征信息,从而使所提模型具有更强的抗噪能力和更精确的分类能力.本文方法分别在ISCX-2016和一个包含66个热门应用程序的私有数据集(ETD66)上进行验证,并与其他模型展开比较.结果表明:本文所提方法在ISCX-2016及ETD66上的测试精度和性能都明显优于其他流量分类模型,分别取得了98.2%和98.6%的识别准确率,从而证明了所提方法的特征提取能力和强泛化能力.展开更多
针对当前大多数命名实体识别(NER)模型只使用字符级信息编码且缺乏对文本层次信息提取的问题,提出一种融合多粒度语言知识与层级信息的中文NER(CNER)模型(CMH)。首先,使用经过多粒度语言知识预训练的模型编码文本,使模型能够同时捕获文...针对当前大多数命名实体识别(NER)模型只使用字符级信息编码且缺乏对文本层次信息提取的问题,提出一种融合多粒度语言知识与层级信息的中文NER(CNER)模型(CMH)。首先,使用经过多粒度语言知识预训练的模型编码文本,使模型能够同时捕获文本的细粒度和粗粒度语言信息,从而更好地表征语料;其次,使用ON-LSTM(Ordered Neurons Long Short-Term Memory network)模型提取层级信息,利用文本本身的层级结构信息增强编码间的时序关系;最后,在模型的解码端结合文本的分词信息,并将实体识别问题转化为表格填充问题,以更好地解决实体重叠问题并获得更准确的实体识别结果。同时,为解决当前模型在不同领域中的迁移能力较差的问题,提出通用实体识别的理念,通过筛选多领域的通用实体类型,构建一套提升模型在多领域中的泛化能力的通用NER数据集MDNER(Multi-Domain NER dataset)。为验证所提模型的效果,在数据集Resume、Weibo、MSRA上进行实验,与MECT(Multi-metadata Embedding based Cross-Transformer)模型相比,F1值分别提高了0.94、4.95和1.58个百分点。为了验证所提模型在多领域中的实体识别效果,在MDNER上进行实验,F1值达到了95.29%。实验结果表明,多粒度语言知识预训练、文本层级结构信息提取和高效指针解码器对模型的性能提升至关重要。展开更多
近年来,大语言模型(large language model,LLM)在一系列下游任务中得到了广泛应用,并在多个领域表现出了卓越的文本理解、生成与推理能力.然而,越狱攻击正成为大语言模型的新兴威胁.越狱攻击能够绕过大语言模型的安全机制,削弱价值观对...近年来,大语言模型(large language model,LLM)在一系列下游任务中得到了广泛应用,并在多个领域表现出了卓越的文本理解、生成与推理能力.然而,越狱攻击正成为大语言模型的新兴威胁.越狱攻击能够绕过大语言模型的安全机制,削弱价值观对齐的影响,诱使经过对齐的大语言模型产生有害输出.越狱攻击带来的滥用、劫持、泄露等问题已对基于大语言模型的对话系统与应用程序造成了严重威胁.对近年的越狱攻击研究进行了系统梳理,并基于攻击原理将其分为基于人工设计的攻击、基于模型生成的攻击与基于对抗性优化的攻击3类.详细总结了相关研究的基本原理、实施方法与研究结论,全面回顾了大语言模型越狱攻击的发展历程,为后续的研究提供了有效参考.对现有的安全措施进行了简略回顾,从内部防御与外部防御2个角度介绍了能够缓解越狱攻击并提高大语言模型生成内容安全性的相关技术,并对不同方法的利弊进行了罗列与比较.在上述工作的基础上,对大语言模型越狱攻击领域的现存问题与前沿方向进行探讨,并结合多模态、模型编辑、多智能体等方向进行研究展望.展开更多
文摘随着计算机算力的提升和智能设备的普及,社会逐步进入智慧化时代。高校图书馆作为高校的文献信息中心,进行智慧化转型提升服务质量是时代所需。因此,文章借助智能问答技术,设计了基于自然语言处理(Natural Language Processing,NLP)的图书馆智能问答系统,创新图书馆参考咨询服务模式,提高图书馆服务水平和效率。
文摘As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
文摘在数字化时代,智能语音质检成为企业提升工作效率的重要工具,其中自然语言处理(Natural Language Processing,NLP)技术的应用为智能语音质检提供了技术支持。NLP技术通过情感分析、语义分析等手段,使得质检过程更加高效、准确,并降低了质检成本。基于此,探讨了NLP技术在智能语音质检中的应用优势和具体实现方式。
文摘随着人工智能技术的快速发展,自然语言处理(Natural Language Processing,NLP)技术在各个领域得到了广泛应用。文章提出一种基于NLP技术的智能培训系统中知识点与题库关联方法,该方法利用NLP技术对培训材料进行文本分析,自动提取知识点,并基于知识点和题库之间建立关联模型,实现试卷题目的自动分配。该方法能够有效提高培训系统的智能化水平,提高培训效率和质量。
基金the funding support from the National Natural Science Foundation of China (No. 81874429)Digital and Applied Research Platform for Diagnosis of Traditional Chinese Medicine (No. 49021003005)+1 种基金2018 Hunan Provincial Postgraduate Research Innovation Project (No. CX2018B465)Excellent Youth Project of Hunan Education Department in 2018 (No. 18B241)
文摘Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on Python language, and built a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of SESD corpus by means of word cloud, keyword extraction and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency""Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis""Qi stagnation", etc. Core syndrome elements were closely related. Conclusions Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndromes differentiation could help reveal the internal relationship between syndromes differentiation and provide basis for artificial intelligence to learn syndromes differentiation.
文摘随着自然语言处理(Natural Language Processing,NLP)技术的发展,其对各行各业的发展注入了新的动力,同时在网络教育快速发展的背景下,二者的有机融合也便成为热点。本文提出基于浏览器/服务器(Browser/Server,B/S)模式,通过建立录题模板实现对试题的分割和录入,借助Textrank4zh和word2vec模块,建立以TextRank算法为基础的隐马尔可夫模型完成组卷功能,完成以Vue.js框架为前端和Flask框架为后端的题库考试系统的设计与实现。该项目在减轻教师工作量的同时可更好地考察学生知识掌握的程度。
文摘Background:In this investigation,we explore the literature regarding neuroregeneration from the 1700s to the present.The regeneration of central nervous system neurons or the regeneration of axons from cell bodies and their reconnection with other neurons remains a major hurdle.Injuries relating to war and accidents attracted medical professionals throughout early history to regenerate and reconnect nerves.Early literature till 1990 lacked specific molecular details and is likely provide some clues to conditions that promoted neuron and/or axon regeneration.This is an avenue for the application of natural language processing(NLP)to gain actionable intelligence.Post 1990 period saw an explosion of all molecular details.With the advent of genomic,transcriptomics,proteomics,and other omics-there is an emergence of big data sets and is another rich area for application of NLP.How the neuron and/or axon regeneration related keywords have changed over the years is a first step towards this endeavor.Methods:Specifically,this article curates over 600 published works in the field of neuroregeneration.We then apply a dynamic topic modeling algorithm based on the Latent Dirichlet allocation(LDA)algorithm to assess how topics cluster based on topics.Results:Based on how documents are assigned to topics,we then build a recommendation engine to assist researchers to access domain-specific literature based on how their search text matches to recommended document topics.The interface further includes interactive topic visualizations for researchers to understand how topics grow closer and further apart,and how intra-topic composition changes over time.Conclusions:We present a recommendation engine and interactive interface that enables dynamic topic modeling for neuronal regeneration.
文摘网络流量识别是网络管理和安全服务的基础.随着互联网的不断扩展及其复杂性的增加,传统基于规则的识别方法或流行为特征的方法正在面临着巨大挑战.受自然语言处理(Nature Language Processing, NLP)启发,本文提出了一种多特征融合的加密流量快速分类方法 .该方法通过融合数据包和字节序列特征来完成网络流的特征表示,采用双元字节编码将所选特征扩展为双字节序列,增加了字节的上下文语义特征;通过与数据包特征处理相适应的池化方法来最大限度保留数据包的特征信息,从而使所提模型具有更强的抗噪能力和更精确的分类能力.本文方法分别在ISCX-2016和一个包含66个热门应用程序的私有数据集(ETD66)上进行验证,并与其他模型展开比较.结果表明:本文所提方法在ISCX-2016及ETD66上的测试精度和性能都明显优于其他流量分类模型,分别取得了98.2%和98.6%的识别准确率,从而证明了所提方法的特征提取能力和强泛化能力.
文摘针对当前大多数命名实体识别(NER)模型只使用字符级信息编码且缺乏对文本层次信息提取的问题,提出一种融合多粒度语言知识与层级信息的中文NER(CNER)模型(CMH)。首先,使用经过多粒度语言知识预训练的模型编码文本,使模型能够同时捕获文本的细粒度和粗粒度语言信息,从而更好地表征语料;其次,使用ON-LSTM(Ordered Neurons Long Short-Term Memory network)模型提取层级信息,利用文本本身的层级结构信息增强编码间的时序关系;最后,在模型的解码端结合文本的分词信息,并将实体识别问题转化为表格填充问题,以更好地解决实体重叠问题并获得更准确的实体识别结果。同时,为解决当前模型在不同领域中的迁移能力较差的问题,提出通用实体识别的理念,通过筛选多领域的通用实体类型,构建一套提升模型在多领域中的泛化能力的通用NER数据集MDNER(Multi-Domain NER dataset)。为验证所提模型的效果,在数据集Resume、Weibo、MSRA上进行实验,与MECT(Multi-metadata Embedding based Cross-Transformer)模型相比,F1值分别提高了0.94、4.95和1.58个百分点。为了验证所提模型在多领域中的实体识别效果,在MDNER上进行实验,F1值达到了95.29%。实验结果表明,多粒度语言知识预训练、文本层级结构信息提取和高效指针解码器对模型的性能提升至关重要。
文摘近年来,大语言模型(large language model,LLM)在一系列下游任务中得到了广泛应用,并在多个领域表现出了卓越的文本理解、生成与推理能力.然而,越狱攻击正成为大语言模型的新兴威胁.越狱攻击能够绕过大语言模型的安全机制,削弱价值观对齐的影响,诱使经过对齐的大语言模型产生有害输出.越狱攻击带来的滥用、劫持、泄露等问题已对基于大语言模型的对话系统与应用程序造成了严重威胁.对近年的越狱攻击研究进行了系统梳理,并基于攻击原理将其分为基于人工设计的攻击、基于模型生成的攻击与基于对抗性优化的攻击3类.详细总结了相关研究的基本原理、实施方法与研究结论,全面回顾了大语言模型越狱攻击的发展历程,为后续的研究提供了有效参考.对现有的安全措施进行了简略回顾,从内部防御与外部防御2个角度介绍了能够缓解越狱攻击并提高大语言模型生成内容安全性的相关技术,并对不同方法的利弊进行了罗列与比较.在上述工作的基础上,对大语言模型越狱攻击领域的现存问题与前沿方向进行探讨,并结合多模态、模型编辑、多智能体等方向进行研究展望.