Journal Articles
65 articles found
1. Smaller & Smarter: Score-Driven Network Chaining of Smaller Language Models
Authors: Gunika Dhingra, Siddansh Chawla, Vijay K. Madisetti, Arshdeep Bahga. Journal of Software Engineering and Applications, 2024, Issue 1, pp. 23-42.
Abstract: With the continuous evolution and expanding applications of large language models (LLMs), there has been a noticeable surge in the size of emerging models. The concern is not solely the growth in model size, measured primarily by the number of parameters, but also the accompanying escalation in computational demands and in hardware and software prerequisites for training, all of which culminate in a substantial financial investment. In this paper, we present techniques such as supervision, parallelization, and scoring functions that extract better results from chains of smaller language models, rather than relying solely on scaling up model size. First, we propose an approach to quantify the performance of a smaller language model (SLM) by introducing a corresponding supervisor model that incrementally corrects the errors it encounters. Second, we propose running two smaller language models (in a network) on the same task and retrieving the more relevant of the two outputs, ensuring peak performance for a specific task. Experimental evaluations establish quantitative accuracy improvements on financial reasoning and arithmetic calculation tasks from using supervisor models (in a network-of-models scenario), threshold scoring, and parallel processing over a baseline study.
Keywords: large language models (LLMs), smaller language models (SLMs), finance, networking, supervisor model, scoring function
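The scoring-and-supervision pipeline described in the abstract can be illustrated with a small sketch. The abstract does not give the actual scoring function, threshold, or model interfaces, so the `score` heuristic, the `THRESHOLD` value, and the `generate`/`correct` calls below are illustrative assumptions, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 0.7  # assumed acceptance threshold; the paper's value is not given


def score(task: str, answer: str) -> float:
    """Toy scoring heuristic: lexical overlap between task and answer.
    The paper uses task-specific scoring functions; this is only a stand-in."""
    task_tokens, ans_tokens = set(task.lower().split()), set(answer.lower().split())
    return len(task_tokens & ans_tokens) / max(len(ans_tokens), 1)


def chain_slms(task, slm_a, slm_b, supervisor):
    # Run the two smaller language models on the same task in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        outputs = list(pool.map(lambda m: m.generate(task), (slm_a, slm_b)))

    # Threshold scoring: keep the candidate with the higher score if good enough.
    best = max(outputs, key=lambda out: score(task, out))
    if score(task, best) >= THRESHOLD:
        return best

    # Otherwise let the supervisor model incrementally correct the errors.
    return supervisor.correct(task, best)
```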
2. Enhancing Relational Triple Extraction in Specific Domains: Semantic Enhancement and Synergy of Large Language Models and Small Pre-Trained Language Models
Authors: Jiakai Li, Jianpeng Hu, Geng Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 2481-2503.
Abstract: In the process of constructing domain-specific knowledge graphs, the task of relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient utilization of the semantic interaction between entities and relations, difficulties in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three innovative components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to significantly improve the model's performance on domain-specific relational triple extraction tasks. We first propose an attention interaction module that significantly enhances the semantic interaction between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that effectively combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to re-evaluate challenging samples, thereby improving the model's adaptability in specific domains. Additionally, we explore the use of LLMs for data augmentation, aiming to generate domain-specific datasets that alleviate the scarcity of domain data. Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several respects, with F1 scores exceeding the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
Keywords: relational triple extraction, semantic interaction, large language models, data augmentation, specific domains
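A minimal sketch of the SLM/LLM voting idea follows. The abstract does not specify how "challenging" samples are detected or how votes are combined, so the confidence threshold, the `extract_triples` interface, and the agreement rule below are assumptions for illustration only.

```python
def extract_with_voting(sentence, slm, llm, conf_threshold=0.85):
    """Synergy sketch: the fine-tuned small model handles routine sentences;
    low-confidence ("challenging") samples are re-evaluated by the LLM, and
    agreement between the two decides the final triples."""
    slm_triples, confidence = slm.extract_triples(sentence)  # [(head, relation, tail)], float
    if confidence >= conf_threshold:
        return slm_triples

    # Challenging sample: ask the LLM for a second opinion and vote.
    llm_triples = llm.extract_triples(sentence)
    agreed = [t for t in slm_triples if t in llm_triples]
    # Fall back to the LLM's answer when the two models share nothing.
    return agreed if agreed else llm_triples
```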
3. Potential use of large language models for mitigating students' problematic social media use: ChatGPT as an example
Authors: Xin-Qiao Liu, Zi-Ru Zhang. World Journal of Psychiatry (SCIE), 2024, Issue 3, pp. 334-341.
Abstract: The problematic use of social media has numerous negative impacts on individuals' daily lives, interpersonal relationships, physical and mental health, and more. Currently, there are few methods and tools to alleviate problematic social media use, and their potential has yet to be fully realized. Emerging large language models (LLMs) are becoming increasingly popular for providing information and assistance to people and are being applied in many aspects of life. In mitigating problematic social media use, LLMs such as ChatGPT can play a positive role by serving as conversational partners and outlets for users, providing personalized information and resources, monitoring and intervening in problematic social media use, and more. In this process, we should recognize both the enormous potential and endless possibilities of LLMs such as ChatGPT, leveraging their advantages to better address problematic social media use, while also acknowledging the limitations and potential pitfalls of ChatGPT technology, such as errors, limits on issue resolution, privacy and security concerns, and potential overreliance. When we leverage the advantages of LLMs to address issues in social media usage, we must adopt a cautious and ethical approach, remaining vigilant about the adverse effects LLMs may have in this setting, so that technology better serves individuals and society.
Keywords: problematic use of social media, social media, large language models, ChatGPT, chatbots
4. Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization
Authors: Soonshin Seo, Ji-Hwan Kim. Computers, Materials & Continua (SCIE, EI), 2023, Issue 12, pp. 2833-2856.
Abstract: Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have gained significant attention within the ASR domain because of their adaptability to various deployment scenarios. However, these methods often confront substantial trade-offs, particularly unstable accuracy when the model size is reduced. To address these challenges, this study introduces two crucial empirical findings. First, it proposes incorporating an online distillation mechanism during on-demand pruning training, which promises more consistent accuracy levels. Second, it proposes using the Mogrifier long short-term memory (LSTM) language model (LM), an advanced iteration of the conventional LSTM LM, as an effective pruning target within the ASR framework. Through rigorous experimentation on an ASR system that employs the Mogrifier LSTM LM and is trained with the suggested joint on-demand pruning and online distillation method, this study provides compelling evidence: the proposed methods significantly outperform a benchmark model trained solely with on-demand pruning. Impressively, the proposed configuration reduces the parameter count by approximately 39% while minimizing trade-offs.
Keywords: automatic speech recognition, neural language model, Mogrifier long short-term memory, pruning, distillation, efficient deployment, optimization, joint training
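A minimal PyTorch sketch of an online distillation loss of the kind described above: the pruned sub-network is trained on the hard labels while a KL term keeps its predictions close to those of the full (unpruned) network computed in the same training step. The temperature, weighting, and layer interfaces are assumptions; the paper's exact settings may differ.

```python
import torch
import torch.nn.functional as F

def online_distillation_loss(full_logits, pruned_logits, targets,
                             temperature=2.0, alpha=0.5):
    """Cross-entropy on the pruned sub-network plus a KL term that distills
    the full model's online predictions into it. Temperature and alpha are
    illustrative values."""
    ce = F.cross_entropy(pruned_logits, targets)
    kl = F.kl_div(
        F.log_softmax(pruned_logits / temperature, dim=-1),
        F.softmax(full_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl
```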
5. Language Model Using Differentiable Neural Computer Based on Forget Gate-Based Memory Deallocation
Authors: Donghyun Lee, Hosung Park, Soonshin Seo, Changmin Kim, Hyunsoo Son, Gyujin Kim, Ji-Hwan Kim. Computers, Materials & Continua (SCIE, EI), 2021, Issue 7, pp. 537-551.
Abstract: A differentiable neural computer (DNC) is analogous to the von Neumann machine, with a neural network controller that interacts with an external memory through an attention mechanism. Such DNCs offer a generalized method for task-specific deep learning models and have demonstrated reliability on reasoning problems. In this study, we apply a DNC to a language model (LM) task. The LM task is a reasoning problem, because it predicts the next word from the previous word sequence. However, memory deallocation is a problem in DNCs, as some information unrelated to the input sequence is not deallocated and remains in the external memory, which degrades performance. Therefore, we propose a forget gate-based memory deallocation (FMD) method, which searches for the minimum value of the elements in a forget gate-based retention vector. The forget gate-based retention vector indicates the retention degree of the information stored at each external memory address. In experiments, we applied the proposed architecture to LM tasks as a task-specific example and to rescoring for speech recognition as a general-purpose example. For LM tasks, we evaluated the DNC on the Penn Treebank and enwik8 LM tasks. Although it does not yield state-of-the-art results on LM tasks, the FMD method exhibits relatively improved performance compared with the DNC in terms of bits per character. For the speech recognition rescoring tasks, FMD again showed a relative improvement on the LibriSpeech data in terms of word error rate.
Keywords: forget gate-based memory deallocation, differentiable neural computer, language model, forget gate-based retention vector
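A small sketch of the deallocation step the abstract describes: find the minimum element of a forget gate-based retention vector and free the corresponding external memory slot. The tensor shapes and the policy of erasing exactly one slot per step are assumptions made for illustration, not the paper's full mechanism.

```python
import torch

def fmd_deallocate(memory, retention):
    """Forget gate-based memory deallocation (FMD), sketched.

    memory:    (num_slots, slot_width) external memory matrix
    retention: (num_slots,) forget gate-based retention vector; lower values
               mean the slot's content is less worth keeping.
    """
    slot = torch.argmin(retention)   # memory address with the minimum retention
    memory = memory.clone()
    memory[slot] = 0.0               # deallocate: erase that address
    return memory, slot

# Example usage with toy sizes.
mem = torch.randn(16, 64)
ret = torch.rand(16)
mem, freed = fmd_deallocate(mem, ret)
```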
6. Six-Writings multimodal processing with pictophonetic coding to enhance Chinese language models
Authors: Li Weigang, Mayara Chew Marinho, Denise Leyi Li, Vitor Vasconcelos de Oliveira. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2024, Issue 1, pp. 84-105.
Abstract: While large language models (LLMs) have made significant strides in natural language processing (NLP), they continue to face challenges in adequately addressing the intricacies of the Chinese language in certain scenarios. We propose a framework called Six-Writings multimodal processing (SWMP) to enable direct integration of Chinese NLP (CNLP) with morphological and semantic elements. The first part of SWMP, known as Six-Writings pictophonetic coding (SWPC), is introduced with a suitable level of granularity for radicals and components, enabling effective representation of Chinese characters and words. We conduct several experimental scenarios, including the following: (1) We establish an experimental database consisting of images and SWPC for Chinese characters, enabling dual-mode processing and matrix generation for CNLP. (2) We characterize various generative modes of Chinese words, such as thousands of Chinese idioms, used as question-and-answer (Q&A) prompt functions, facilitating analogies by SWPC; the experiments achieve 100% accuracy in answering all questions in the Chinese morphological dataset (CA8-Mor-10177). (3) A fine-tuning mechanism is proposed to refine word embedding results using SWPC, resulting in an average relative error of ≤25% for 39.37% of the questions in the Chinese wOrd Similarity dataset (COS960). The results demonstrate that the SWMP/SWPC methods effectively capture the distinctive features of Chinese and offer a promising mechanism to enhance CNLP with better efficiency.
Keywords: Chinese language model, Chinese natural language processing (CNLP), generative language model, multimodal processing, Six-Writings
7. The Life Cycle of Knowledge in Big Language Models: A Survey
Authors: Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun. Machine Intelligence Research (EI, CSCD), 2024, Issue 2, pp. 217-238.
Abstract: Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated, and used by language models. Despite the enormous number of related studies, there is still a lack of a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from further understanding the connections between current progress or recognizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods and investigating how knowledge circulates as it is built, maintained, and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.
Keywords: pre-trained language model, knowledge acquisition, knowledge representation, knowledge probing, knowledge editing, knowledge application
8. RecBERT: Semantic recommendation engine with large language model enhanced query segmentation for k-nearest neighbors ranking retrieval
Authors: Richard Wu. Intelligent and Converged Networks (EI), 2024, Issue 1, pp. 42-52.
Abstract: The increasing amount of user traffic on Internet discussion forums has led to a huge amount of unstructured natural language data in the form of user comments. Most modern recommendation systems rely on manual tagging, with administrators labeling the features of the class, or story, to which a user comment corresponds. Another common approach is to use pre-trained word embeddings to compare class descriptions for textual similarity, then use a distance metric such as cosine similarity or Euclidean distance to find the top k neighbors. However, neither approach is able to fully utilize this user-generated unstructured natural language data, reducing the scope of these recommendation systems. This paper studies domain adaptation of a transformer over the set of user comments to be indexed, and the use of simple contrastive learning in the sentence-transformer fine-tuning process to generate meaningful semantic embeddings for the user comments belonging to each class. To match a query containing content from multiple user comments belonging to the same class, the construction of a subquery channel for computing class-level similarity is proposed. This channel segments the aggregate query into subqueries and performs a k-nearest neighbors (KNN) search on each individual subquery. RecBERT achieves state-of-the-art performance, outperforming other state-of-the-art models in accuracy, precision, recall, and F1 score for classifying comments among four and eight classes, respectively. RecBERT outperforms the most precise state-of-the-art model (distilRoBERTa) in precision by 6.97% for matching comments among eight classes.
Keywords: sentence transformer, simple contrastive learning, large language models, query segmentation, k-nearest neighbors
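A minimal sketch of the subquery channel idea: embed each subquery and each comment with a sentence transformer, run a cosine-similarity KNN search per subquery, and accumulate the top-k similarities per class to rank classes. The checkpoint name, the value of k, and the summation rule are assumptions; RecBERT uses its own fine-tuned encoder and segmentation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Any sentence-transformer checkpoint works here; this name is an assumption,
# not the domain-adapted encoder used by RecBERT.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def class_level_similarity(subqueries, comments, comment_labels, k=5):
    """For each subquery, find its k nearest comments by cosine similarity,
    then accumulate those similarities per class to rank the classes."""
    q = encoder.encode(subqueries, normalize_embeddings=True)  # (S, d)
    c = encoder.encode(comments, normalize_embeddings=True)    # (N, d)
    sims = q @ c.T                                             # cosine (unit vectors)

    scores = {}
    for row in sims:
        top_k = np.argsort(row)[::-1][:k]                      # KNN per subquery
        for idx in top_k:
            label = comment_labels[idx]
            scores[label] = scores.get(label, 0.0) + float(row[idx])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```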
9. Evaluating Privacy Leakage and Memorization Attacks on Large Language Models (LLMs) in Generative AI Applications
Authors: Harshvardhan Aditya, Siddansh Chawla, Gunika Dhingra, Parijat Rai, Saumil Sood, Tanmay Singh, Zeba Mohsin Wase, Arshdeep Bahga, Vijay K. Madisetti. Journal of Software Engineering and Applications, 2024, Issue 5, pp. 421-447.
Abstract: The recent interest in deploying Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, specifically during fine-tuning or customization. We describe different black-box attacks from potential adversaries and study their impact on the amount and type of information that may be recovered from commonly used and deployed LLMs. Our research investigates the relationship between PII leakage, memorization, and factors such as model size, architecture, and the nature of the attacks employed. The study uses two broad categories of attacks: PII leakage-focused attacks (auto-completion and extraction attacks) and memorization-focused attacks (various membership inference attacks). The findings from these investigations are quantified using an array of evaluative metrics, providing a detailed understanding of LLM vulnerabilities and the effectiveness of different attacks.
Keywords: large language models, PII leakage, privacy, memorization, overfitting, membership inference attack (MIA)
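One of the simplest attacks in the memorization-focused family mentioned above is loss-threshold membership inference: if a candidate text has unusually low loss under the model, it may have been seen during training or fine-tuning. The sketch below uses GPT-2 purely as a stand-in target; the threshold is illustrative and would normally be calibrated on known member/non-member samples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is only a placeholder target model for illustration.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def sample_loss(text: str) -> float:
    """Average token-level negative log-likelihood of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    out = lm(ids, labels=ids)
    return out.loss.item()

def is_probable_member(text: str, threshold: float = 2.5) -> bool:
    """Loss-threshold membership inference: unusually low loss suggests the
    sample may have been memorized. The threshold is an assumed value."""
    return sample_loss(text) < threshold
```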
10. Security Vulnerability Analyses of Large Language Models (LLMs) through Extension of the Common Vulnerability Scoring System (CVSS) Framework
Authors: Alicia Biju, Vishnupriya Ramesh, Vijay K. Madisetti. Journal of Software Engineering and Applications, 2024, Issue 5, pp. 340-358.
Abstract: Large language models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of various applications in society, including text generation, translation, summarization, and more. However, their widespread usage emphasizes the critical need to enhance their security posture to ensure the integrity and reliability of their outputs and minimize harmful effects. Prompt injections and training data poisoning are two of the most prominent vulnerabilities in LLMs; both could potentially lead to unpredictable and undesirable behaviors, such as biased outputs, misinformation propagation, and even malicious content generation. The Common Vulnerability Scoring System (CVSS) framework provides a standardized approach to capturing the principal characteristics of vulnerabilities, facilitating a deeper understanding of their severity within the security and AI communities. By extending the current CVSS framework, we generate scores for these vulnerabilities so that organizations can prioritize mitigation efforts, allocate resources effectively, and implement targeted security measures to defend against potential risks.
Keywords: Common Vulnerability Scoring System (CVSS), large language models (LLMs), DALL-E, prompt injections, training data poisoning, CVSS metrics
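As a reference point for the scoring approach the abstract describes, the sketch below computes a standard CVSS v3.1 base score (scope unchanged) from the published numeric metric weights. The example metric choices for a prompt-injection-style issue are assumptions for illustration, not the scores assigned in the paper, and the paper's extension adds LLM-specific considerations on top of this formula.

```python
import math

# CVSS v3.1 numeric weights (scope unchanged); only the subset needed here.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}   # attack vector
AC = {"L": 0.77, "H": 0.44}                          # attack complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}               # privileges required
UI = {"N": 0.85, "R": 0.62}                          # user interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}               # C/I/A impact

def base_score(av, ac, pr, ui, c, i, a):
    """Standard CVSS v3.1 base score, scope unchanged."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return math.ceil(min(impact + exploitability, 10.0) * 10) / 10  # round up to 1 decimal

# Assumed metrics for a prompt-injection-style issue: network vector, low
# complexity, no privileges, user interaction required, low C/I impact, no A impact.
print(base_score("N", "L", "N", "R", "L", "L", "N"))  # ~5.4 (medium severity)
```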
11. Adapter Based on Pre-Trained Language Models for Classification of Medical Text
Authors: Quan Li. Journal of Electronic Research and Application, 2024, Issue 3, pp. 129-134.
Abstract: We present an approach to automatically classifying medical text at the sentence level. Given the inherent complexity of medical text classification, we employ adapters based on pre-trained language models to extract information from medical text, facilitating more accurate classification while minimizing the number of trainable parameters. Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
Keywords: classification of medical text, adapter, pre-trained language model
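The abstract names adapters but not their exact architecture; a common realization is the bottleneck adapter: a small down-projection/up-projection block with a residual connection, inserted into a frozen pre-trained encoder so that only the adapter (and classifier head) parameters are trained. The hidden sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual.
    Sizes are illustrative; the paper's exact configuration may differ."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual keeps PLM features

# Usage: freeze the pre-trained encoder and train only adapters + classifier.
adapter = Adapter()
hidden_states = torch.randn(2, 16, 768)              # (batch, seq_len, hidden)
adapted = adapter(hidden_states)
```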
12. NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations
Authors: Shaojun Wang, Ronghui You, Yunjia Liu, Yi Xiong, Shanfeng Zhu. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2023, Issue 2, pp. 349-358.
Abstract: As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve performance. However, it mainly utilizes proteins with experimentally supported functional annotations, without leveraging the valuable information in a vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embeddings] from protein sequences based on self-supervision. Here, we represent each protein by its ESM-1b embedding and use logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results show that LR-ESM achieves performance comparable to that of the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0 to further improve the performance of AFP.
Keywords: protein function prediction, web service, protein language model, learning to rank, large-scale multi-label learning
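A minimal sketch of an LR-ESM-style baseline: each protein is represented by a precomputed ESM-1b embedding and a one-vs-rest logistic regression predicts GO terms as a multi-label problem. The random arrays below merely stand in for real embeddings and annotations; dimensions and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# X: (n_proteins, 1280) ESM-1b mean-pooled sequence embeddings (precomputed);
# Y: (n_proteins, n_go_terms) binary GO-term annotation matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1280)).astype(np.float32)   # placeholder embeddings
Y = rng.integers(0, 2, size=(200, 10))                # placeholder annotations

# One logistic regression per GO term, LR-ESM style.
lr_esm = OneVsRestClassifier(LogisticRegression(max_iter=1000))
lr_esm.fit(X, Y)
go_scores = lr_esm.predict_proba(X[:5])               # per-term probabilities for ranking
```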
13. Unsupervised statistical text simplification using pre-trained language modeling for initialization
Authors: Jipeng Qiang, Feng Zhang, Yun Li, Yunhao Yuan, Yi Zhu, Xindong Wu. Frontiers of Computer Science (SCIE, EI, CSCD), 2023, Issue 1, pp. 81-90.
Abstract: Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification system based on phrase-based machine translation (UnsupPBMT) achieved good performance; it initializes the phrase tables using similar words obtained by word embedding modeling. Since word embedding modeling only considers the relevance between words, the phrase table in UnsupPBMT contains many dissimilar words. In this paper, we propose unsupervised statistical text simplification that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms the state-of-the-art unsupervised text simplification methods on three benchmarks, and even outperforms some supervised baselines.
Keywords: text simplification, pre-trained language modeling, BERT, word embeddings
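A minimal sketch of using BERT's masked-language-model head to propose context-appropriate substitutes for a word, the kind of "similar word" prediction that can seed a phrase table in place of embedding-similarity lookups. The checkpoint name and top-k value are assumptions; the paper's candidate filtering is not reproduced.

```python
from transformers import pipeline

# Any BERT-style checkpoint with an MLM head works; this one is an assumption.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def similar_words(sentence: str, complex_word: str, top_k: int = 10):
    """Mask the complex word in its sentence and let BERT's MLM head propose
    context-appropriate replacements (candidate simplifications)."""
    masked = sentence.replace(complex_word, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=top_k)
    return [c["token_str"].strip() for c in candidates]

print(similar_words("The committee will convene tomorrow.", "convene"))
```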
14. Vision Enhanced Generative Pre-trained Language Model for Multimodal Sentence Summarization
Authors: Liqiang Jing, Yiren Li, Junhao Xu, Yongcan Yu, Pei Shen, Xuemeng Song. Machine Intelligence Research (EI, CSCD), 2023, Issue 2, pp. 289-298.
Abstract: Multimodal sentence summarization (MMSS) is a new yet challenging task that aims to generate a concise summary of a long sentence and its corresponding image. Although existing methods have achieved promising success on MMSS, they overlook the powerful generation ability of generative pre-trained language models (GPLMs), which have proved effective in many text generation tasks. To fill this research gap, we propose using GPLMs to promote the performance of MMSS. Notably, adopting GPLMs to solve MMSS inevitably faces two challenges: 1) What fusion strategy should be used to inject visual information into GPLMs properly? 2) How can the GPLM's generation ability be kept intact to the utmost extent when the visual feature is injected? To address these two challenges, we propose a vision enhanced generative pre-trained language model for MMSS, dubbed Vision-GPLM. In Vision-GPLM, we obtain features of the visual and textual modalities with two separate encoders and utilize a text decoder to produce the summary. In particular, we utilize multi-head attention to fuse the features extracted from the visual and textual modalities, injecting the visual feature into the GPLM. Meanwhile, we train Vision-GPLM in two stages: a vision-oriented pre-training stage and a fine-tuning stage. In the vision-oriented pre-training stage, we train only the visual encoder with the masked language model task while the other components are frozen, aiming to obtain homogeneous representations of text and image. In the fine-tuning stage, we train all the components of Vision-GPLM on the MMSS task. Extensive experiments on a public MMSS dataset verify the superiority of our model over existing baselines.
Keywords: multimodal sentence summarization (MMSS), generative pre-trained language model (GPLM), natural language generation, deep learning, artificial intelligence
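A minimal PyTorch sketch of the fusion step the abstract describes: text features attend to projected visual features through multi-head cross-attention before being passed on to the language model. Feature dimensions, the projection layer, and the residual scheme are illustrative assumptions rather than the exact Vision-GPLM design.

```python
import torch
import torch.nn as nn

class VisionTextFusion(nn.Module):
    """Inject visual features into text features with multi-head cross-attention."""
    def __init__(self, text_dim=768, vision_dim=2048, num_heads=8):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, text_dim)   # align feature sizes
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)

    def forward(self, text_feats, vision_feats):
        # text_feats: (B, T, text_dim); vision_feats: (B, R, vision_dim)
        v = self.vision_proj(vision_feats)
        fused, _ = self.cross_attn(query=text_feats, key=v, value=v)
        return text_feats + fused                             # residual keeps the text signal

fusion = VisionTextFusion()
out = fusion(torch.randn(2, 20, 768), torch.randn(2, 36, 2048))
```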
15. Improving Extraction of Chinese Open Relations Using Pre-trained Language Model and Knowledge Enhancement
Authors: Chaojie Wen, Xudong Jia, Tao Chen. Data Intelligence (EI), 2023, Issue 4, pp. 962-989.
Abstract: Open Relation Extraction (ORE) is the task of extracting semantic relations from a text document. Current ORE systems have significantly improved their efficiency in obtaining Chinese relations compared with conventional systems, which depend heavily on feature engineering or syntactic parsing. However, these ORE systems do not use robust neural networks such as pre-trained language models to take advantage of large-scale unstructured data effectively. In response to this issue, a new system entitled Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) is presented in this paper. The CORE-KE system employs a pre-trained language model (with the support of a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer) on unstructured data to improve Chinese open relation extraction. Entity descriptions in Wikidata and additional knowledge (in terms of triple facts) extracted from Chinese ORE datasets are used to fine-tune the pre-trained language model. In addition, syntactic features are adopted in the training stage of the CORE-KE system for knowledge enhancement. Experimental results of the CORE-KE system on two large-scale datasets of open Chinese entities and relations demonstrate that it is superior to other ORE systems. The F1-scores of the CORE-KE system on the two datasets show relative improvements of 20.1% and 1.3% over benchmark ORE systems, respectively. The source code is available at https://github.com/cjwen15/CORE-KE.
Keywords: Chinese open relation extraction, pre-trained language model, knowledge enhancement
16. Large language models for human-robot interaction: A review
Authors: Ceng Zhang, Junxin Chen, Jiatong Li, Yanhong Peng, Zebing Mao. Biomimetic Intelligence & Robotics (EI), 2023, Issue 4, pp. 1-15.
Abstract: The fusion of large language models and robotic systems has introduced a transformative paradigm in human-robot interaction, offering unparalleled capabilities in natural language understanding and task execution. This review paper offers a comprehensive analysis of this nascent but rapidly evolving domain, spotlighting the recent advances of large language models (LLMs) in enhancing their structures and performance, particularly in terms of multimodal input handling, high-level reasoning, and plan generation. It also probes the current methodologies that integrate LLMs into robotic systems for complex task completion, from traditional probabilistic models to the use of value functions and metrics for optimal decision-making. Despite these advancements, the paper also reveals the formidable challenges that confront the field, such as contextual understanding, data privacy, and ethical considerations. To the best of our knowledge, this is the first study to comprehensively analyze the advances and considerations of LLMs in human-robot interaction (HRI) based on recent progress, providing potential avenues for further research.
Keywords: large language models, human-robot interaction, task completion, considerations and challenges
17. PAL-BERT: An Improved Question Answering Model
Authors: Wenfeng Zheng, Siyu Lu, Zhuohang Cai, Ruiyang Wang, Lei Wang, Lirong Yin. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 6, pp. 2729-2745.
Abstract: In the field of natural language processing (NLP), various pre-training language models have appeared in recent years, with question answering systems gaining significant attention. However, as algorithms, data, and computing power advance, the issue of increasingly larger models and a growing number of parameters has surfaced. Consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of the training process while reducing the model volume, this paper proposes a first-order pruning model, PAL-BERT, based on the ALBERT model and on the characteristics of question-answering (QA) systems and language models. First, a first-order network pruning method based on the ALBERT model is designed, forming the PAL-BERT model. Then, the parameter optimization strategy of PAL-BERT is formulated, and the Mish function is used as the activation function instead of ReLU to improve performance. Finally, comparison experiments with the traditional deep learning models TextCNN and BiLSTM confirm that PAL-BERT is a pruning-based model compression method that can significantly reduce training time and optimize training efficiency. Compared with traditional models, PAL-BERT significantly improves performance on the NLP task.
Keywords: PAL-BERT, question answering model, pre-training language models, ALBERT, pruning model, network pruning, TextCNN, BiLSTM
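The abstract names two concrete ingredients: the Mish activation and first-order pruning. The sketch below shows the Mish function and a common first-order (gradient-times-weight) saliency used to decide which weights to zero; the saliency definition and pruning ratio are assumptions consistent with standard first-order pruning practice, not necessarily the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    """Mish activation: x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))

def first_order_importance(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """First-order (Taylor) saliency |w * dL/dw| per weight: small values mark
    weights whose removal changes the loss least."""
    return (weight * grad).abs()

def prune_by_importance(weight, grad, ratio=0.3):
    """Zero out the `ratio` fraction of weights with the lowest saliency."""
    importance = first_order_importance(weight, grad)
    k = int(importance.numel() * ratio)
    threshold = importance.flatten().kthvalue(k).values if k > 0 else -1.0
    mask = (importance > threshold).float()
    return weight * mask, mask
```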
18. RoBGP: A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer
Authors: Xiaohui Cui, Chao Song, Dongmei Li, Xiaolong Qu, Jiao Long, Yu Yang, Hanchao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 3603-3618.
Abstract: Named Entity Recognition (NER) is a fundamental task in biomedical text mining, aiming to extract specific types of entities, such as genes, proteins, and diseases, from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model first utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
Keywords: biomedicine, knowledge base, named entity recognition, pretrained language model, global pointer
19. Incorporating Linguistic Rules in Statistical Chinese Language Model for Pinyin-to-character Conversion (cited by 2)
Authors: Liu Bingquan (刘秉权), Wang Xiaolong, Wang Yuying. High Technology Letters (EI, CAS), 2001, Issue 2, pp. 8-13.
Abstract: An N-gram Chinese language model incorporating linguistic rules is presented. By constructing an element lattice, rule information is incorporated into the statistical framework. To facilitate the hybrid modeling, novel methods such as MI-based rule evaluation, weighted rule quantification, and element-based n-gram probability approximation are presented. A dynamic Viterbi algorithm is adopted to search for the best path in the lattice. To strengthen the model, transformation-based error-driven rule learning is adopted. Applying the proposed model to Chinese Pinyin-to-character conversion, high performance has been achieved in accuracy, flexibility, and robustness simultaneously. Tests show the correct rate reaches 94.81%, compared with 90.53% when using a bigram Markov model alone. Many long-distance dependencies and recursive constructions in language can be processed effectively.
Keywords: Chinese Pinyin-to-character conversion, rule-based language model, N-gram language model, hybrid language model, element lattice, transformation-based error-driven learning
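The baseline the abstract compares against is a bigram Markov model searched with a Viterbi pass over the candidate-character lattice. The sketch below implements that baseline on a toy lattice; the rule-weighted element lattice of the paper is not reproduced, and the probabilities are invented purely for illustration.

```python
import math

def viterbi_pinyin(candidates, unigram, bigram):
    """Pick the most probable character sequence for a pinyin lattice.

    candidates: list of candidate-character lists, one per syllable
    unigram/bigram: log-probability tables; unseen entries back off to a floor
    """
    floor = math.log(1e-6)
    # best[c] = (log-prob of the best path ending in c, that path)
    best = {c: (unigram.get(c, floor), [c]) for c in candidates[0]}
    for step in candidates[1:]:
        new_best = {}
        for cur in step:
            prev, (score, path) = max(
                ((p, best[p]) for p in best),
                key=lambda item: item[1][0] + bigram.get((item[0], cur), floor),
            )
            new_best[cur] = (score + bigram.get((prev, cur), floor), path + [cur])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]

# Toy example: pinyin "shi shi" with two candidate characters per syllable.
candidates = [["是", "事"], ["实", "时"]]
unigram = {"是": math.log(0.6), "事": math.log(0.4)}
bigram = {("事", "实"): math.log(0.5), ("是", "时"): math.log(0.1)}
print("".join(viterbi_pinyin(candidates, unigram, bigram)))  # -> 事实
```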
20. Leveraging Vision-Language Pre-Trained Model and Contrastive Learning for Enhanced Multimodal Sentiment Analysis
Authors: Jieyu An, Wan Mohd Nazmee Wan Zainon, Binfen Ding. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 8, pp. 1673-1689.
Abstract: Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modalities, such as text and image, to accurately assess sentiment. However, conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities. This limitation is attributed to their training on unimodal data and necessitates complex fusion mechanisms for sentiment analysis. In this study, we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method. Our approach harnesses the power of transfer learning by utilizing a vision-language pre-trained model to extract both visual and textual representations in a unified framework. We employ a Transformer architecture to integrate these representations, thereby enabling the capture of rich semantic information in image-text pairs. To further enhance the representation learning of these pairs, we introduce our proposed multimodal contrastive learning method, which leads to improved performance on sentiment analysis tasks. Our approach is evaluated through extensive experiments on two publicly accessible datasets, where we demonstrate its effectiveness. We achieve a significant improvement in sentiment analysis accuracy, indicating the superiority of our approach over existing techniques. These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.
Keywords: multimodal sentiment analysis, vision-language pre-trained model, contrastive learning, sentiment classification
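A minimal sketch of an image-text contrastive objective (symmetric InfoNCE, CLIP-style): matched image-text pairs in a batch are pulled together and mismatched pairs pushed apart in a shared embedding space. The temperature and embedding sizes are assumptions, and the paper's exact multimodal contrastive formulation may differ from this generic version.

```python
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs.

    image_emb, text_emb: (B, d) embeddings from the vision-language encoder.
    The i-th image and i-th text form a positive pair; all other combinations
    in the batch act as negatives.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)            # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)        # text -> matching image
    return 0.5 * (loss_i2t + loss_t2i)

loss = multimodal_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```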