Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requir...Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requires more syntactic elements.Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence,recognizing the progress and comparing impacts.An ensemble pre-trained language model was taken up here to classify the conversation sentences from the conversation corpus.The conversational sentences are classified into four categories:information,question,directive,and commission.These classification label sequences are for analyzing the conversation progress and predicting the pecking order of the conversation.Ensemble of Bidirectional Encoder for Representation of Transformer(BERT),Robustly Optimized BERT pretraining Approach(RoBERTa),Generative Pre-Trained Transformer(GPT),DistilBERT and Generalized Autoregressive Pretraining for Language Understanding(XLNet)models are trained on conversation corpus with hyperparameters.Hyperparameter tuning approach is carried out for better performance on sentence classification.This Ensemble of Pre-trained Language Models with a Hyperparameter Tuning(EPLM-HT)system is trained on an annotated conversation dataset.The proposed approach outperformed compared to the base BERT,GPT,DistilBERT and XLNet transformer models.The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.展开更多
Background: Sperm DNA fragmentation(sDF) has been proved to be an important parameter in order to predict in vitro the potential fertility of a semen sample. Colloid centrifugation could be a suitable technique to ...Background: Sperm DNA fragmentation(sDF) has been proved to be an important parameter in order to predict in vitro the potential fertility of a semen sample. Colloid centrifugation could be a suitable technique to select those donkey sperm more resistant to DNA fragmentation after thawing. Previous studies have shown that to elucidate the latent damage of the DNA molecule, sDF should be assessed dynamically, where the rate of fragmentation between treatments indicates how resistant the DNA is to iatrogenic damage. The rate of fragmentation is calculated using the slope of a linear regression equation. However, it has not been studied if s DF dynamics fit this model. The objectives of this study were to evaluate the effect of different after-thawing centrifugation protocols on sperm DNA fragmentation and elucidate the most accurate mathematical model(linear regression, exponential or polynomial) for DNA fragmentation over time in frozen-thawed donkey semen.Results: After submitting post-thaw semen samples to no centrifugation(UDC), sperm washing(SW) or single layer centrifugation(SLC) protocols, sD F values after 6 h of incubation were significantly lower in SLC samples than in SW or UDC.Coefficient of determination(R-2) values were significantly higher for a second order polynomial model than for linear or exponential. The highest values for acceleration of fragmentation(aSDF) were obtained for SW, fol owed by SLC and UDC.Conclusion: SLC after thawing seems to preserve longer DNA longevity in comparison to UDC and SW. Moreover,the fine-tuning of models has shown that sDF dynamics in frozen-thawed donkey semen fit a second order polynomial model, which implies that fragmentation rate is not constant and fragmentation acceleration must be taken into account to elucidate hidden damage in the DNA molecule.展开更多
The existing multi-objective wheel profile optimization methods mainly consist of three sub-modules:(1)wheel profile generation,(2)multi-body dynamics simulation,and(3)an optimization algorithm.For the first module,a ...The existing multi-objective wheel profile optimization methods mainly consist of three sub-modules:(1)wheel profile generation,(2)multi-body dynamics simulation,and(3)an optimization algorithm.For the first module,a comparably conservative rotary-scaling finetuning(RSFT)method,which introduces two design variables and an empirical formula,is proposed to fine-tune the traditional wheel profiles for improving their engineering applicability.For the second module,for the TRAXX locomotives serving on the Blankenburg–Rubeland line,an optimization function representing the relationship between the wheel profile and the wheel–rail wear number is established based on Kriging surrogate model(KSM).For the third module,a method combining the regression capability of KSM with the iterative computing power of particle swarm optimization(PSO)is proposed to quickly and reliably implement the task of optimizing wheel profiles.Finally,with the RSFT–KSM–PSO method,we propose two wear-resistant wheel profiles for the TRAXX locomotives serving on the Blankenburg–Rubeland line,namely S1002-S and S1002-M.The S1002-S profile minimizes the total wear number by 30%,while the S1002-M profile makes the wear distribution more uniform through a proper sacrifice of the tread wear number,and the total wear number is reduced by 21%.The quasi-static and hunting stability tests further demonstrate that the profile designed by the RSFT–KSM–PSO method is promising for practical engineering applications.展开更多
Automatic speech recognition(ASR)systems have emerged as indispensable tools across a wide spectrum of applications,ranging from transcription services to voice-activated assistants.To enhance the performance of these...Automatic speech recognition(ASR)systems have emerged as indispensable tools across a wide spectrum of applications,ranging from transcription services to voice-activated assistants.To enhance the performance of these systems,it is important to deploy efficient models capable of adapting to diverse deployment conditions.In recent years,on-demand pruning methods have obtained significant attention within the ASR domain due to their adaptability in various deployment scenarios.However,these methods often confront substantial trade-offs,particularly in terms of unstable accuracy when reducing the model size.To address challenges,this study introduces two crucial empirical findings.Firstly,it proposes the incorporation of an online distillation mechanism during on-demand pruning training,which holds the promise of maintaining more consistent accuracy levels.Secondly,it proposes the utilization of the Mogrifier long short-term memory(LSTM)language model(LM),an advanced iteration of the conventional LSTM LM,as an effective alternative for pruning targets within the ASR framework.Through rigorous experimentation on the ASR system,employing the Mogrifier LSTM LM and training it using the suggested joint on-demand pruning and online distillation method,this study provides compelling evidence.The results exhibit that the proposed methods significantly outperform a benchmark model trained solely with on-demand pruning methods.Impressively,the proposed strategic configuration successfully reduces the parameter count by approximately 39%,all the while minimizing trade-offs.展开更多
In the field of natural language processing(NLP),there have been various pre-training language models in recent years,with question answering systems gaining significant attention.However,as algorithms,data,and comput...In the field of natural language processing(NLP),there have been various pre-training language models in recent years,with question answering systems gaining significant attention.However,as algorithms,data,and computing power advance,the issue of increasingly larger models and a growing number of parameters has surfaced.Consequently,model training has become more costly and less efficient.To enhance the efficiency and accuracy of the training process while reducing themodel volume,this paper proposes a first-order pruningmodel PAL-BERT based on the ALBERT model according to the characteristics of question-answering(QA)system and language model.Firstly,a first-order network pruning method based on the ALBERT model is designed,and the PAL-BERT model is formed.Then,the parameter optimization strategy of the PAL-BERT model is formulated,and the Mish function was used as an activation function instead of ReLU to improve the performance.Finally,after comparison experiments with traditional deep learning models TextCNN and BiLSTM,it is confirmed that PALBERT is a pruning model compression method that can significantly reduce training time and optimize training efficiency.Compared with traditional models,PAL-BERT significantly improves the NLP task’s performance.展开更多
This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large mode...This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large models in vertical industries,outlines the challenges and issues confronted in applying large models in the oil and gas sector,and offers prospects for the application of large models in the oil and gas industry.The existing large models can be briefly divided into three categories:large language models,visual large models,and multimodal large models.The application of large models in the oil and gas industry is still in its infancy.Based on open-source large language models,some oil and gas enterprises have released large language model products using methods like fine-tuning and retrieval augmented generation.Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models.A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation,as well as core analysis.The application of large models in the oil and gas industry faces challenges such as current data quantity and quality being difficult to support the training of large models,high research and development costs,and poor algorithm autonomy and control.The application of large models should be guided by the needs of oil and gas business,taking the application of large models as an opportunity to improve data lifecycle management,enhance data governance capabilities,promote the construction of computing power,strengthen the construction of“artificial intelligence+energy”composite teams,and boost the autonomy and control of large model technology.展开更多
A large language model(LLM)is constructed to address the sophisticated demands of data retrieval and analysis,detailed well profiling,computation of key technical indicators,and the solutions to complex problems in re...A large language model(LLM)is constructed to address the sophisticated demands of data retrieval and analysis,detailed well profiling,computation of key technical indicators,and the solutions to complex problems in reservoir performance analysis(RPA).The LLM is constructed for RPA scenarios with incremental pre-training,fine-tuning,and functional subsystems coupling.Functional subsystem and efficient coupling methods are proposed based on named entity recognition(NER),tool invocation,and Text-to-SQL construction,all aimed at resolving pivotal challenges in developing the specific application of LLMs for RDA.This study conducted a detailed accuracy test on feature extraction models,tool classification models,data retrieval models and analysis recommendation models.The results indicate that these models have demonstrated good performance in various key aspects of reservoir dynamic analysis.The research takes some injection and production well groups in the PK3 Block of the Daqing Oilfield as an example for testing.Testing results show that our model has significant potential and practical value in assisting reservoir engineers with RDA.The research results provide a powerful support to the application of LLM in reservoir performance analysis.展开更多
Large-scale Language Models(LLMs)have achieved significant breakthroughs in Natural Language Processing(NLP),driven by the pre-training and fine-tuning paradigm.While this approach allows models to specialize in speci...Large-scale Language Models(LLMs)have achieved significant breakthroughs in Natural Language Processing(NLP),driven by the pre-training and fine-tuning paradigm.While this approach allows models to specialize in specific tasks with reduced training costs,the substantial memory requirements during fine-tuning present a barrier to broader deployment.Parameter-Efficient Fine-Tuning(PEFT)techniques,such as Low-Rank Adaptation(LoRA),and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational efficiency.Among these,QLoRA,which combines PEFT and quantization,has demonstrated notable success in reducing memory footprints during fine-tuning,prompting the development of various QLoRA variants.Despite these advancements,the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored.This study presents a comprehensive analysis of these key variables,focusing on their influence across different layer types and depths within LLM architectures.Our investigation uncovers several critical findings:(1)Larger layers,such as MLP layers,can maintain performance despite reductions in adapter rank,while smaller layers,like self-attention layers,aremore sensitive to such changes;(2)The effectiveness of balancing factors depends more on specific values rather than layer type or depth;(3)In quantization-aware fine-tuning,larger layers can effectively utilize smaller adapters,whereas smaller layers struggle to do so.These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs.Moreover,for the same discount of trainable parameters,reducing the trainable parameters in a larger layer is more effective in preserving fine-tuning accuracy than in a smaller one.This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.展开更多
Leveraging the Baidu Qianfan model platform,this paper designs and implements a highly efficient and accurate scoring system for subjective questions,focusing primarily on questions in the field of computer network te...Leveraging the Baidu Qianfan model platform,this paper designs and implements a highly efficient and accurate scoring system for subjective questions,focusing primarily on questions in the field of computer network technology.The system enhances the foundational model by utilizing Qianfan’s training tools and integrating advanced techniques,such as supervised fine-tuning.In the data preparation phase,a comprehensive collection of subjective data related to computer network technology is gathered,cleaned,and labeled.During model training and evaluation,optimal hyperparameters and tuning strategies are applied,resulting in a model capable of scoring with high accuracy.Evaluation results demonstrate that the proposed model performs well across multiple dimensions-content,expression,and development scores-yielding results comparable to those of manual scoring.展开更多
A novel deep neural network compression model for airport object detection has been presented.This novel model aims at disadvantages of deep neural network,i.e.the complexity of the model and the great cost of calcula...A novel deep neural network compression model for airport object detection has been presented.This novel model aims at disadvantages of deep neural network,i.e.the complexity of the model and the great cost of calculation.According to the requirement of airport object detection,the model obtains temporal and spatial semantic rules from the uncompressed model.These spatial semantic rules are added to the model after parameter compression to assist the detection.The rules can improve the accuracy of the detection model in order to make up for the loss caused by parameter compression.The experiments show that the effect of the novel compression detection model is no worse than that of the uncompressed original model.Even some of the original model false detection can be eliminated through the prior knowledge.展开更多
Although the deep learning technology has shown great power in solving the complex tasks, these neural network models are large and redundant as a matter of fact, which makes these networks difficult to be placed in e...Although the deep learning technology has shown great power in solving the complex tasks, these neural network models are large and redundant as a matter of fact, which makes these networks difficult to be placed in embedded devices with limited memory and computing resources. In order to compress the neural network to a slimmer and smaller one, the multi-grained network pruning framework is proposed in this paper. In our framework, the pruning process was divided into the filter-level pruning and the weight-level pruning. In the process of the filter-level pruning, the importance of the filter was measured by the entropy of the activation tensor of the filter. In the other process, the dynamic recoverable pruning method was adopted to prune the weights deeply. Different from these popular pruning methods, the weight-level pruning is also taken into account based on the employment of the filter-level pruning to achieve more effectively pruning. The proposed approach is validated on two representative CNN models - AlexNet and VGG16, pre-trained on ILSVRC12. Experimental results show that AlexNet and VGG16 network models are compressed 19.75- and 22.53- respectively by this approach, which are 2.05 and 5.89 higher than the classical approaches of dynamic Network Surgery and ThiNet.展开更多
果园环境下柑橘的快速准确检测是自主采摘机器人作业的关键.针对现有的模型过于冗余、检测速度与精度不平衡等问题,提出一种轻量型果园环境果实检测方法.在YOLOv4算法的基础上引入焦点损失函数(Focal Loss)来提高模型在二分类检测任务...果园环境下柑橘的快速准确检测是自主采摘机器人作业的关键.针对现有的模型过于冗余、检测速度与精度不平衡等问题,提出一种轻量型果园环境果实检测方法.在YOLOv4算法的基础上引入焦点损失函数(Focal Loss)来提高模型在二分类检测任务中的负样本挖掘能力,并针对模型参数冗余等问题提出一种优化的模型剪枝方法.试验结果表明:提出的方法在果园环境中柑橘果实数据集检测得到的平均精度均值(mean average precision,M_(AP))达到94.22%,相较于YOLOv4模型提高了1.18%,模型参数减小了95.22%,模型尺寸为原来的4.84%,检测速度为原来的4.03倍.展开更多
文摘Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requires more syntactic elements.Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence,recognizing the progress and comparing impacts.An ensemble pre-trained language model was taken up here to classify the conversation sentences from the conversation corpus.The conversational sentences are classified into four categories:information,question,directive,and commission.These classification label sequences are for analyzing the conversation progress and predicting the pecking order of the conversation.Ensemble of Bidirectional Encoder for Representation of Transformer(BERT),Robustly Optimized BERT pretraining Approach(RoBERTa),Generative Pre-Trained Transformer(GPT),DistilBERT and Generalized Autoregressive Pretraining for Language Understanding(XLNet)models are trained on conversation corpus with hyperparameters.Hyperparameter tuning approach is carried out for better performance on sentence classification.This Ensemble of Pre-trained Language Models with a Hyperparameter Tuning(EPLM-HT)system is trained on an annotated conversation dataset.The proposed approach outperformed compared to the base BERT,GPT,DistilBERT and XLNet transformer models.The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.
基金partially supported by grants RZ2009-00006-00-00(Instituto Nacional de Investigacion y Tecnología Agraria y Alimentaria,Ministerio de Ciencia e Innovación,Spain)AGL-2013-42726-R(Secretaria de Estado de Investigacion,Desarrollo e Innovacion,Ministerio de Economia y Competitividad,Spain)+1 种基金supported by a Ph.D.fellowship from the ceiA3(Andalucia,Spain)with funding provided by Banco Santander through its Global Division,Santander Universidadesfunded by the Swedish Foundation for Equine Research,Stockholm,Sweden(H14-47-008)
文摘Background: Sperm DNA fragmentation(sDF) has been proved to be an important parameter in order to predict in vitro the potential fertility of a semen sample. Colloid centrifugation could be a suitable technique to select those donkey sperm more resistant to DNA fragmentation after thawing. Previous studies have shown that to elucidate the latent damage of the DNA molecule, sDF should be assessed dynamically, where the rate of fragmentation between treatments indicates how resistant the DNA is to iatrogenic damage. The rate of fragmentation is calculated using the slope of a linear regression equation. However, it has not been studied if s DF dynamics fit this model. The objectives of this study were to evaluate the effect of different after-thawing centrifugation protocols on sperm DNA fragmentation and elucidate the most accurate mathematical model(linear regression, exponential or polynomial) for DNA fragmentation over time in frozen-thawed donkey semen.Results: After submitting post-thaw semen samples to no centrifugation(UDC), sperm washing(SW) or single layer centrifugation(SLC) protocols, sD F values after 6 h of incubation were significantly lower in SLC samples than in SW or UDC.Coefficient of determination(R-2) values were significantly higher for a second order polynomial model than for linear or exponential. The highest values for acceleration of fragmentation(aSDF) were obtained for SW, fol owed by SLC and UDC.Conclusion: SLC after thawing seems to preserve longer DNA longevity in comparison to UDC and SW. Moreover,the fine-tuning of models has shown that sDF dynamics in frozen-thawed donkey semen fit a second order polynomial model, which implies that fragmentation rate is not constant and fragmentation acceleration must be taken into account to elucidate hidden damage in the DNA molecule.
基金the Assets4Rail Project which is funded by the Shift2Rail Joint Undertaking under the EU’s H2020 program(Grant No.826250)the Open Research Fund of State Key Laboratory of Traction Power of Southwest Jiaotong University(Grant No.TPL2011)+1 种基金part of the experiment data concerning the railway line is supported by the DynoTRAIN Project,funded by European Commission(Grant No.234079)The first author is also supported by the China Scholarship Council(Grant No.201707000113).
文摘The existing multi-objective wheel profile optimization methods mainly consist of three sub-modules:(1)wheel profile generation,(2)multi-body dynamics simulation,and(3)an optimization algorithm.For the first module,a comparably conservative rotary-scaling finetuning(RSFT)method,which introduces two design variables and an empirical formula,is proposed to fine-tune the traditional wheel profiles for improving their engineering applicability.For the second module,for the TRAXX locomotives serving on the Blankenburg–Rubeland line,an optimization function representing the relationship between the wheel profile and the wheel–rail wear number is established based on Kriging surrogate model(KSM).For the third module,a method combining the regression capability of KSM with the iterative computing power of particle swarm optimization(PSO)is proposed to quickly and reliably implement the task of optimizing wheel profiles.Finally,with the RSFT–KSM–PSO method,we propose two wear-resistant wheel profiles for the TRAXX locomotives serving on the Blankenburg–Rubeland line,namely S1002-S and S1002-M.The S1002-S profile minimizes the total wear number by 30%,while the S1002-M profile makes the wear distribution more uniform through a proper sacrifice of the tread wear number,and the total wear number is reduced by 21%.The quasi-static and hunting stability tests further demonstrate that the profile designed by the RSFT–KSM–PSO method is promising for practical engineering applications.
基金supported by Institute of Information&communications Technology Planning&Evaluation(IITP)grant funded by the Korea government(MSIT)(No.2022-0-00377,Development of Intelligent Analysis and Classification Based Contents Class Categorization Technique to Prevent Imprudent Harmful Media Distribution).
文摘Automatic speech recognition(ASR)systems have emerged as indispensable tools across a wide spectrum of applications,ranging from transcription services to voice-activated assistants.To enhance the performance of these systems,it is important to deploy efficient models capable of adapting to diverse deployment conditions.In recent years,on-demand pruning methods have obtained significant attention within the ASR domain due to their adaptability in various deployment scenarios.However,these methods often confront substantial trade-offs,particularly in terms of unstable accuracy when reducing the model size.To address challenges,this study introduces two crucial empirical findings.Firstly,it proposes the incorporation of an online distillation mechanism during on-demand pruning training,which holds the promise of maintaining more consistent accuracy levels.Secondly,it proposes the utilization of the Mogrifier long short-term memory(LSTM)language model(LM),an advanced iteration of the conventional LSTM LM,as an effective alternative for pruning targets within the ASR framework.Through rigorous experimentation on the ASR system,employing the Mogrifier LSTM LM and training it using the suggested joint on-demand pruning and online distillation method,this study provides compelling evidence.The results exhibit that the proposed methods significantly outperform a benchmark model trained solely with on-demand pruning methods.Impressively,the proposed strategic configuration successfully reduces the parameter count by approximately 39%,all the while minimizing trade-offs.
基金Supported by Sichuan Science and Technology Program(2021YFQ0003,2023YFSY0026,2023YFH0004).
文摘In the field of natural language processing(NLP),there have been various pre-training language models in recent years,with question answering systems gaining significant attention.However,as algorithms,data,and computing power advance,the issue of increasingly larger models and a growing number of parameters has surfaced.Consequently,model training has become more costly and less efficient.To enhance the efficiency and accuracy of the training process while reducing themodel volume,this paper proposes a first-order pruningmodel PAL-BERT based on the ALBERT model according to the characteristics of question-answering(QA)system and language model.Firstly,a first-order network pruning method based on the ALBERT model is designed,and the PAL-BERT model is formed.Then,the parameter optimization strategy of the PAL-BERT model is formulated,and the Mish function was used as an activation function instead of ReLU to improve the performance.Finally,after comparison experiments with traditional deep learning models TextCNN and BiLSTM,it is confirmed that PALBERT is a pruning model compression method that can significantly reduce training time and optimize training efficiency.Compared with traditional models,PAL-BERT significantly improves the NLP task’s performance.
基金Supported by the National Natural Science Foundation of China(72088101,42372175)PetroChina Science and Technology Innovation Fund Program(2021DQ02-0904)。
文摘This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large models in vertical industries,outlines the challenges and issues confronted in applying large models in the oil and gas sector,and offers prospects for the application of large models in the oil and gas industry.The existing large models can be briefly divided into three categories:large language models,visual large models,and multimodal large models.The application of large models in the oil and gas industry is still in its infancy.Based on open-source large language models,some oil and gas enterprises have released large language model products using methods like fine-tuning and retrieval augmented generation.Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models.A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation,as well as core analysis.The application of large models in the oil and gas industry faces challenges such as current data quantity and quality being difficult to support the training of large models,high research and development costs,and poor algorithm autonomy and control.The application of large models should be guided by the needs of oil and gas business,taking the application of large models as an opportunity to improve data lifecycle management,enhance data governance capabilities,promote the construction of computing power,strengthen the construction of“artificial intelligence+energy”composite teams,and boost the autonomy and control of large model technology.
基金Supported by the National Talent Fund of the Ministry of Science and Technology of China(20230240011)China University of Geosciences(Wuhan)Research Fund(162301192687)。
文摘A large language model(LLM)is constructed to address the sophisticated demands of data retrieval and analysis,detailed well profiling,computation of key technical indicators,and the solutions to complex problems in reservoir performance analysis(RPA).The LLM is constructed for RPA scenarios with incremental pre-training,fine-tuning,and functional subsystems coupling.Functional subsystem and efficient coupling methods are proposed based on named entity recognition(NER),tool invocation,and Text-to-SQL construction,all aimed at resolving pivotal challenges in developing the specific application of LLMs for RDA.This study conducted a detailed accuracy test on feature extraction models,tool classification models,data retrieval models and analysis recommendation models.The results indicate that these models have demonstrated good performance in various key aspects of reservoir dynamic analysis.The research takes some injection and production well groups in the PK3 Block of the Daqing Oilfield as an example for testing.Testing results show that our model has significant potential and practical value in assisting reservoir engineers with RDA.The research results provide a powerful support to the application of LLM in reservoir performance analysis.
基金supported by the National Key R&D Program of China(No.2021YFB0301200)National Natural Science Foundation of China(No.62025208).
文摘Large-scale Language Models(LLMs)have achieved significant breakthroughs in Natural Language Processing(NLP),driven by the pre-training and fine-tuning paradigm.While this approach allows models to specialize in specific tasks with reduced training costs,the substantial memory requirements during fine-tuning present a barrier to broader deployment.Parameter-Efficient Fine-Tuning(PEFT)techniques,such as Low-Rank Adaptation(LoRA),and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational efficiency.Among these,QLoRA,which combines PEFT and quantization,has demonstrated notable success in reducing memory footprints during fine-tuning,prompting the development of various QLoRA variants.Despite these advancements,the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored.This study presents a comprehensive analysis of these key variables,focusing on their influence across different layer types and depths within LLM architectures.Our investigation uncovers several critical findings:(1)Larger layers,such as MLP layers,can maintain performance despite reductions in adapter rank,while smaller layers,like self-attention layers,aremore sensitive to such changes;(2)The effectiveness of balancing factors depends more on specific values rather than layer type or depth;(3)In quantization-aware fine-tuning,larger layers can effectively utilize smaller adapters,whereas smaller layers struggle to do so.These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs.Moreover,for the same discount of trainable parameters,reducing the trainable parameters in a larger layer is more effective in preserving fine-tuning accuracy than in a smaller one.This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.
文摘Leveraging the Baidu Qianfan model platform,this paper designs and implements a highly efficient and accurate scoring system for subjective questions,focusing primarily on questions in the field of computer network technology.The system enhances the foundational model by utilizing Qianfan’s training tools and integrating advanced techniques,such as supervised fine-tuning.In the data preparation phase,a comprehensive collection of subjective data related to computer network technology is gathered,cleaned,and labeled.During model training and evaluation,optimal hyperparameters and tuning strategies are applied,resulting in a model capable of scoring with high accuracy.Evaluation results demonstrate that the proposed model performs well across multiple dimensions-content,expression,and development scores-yielding results comparable to those of manual scoring.
文摘A novel deep neural network compression model for airport object detection has been presented.This novel model aims at disadvantages of deep neural network,i.e.the complexity of the model and the great cost of calculation.According to the requirement of airport object detection,the model obtains temporal and spatial semantic rules from the uncompressed model.These spatial semantic rules are added to the model after parameter compression to assist the detection.The rules can improve the accuracy of the detection model in order to make up for the loss caused by parameter compression.The experiments show that the effect of the novel compression detection model is no worse than that of the uncompressed original model.Even some of the original model false detection can be eliminated through the prior knowledge.
文摘Although the deep learning technology has shown great power in solving the complex tasks, these neural network models are large and redundant as a matter of fact, which makes these networks difficult to be placed in embedded devices with limited memory and computing resources. In order to compress the neural network to a slimmer and smaller one, the multi-grained network pruning framework is proposed in this paper. In our framework, the pruning process was divided into the filter-level pruning and the weight-level pruning. In the process of the filter-level pruning, the importance of the filter was measured by the entropy of the activation tensor of the filter. In the other process, the dynamic recoverable pruning method was adopted to prune the weights deeply. Different from these popular pruning methods, the weight-level pruning is also taken into account based on the employment of the filter-level pruning to achieve more effectively pruning. The proposed approach is validated on two representative CNN models - AlexNet and VGG16, pre-trained on ILSVRC12. Experimental results show that AlexNet and VGG16 network models are compressed 19.75- and 22.53- respectively by this approach, which are 2.05 and 5.89 higher than the classical approaches of dynamic Network Surgery and ThiNet.
文摘果园环境下柑橘的快速准确检测是自主采摘机器人作业的关键.针对现有的模型过于冗余、检测速度与精度不平衡等问题,提出一种轻量型果园环境果实检测方法.在YOLOv4算法的基础上引入焦点损失函数(Focal Loss)来提高模型在二分类检测任务中的负样本挖掘能力,并针对模型参数冗余等问题提出一种优化的模型剪枝方法.试验结果表明:提出的方法在果园环境中柑橘果实数据集检测得到的平均精度均值(mean average precision,M_(AP))达到94.22%,相较于YOLOv4模型提高了1.18%,模型参数减小了95.22%,模型尺寸为原来的4.84%,检测速度为原来的4.03倍.