We present an approach to classify medical text at a sentence level automatically.Given the inherent complexity of medical text classification,we employ adapters based on pre-trained language models to extract informa...We present an approach to classify medical text at a sentence level automatically.Given the inherent complexity of medical text classification,we employ adapters based on pre-trained language models to extract information from medical text,facilitating more accurate classification while minimizing the number of trainable parameters.Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.展开更多
Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requir...Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requires more syntactic elements.Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence,recognizing the progress and comparing impacts.An ensemble pre-trained language model was taken up here to classify the conversation sentences from the conversation corpus.The conversational sentences are classified into four categories:information,question,directive,and commission.These classification label sequences are for analyzing the conversation progress and predicting the pecking order of the conversation.Ensemble of Bidirectional Encoder for Representation of Transformer(BERT),Robustly Optimized BERT pretraining Approach(RoBERTa),Generative Pre-Trained Transformer(GPT),DistilBERT and Generalized Autoregressive Pretraining for Language Understanding(XLNet)models are trained on conversation corpus with hyperparameters.Hyperparameter tuning approach is carried out for better performance on sentence classification.This Ensemble of Pre-trained Language Models with a Hyperparameter Tuning(EPLM-HT)system is trained on an annotated conversation dataset.The proposed approach outperformed compared to the base BERT,GPT,DistilBERT and XLNet transformer models.The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.展开更多
This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large mode...This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large models in vertical industries,outlines the challenges and issues confronted in applying large models in the oil and gas sector,and offers prospects for the application of large models in the oil and gas industry.The existing large models can be briefly divided into three categories:large language models,visual large models,and multimodal large models.The application of large models in the oil and gas industry is still in its infancy.Based on open-source large language models,some oil and gas enterprises have released large language model products using methods like fine-tuning and retrieval augmented generation.Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models.A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation,as well as core analysis.The application of large models in the oil and gas industry faces challenges such as current data quantity and quality being difficult to support the training of large models,high research and development costs,and poor algorithm autonomy and control.The application of large models should be guided by the needs of oil and gas business,taking the application of large models as an opportunity to improve data lifecycle management,enhance data governance capabilities,promote the construction of computing power,strengthen the construction of“artificial intelligence+energy”composite teams,and boost the autonomy and control of large model technology.展开更多
As important geological data,a geological report contains rich expert and geological knowledge,but the challenge facing current research into geological knowledge extraction and mining is how to render accurate unders...As important geological data,a geological report contains rich expert and geological knowledge,but the challenge facing current research into geological knowledge extraction and mining is how to render accurate understanding of geological reports guided by domain knowledge.While generic named entity recognition models/tools can be utilized for the processing of geoscience reports/documents,their effectiveness is hampered by a dearth of domain-specific knowledge,which in turn leads to a pronounced decline in recognition accuracy.This study summarizes six types of typical geological entities,with reference to the ontological system of geological domains and builds a high quality corpus for the task of geological named entity recognition(GNER).In addition,Geo Wo BERT-adv BGP(Geological Word-base BERTadversarial training Bi-directional Long Short-Term Memory Global Pointer)is proposed to address the issues of ambiguity,diversity and nested entities for the geological entities.The model first uses the fine-tuned word granularitybased pre-training model Geo Wo BERT(Geological Word-base BERT)and combines the text features that are extracted using the Bi LSTM(Bi-directional Long Short-Term Memory),followed by an adversarial training algorithm to improve the robustness of the model and enhance its resistance to interference,the decoding finally being performed using a global association pointer algorithm.The experimental results show that the proposed model for the constructed dataset achieves high performance and is capable of mining the rich geological information.展开更多
This letter evaluates the article by Gravina et al on ChatGPT’s potential in providing medical information for inflammatory bowel disease patients.While promising,it highlights the need for advanced techniques like r...This letter evaluates the article by Gravina et al on ChatGPT’s potential in providing medical information for inflammatory bowel disease patients.While promising,it highlights the need for advanced techniques like reasoning+action and retrieval-augmented generation to improve accuracy and reliability.Emphasizing that simple question and answer testing is insufficient,it calls for more nuanced evaluation methods to truly gauge large language models’capabilities in clinical applications.展开更多
With the construction of new power systems,the power grid has become extremely large,with an increasing proportion of new energy and AC/DC hybrid connections.The dynamic characteristics and fault patterns of the power...With the construction of new power systems,the power grid has become extremely large,with an increasing proportion of new energy and AC/DC hybrid connections.The dynamic characteristics and fault patterns of the power grid are complex;additionally,power grid control is difficult,operation risks are high,and the task of fault handling is arduous.Traditional power-grid fault handling relies primarily on human experience.The difference in and lack of knowledge reserve of control personnel restrict the accuracy and timeliness of fault handling.Therefore,this mode of operation is no longer suitable for the requirements of new systems.Based on the multi-source heterogeneous data of power grid dispatch,this paper proposes a joint entity–relationship extraction method for power-grid dispatch fault processing based on a pre-trained model,constructs a knowledge graph of power-grid dispatch fault processing and designs,and develops a fault-processing auxiliary decision-making system based on the knowledge graph.It was applied to study a provincial dispatch control center,and it effectively improved the accident processing ability and intelligent level of accident management and control of the power grid.展开更多
Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modes,such as text and image,to accurately assess sentiment.However,conventional approaches that rely on...Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modes,such as text and image,to accurately assess sentiment.However,conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities.This limitation is attributed to their training on unimodal data,and necessitates the use of complex fusion mechanisms for sentiment analysis.In this study,we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method.Our approach harnesses the power of transfer learning by utilizing a vision-language pre-trained model to extract both visual and textual representations in a unified framework.We employ a Transformer architecture to integrate these representations,thereby enabling the capture of rich semantic infor-mation in image-text pairs.To further enhance the representation learning of these pairs,we introduce our proposed multimodal contrastive learning method,which leads to improved performance in sentiment analysis tasks.Our approach is evaluated through extensive experiments on two publicly accessible datasets,where we demonstrate its effectiveness.We achieve a significant improvement in sentiment analysis accuracy,indicating the supe-riority of our approach over existing techniques.These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.展开更多
The Coronavirus Disease 2019(COVID-19)is wreaking havoc around the world,bring out that the enormous pressure on national health and medical staff systems.One of the most effective and critical steps in the fight agai...The Coronavirus Disease 2019(COVID-19)is wreaking havoc around the world,bring out that the enormous pressure on national health and medical staff systems.One of the most effective and critical steps in the fight against COVID-19,is to examine the patient’s lungs based on the Chest X-ray and CT generated by radiation imaging.In this paper,five keras-related deep learning models:ResNet50,InceptionResNetV2,Xception,transfer learning and pre-trained VGGNet16 is applied to formulate an classification-detection approaches of COVID-19.Two benchmark methods SVM(Support Vector Machine),CNN(Conventional Neural Networks)are provided to compare with the classification-detection approaches based on the performance indicators,i.e.,precision,recall,F1 scores,confusion matrix,classification accuracy and three types of AUC(Area Under Curve).The highest classification accuracy derived by classification-detection based on 5857 Chest X-rays and 767 Chest CTs are respectively 84%and 75%,which shows that the keras-related deep learning approaches facilitate accurate and effective COVID-19-assisted detection.展开更多
With current success of large-scale pre-trained models(PTMs),how efficiently adapting PTMs to downstream tasks has attracted tremendous attention,especially for PTMs with billions of parameters.Previous work focuses o...With current success of large-scale pre-trained models(PTMs),how efficiently adapting PTMs to downstream tasks has attracted tremendous attention,especially for PTMs with billions of parameters.Previous work focuses on designing parameter-efficient tuning paradigms but needs to save and compute the gradient of the whole computational graph.In this paper,we propose y-Tuning,an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks.y-Tuning learns dense representations for labels y defined in a given task and aligns them to fixed feature representation.Without computing the gradients of text encoder at training phrase,y-Tuning is not only parameterefficient but also training-efficient.Experimental results show that for DeBERTaxxL with 1.6 billion parameters,y-Tuning achieves performance more than 96%of full fine-tuning on GLUE Benchmark with only 2%tunable parameters and much fewer training costs.展开更多
With the urgent demand for generalized deep models,many pre-trained big models are proposed,such as bidirectional encoder representations(BERT),vision transformer(ViT),generative pre-trained transformers(GPT),etc.Insp...With the urgent demand for generalized deep models,many pre-trained big models are proposed,such as bidirectional encoder representations(BERT),vision transformer(ViT),generative pre-trained transformers(GPT),etc.Inspired by the success of these models in single domains(like computer vision and natural language processing),the multi-modal pre-trained big models have also drawn more and more attention in recent years.In this work,we give a comprehensive survey of these models and hope this paper could provide new insights and helps fresh researchers to track the most cutting-edge works.Specifically,we firstly introduce the background of multi-modal pre-training by reviewing the conventional deep learning,pre-training works in natural language process,computer vision,and speech.Then,we introduce the task definition,key challenges,and advantages of multi-modal pre-training models(MM-PTMs),and discuss the MM-PTMs with a focus on data,objectives,network architectures,and knowledge enhanced pre-training.After that,we introduce the downstream tasks used for the validation of large-scale MM-PTMs,including generative,classification,and regression tasks.We also give visualization and analysis of the model parameters and results on representative downstream tasks.Finally,we point out possible research directions for this topic that may benefit future works.In addition,we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models:https://github.com/wangxiao5791509/MultiModal_BigModels_Survey.展开更多
The pre-training-then-fine-tuning paradigm has been widely used in deep learning.Due to the huge computation cost for pre-training,practitioners usually download pre-trained models from the Internet and fine-tune them...The pre-training-then-fine-tuning paradigm has been widely used in deep learning.Due to the huge computation cost for pre-training,practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets,while the downloaded models may suffer backdoor attacks.Different from previous attacks aiming at a target task,we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information.Attackers can restrict the output representations(the values of output neurons)of trigger-embedded samples to arbitrary predefined values through additional training,namely neuron-level backdoor attack(NeuBA).Since fine-tuning has little effect on model parameters,the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger.To provoke multiple labels in a specific task,attackers can introduce several triggers with predefined contrastive values.In the experiments of both natural language processing(NLP)and computer vision(CV),we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs.Our findings sound a red alarm for the wide use of pre-trained models.Finally,we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.展开更多
In recent years,with the great success of pre-trained language models,the pre-trained BERT model has been gradually applied to the field of source code understanding.However,the time cost of training a language model ...In recent years,with the great success of pre-trained language models,the pre-trained BERT model has been gradually applied to the field of source code understanding.However,the time cost of training a language model from zero is very high,and how to transfer the pre-trained language model to the field of smart contract vulnerability detection is a hot research direction at present.In this paper,we propose a hybrid model to detect common vulnerabilities in smart contracts based on a lightweight pre-trained languagemodel BERT and connected to a bidirectional gate recurrent unitmodel.The downstream neural network adopts the bidirectional gate recurrent unit neural network model with a hierarchical attention mechanism to mine more semantic features contained in the source code of smart contracts by using their characteristics.Our experiments show that our proposed hybrid neural network model SolBERT-BiGRU-Attention is fitted by a large number of data samples with smart contract vulnerabilities,and it is found that compared with the existing methods,the accuracy of our model can reach 93.85%,and the Micro-F1 Score is 94.02%.展开更多
Black fungus is a rare and dangerous mycology that usually affects the brain and lungs and could be life-threatening in diabetic cases.Recently,some COVID-19 survivors,especially those with co-morbid diseases,have bee...Black fungus is a rare and dangerous mycology that usually affects the brain and lungs and could be life-threatening in diabetic cases.Recently,some COVID-19 survivors,especially those with co-morbid diseases,have been susceptible to black fungus.Therefore,recovered COVID-19 patients should seek medical support when they notice mucormycosis symptoms.This paper proposes a novel ensemble deep-learning model that includes three pre-trained models:reset(50),VGG(19),and Inception.Our approach is medically intuitive and efficient compared to the traditional deep learning models.An image dataset was aggregated from various resources and divided into two classes:a black fungus class and a skin infection class.To the best of our knowledge,our study is the first that is concerned with building black fungus detection models based on deep learning algorithms.The proposed approach can significantly improve the performance of the classification task and increase the generalization ability of such a binary classification task.According to the reported results,it has empirically achieved a sensitivity value of 0.9907,a specificity value of 0.9938,a precision value of 0.9938,and a negative predictive value of 0.9907.展开更多
As an essential category of public event management and control,sentiment analysis of online public opinion text plays a vital role in public opinion early warning,network rumor management,and netizens’person-ality p...As an essential category of public event management and control,sentiment analysis of online public opinion text plays a vital role in public opinion early warning,network rumor management,and netizens’person-ality portraits under massive public opinion data.The traditional sentiment analysis model is not sensitive to the location information of words,it is difficult to solve the problem of polysemy,and the learning representation ability of long and short sentences is very different,which leads to the low accuracy of sentiment classification.This paper proposes a sentiment analysis model PERT-BiLSTM-Att for public opinion text based on the pre-training model of the disordered language model,bidirectional long-term and short-term memory network and attention mechanism.The model first uses the PERT model pre-trained from the lexical location information of a large amount of corpus to process the text data and obtain the dynamic feature representation of the text.Then the semantic features are input into BiLSTM to learn context sequence information and enhance the model’s ability to represent long sequences.Finally,the attention mechanism is used to focus on the words that contribute more to the overall emotional tendency to make up for the lack of short text representation ability of the traditional model,and then the classification results are output through the fully connected network.The experimental results show that the classification accuracy of the model on NLPCC14 and weibo_senti_100k public data sets reach 88.56%and 97.05%,respectively,and the accuracy reaches 95.95%on the data set MDC22 composed of Meituan,Dianping and Ctrip comment.It proves that the model has a good effect on sentiment analysis of online public opinion texts on different platforms.The experimental results on different datasets verify the model’s effectiveness in applying sentiment analysis of texts.At the same time,the model has a strong generalization ability and can achieve good results for sentiment analysis of datasets in different fields.展开更多
Artificial Intelligence(AI)and Computer Vision(CV)advancements have led to many useful methodologies in recent years,particularly to help visually-challenged people.Object detection includes a variety of challenges,fo...Artificial Intelligence(AI)and Computer Vision(CV)advancements have led to many useful methodologies in recent years,particularly to help visually-challenged people.Object detection includes a variety of challenges,for example,handlingmultiple class images,images that get augmented when captured by a camera and so on.The test images include all these variants as well.These detection models alert them about their surroundings when they want to walk independently.This study compares four CNN-based pre-trainedmodels:ResidualNetwork(ResNet-50),Inception v3,DenseConvolutional Network(DenseNet-121),and SqueezeNet,predominantly used in image recognition applications.Based on the analysis performed on these test images,the study infers that Inception V3 outperformed other pre-trained models in terms of accuracy and speed.To further improve the performance of the Inception v3 model,the thermal exchange optimization(TEO)algorithm is applied to tune the hyperparameters(number of epochs,batch size,and learning rate)showing the novelty of the work.Better accuracy was achieved owing to the inclusion of an auxiliary classifier as a regularizer,hyperparameter optimizer,and factorization approach.Additionally,Inception V3 can handle images of different sizes.This makes Inception V3 the optimum model for assisting visually challenged people in real-world communication when integrated with Internet of Things(IoT)-based devices.展开更多
Corona Virus(COVID-19)is a novel virus that crossed an animal-human barrier and emerged in Wuhan,China.Until now it has affected more than 119 million people.Detection of COVID-19 is a critical task and due to a large...Corona Virus(COVID-19)is a novel virus that crossed an animal-human barrier and emerged in Wuhan,China.Until now it has affected more than 119 million people.Detection of COVID-19 is a critical task and due to a large number of patients,a shortage of doctors has occurred for its detection.In this paper,a model has been suggested that not only detects the COVID-19 using X-ray and CT-Scan images but also shows the affected areas.Three classes have been defined;COVID-19,normal,and Pneumonia for X-ray images.For CT-Scan images,2 classes have been defined COVID-19 and non-COVID-19.For classi-fication purposes,pretrained models like ResNet50,VGG-16,and VGG19 have been used with some tuning.For detecting the affected areas Gradient-weighted Class Activation Mapping(GradCam)has been used.As the X-rays and ct images are taken at different intensities,so the contrast limited adaptive histogram equalization(CLAHE)has been applied to see the effect on the training of the models.As a result of these experiments,we achieved a maximum validation accuracy of 88.10%with a training accuracy of 88.48%for CT-Scan images using the ResNet50 model.While for X-ray images we achieved a maximum validation accuracy of 97.31%with a training accuracy of 95.64%using the VGG16 model.展开更多
文摘We present an approach to classify medical text at a sentence level automatically.Given the inherent complexity of medical text classification,we employ adapters based on pre-trained language models to extract information from medical text,facilitating more accurate classification while minimizing the number of trainable parameters.Extensive experiments conducted on various datasets demonstrate the effectiveness of our approach.
文摘Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requires more syntactic elements.Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence,recognizing the progress and comparing impacts.An ensemble pre-trained language model was taken up here to classify the conversation sentences from the conversation corpus.The conversational sentences are classified into four categories:information,question,directive,and commission.These classification label sequences are for analyzing the conversation progress and predicting the pecking order of the conversation.Ensemble of Bidirectional Encoder for Representation of Transformer(BERT),Robustly Optimized BERT pretraining Approach(RoBERTa),Generative Pre-Trained Transformer(GPT),DistilBERT and Generalized Autoregressive Pretraining for Language Understanding(XLNet)models are trained on conversation corpus with hyperparameters.Hyperparameter tuning approach is carried out for better performance on sentence classification.This Ensemble of Pre-trained Language Models with a Hyperparameter Tuning(EPLM-HT)system is trained on an annotated conversation dataset.The proposed approach outperformed compared to the base BERT,GPT,DistilBERT and XLNet transformer models.The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.
基金Supported by the National Natural Science Foundation of China(72088101,42372175)PetroChina Science and Technology Innovation Fund Program(2021DQ02-0904)。
文摘This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large models in vertical industries,outlines the challenges and issues confronted in applying large models in the oil and gas sector,and offers prospects for the application of large models in the oil and gas industry.The existing large models can be briefly divided into three categories:large language models,visual large models,and multimodal large models.The application of large models in the oil and gas industry is still in its infancy.Based on open-source large language models,some oil and gas enterprises have released large language model products using methods like fine-tuning and retrieval augmented generation.Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models.A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation,as well as core analysis.The application of large models in the oil and gas industry faces challenges such as current data quantity and quality being difficult to support the training of large models,high research and development costs,and poor algorithm autonomy and control.The application of large models should be guided by the needs of oil and gas business,taking the application of large models as an opportunity to improve data lifecycle management,enhance data governance capabilities,promote the construction of computing power,strengthen the construction of“artificial intelligence+energy”composite teams,and boost the autonomy and control of large model technology.
基金financially supported by the Natural Science Foundation of China(Grant No.42301492)the National Key R&D Program of China(Grant Nos.2022YFF0711600,2022YFF0801201,2022YFF0801200)+3 种基金the Major Special Project of Xinjiang(Grant No.2022A03009-3)the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation,Ministry of Natural Resources(Grant No.KF-2022-07014)the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education(Grant No.GLAB 2023ZR01)the Fundamental Research Funds for the Central Universities。
文摘As important geological data,a geological report contains rich expert and geological knowledge,but the challenge facing current research into geological knowledge extraction and mining is how to render accurate understanding of geological reports guided by domain knowledge.While generic named entity recognition models/tools can be utilized for the processing of geoscience reports/documents,their effectiveness is hampered by a dearth of domain-specific knowledge,which in turn leads to a pronounced decline in recognition accuracy.This study summarizes six types of typical geological entities,with reference to the ontological system of geological domains and builds a high quality corpus for the task of geological named entity recognition(GNER).In addition,Geo Wo BERT-adv BGP(Geological Word-base BERTadversarial training Bi-directional Long Short-Term Memory Global Pointer)is proposed to address the issues of ambiguity,diversity and nested entities for the geological entities.The model first uses the fine-tuned word granularitybased pre-training model Geo Wo BERT(Geological Word-base BERT)and combines the text features that are extracted using the Bi LSTM(Bi-directional Long Short-Term Memory),followed by an adversarial training algorithm to improve the robustness of the model and enhance its resistance to interference,the decoding finally being performed using a global association pointer algorithm.The experimental results show that the proposed model for the constructed dataset achieves high performance and is capable of mining the rich geological information.
文摘This letter evaluates the article by Gravina et al on ChatGPT’s potential in providing medical information for inflammatory bowel disease patients.While promising,it highlights the need for advanced techniques like reasoning+action and retrieval-augmented generation to improve accuracy and reliability.Emphasizing that simple question and answer testing is insufficient,it calls for more nuanced evaluation methods to truly gauge large language models’capabilities in clinical applications.
基金supported by the Science and Technology Project of the State Grid Corporation“Research on Key Technologies of Power Artificial Intelligence Open Platform”(5700-202155260A-0-0-00).
文摘With the construction of new power systems,the power grid has become extremely large,with an increasing proportion of new energy and AC/DC hybrid connections.The dynamic characteristics and fault patterns of the power grid are complex;additionally,power grid control is difficult,operation risks are high,and the task of fault handling is arduous.Traditional power-grid fault handling relies primarily on human experience.The difference in and lack of knowledge reserve of control personnel restrict the accuracy and timeliness of fault handling.Therefore,this mode of operation is no longer suitable for the requirements of new systems.Based on the multi-source heterogeneous data of power grid dispatch,this paper proposes a joint entity–relationship extraction method for power-grid dispatch fault processing based on a pre-trained model,constructs a knowledge graph of power-grid dispatch fault processing and designs,and develops a fault-processing auxiliary decision-making system based on the knowledge graph.It was applied to study a provincial dispatch control center,and it effectively improved the accident processing ability and intelligent level of accident management and control of the power grid.
基金supported by Science and Technology Research Project of Jiangxi Education Department.Project Grant No.GJJ2203306.
文摘Multimodal sentiment analysis is an essential area of research in artificial intelligence that combines multiple modes,such as text and image,to accurately assess sentiment.However,conventional approaches that rely on unimodal pre-trained models for feature extraction from each modality often overlook the intrinsic connections of semantic information between modalities.This limitation is attributed to their training on unimodal data,and necessitates the use of complex fusion mechanisms for sentiment analysis.In this study,we present a novel approach that combines a vision-language pre-trained model with a proposed multimodal contrastive learning method.Our approach harnesses the power of transfer learning by utilizing a vision-language pre-trained model to extract both visual and textual representations in a unified framework.We employ a Transformer architecture to integrate these representations,thereby enabling the capture of rich semantic infor-mation in image-text pairs.To further enhance the representation learning of these pairs,we introduce our proposed multimodal contrastive learning method,which leads to improved performance in sentiment analysis tasks.Our approach is evaluated through extensive experiments on two publicly accessible datasets,where we demonstrate its effectiveness.We achieve a significant improvement in sentiment analysis accuracy,indicating the supe-riority of our approach over existing techniques.These results highlight the potential of multimodal sentiment analysis and underscore the importance of considering the intrinsic semantic connections between modalities for accurate sentiment assessment.
基金This project is supported by National Natural Science Foundation of China(NSFC)(Nos.61902158,61806087)Graduate student innovation program for academic degrees in general university in Jiangsu Province(No.KYZZ16-0337).
文摘The Coronavirus Disease 2019(COVID-19)is wreaking havoc around the world,bring out that the enormous pressure on national health and medical staff systems.One of the most effective and critical steps in the fight against COVID-19,is to examine the patient’s lungs based on the Chest X-ray and CT generated by radiation imaging.In this paper,five keras-related deep learning models:ResNet50,InceptionResNetV2,Xception,transfer learning and pre-trained VGGNet16 is applied to formulate an classification-detection approaches of COVID-19.Two benchmark methods SVM(Support Vector Machine),CNN(Conventional Neural Networks)are provided to compare with the classification-detection approaches based on the performance indicators,i.e.,precision,recall,F1 scores,confusion matrix,classification accuracy and three types of AUC(Area Under Curve).The highest classification accuracy derived by classification-detection based on 5857 Chest X-rays and 767 Chest CTs are respectively 84%and 75%,which shows that the keras-related deep learning approaches facilitate accurate and effective COVID-19-assisted detection.
基金National Key R&D Program of China(No.2020AAA0108702)National Natural Science Foundation of China(Grant No.62022027).
文摘With current success of large-scale pre-trained models(PTMs),how efficiently adapting PTMs to downstream tasks has attracted tremendous attention,especially for PTMs with billions of parameters.Previous work focuses on designing parameter-efficient tuning paradigms but needs to save and compute the gradient of the whole computational graph.In this paper,we propose y-Tuning,an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks.y-Tuning learns dense representations for labels y defined in a given task and aligns them to fixed feature representation.Without computing the gradients of text encoder at training phrase,y-Tuning is not only parameterefficient but also training-efficient.Experimental results show that for DeBERTaxxL with 1.6 billion parameters,y-Tuning achieves performance more than 96%of full fine-tuning on GLUE Benchmark with only 2%tunable parameters and much fewer training costs.
基金supported by National Natural Science Foundation of China(Nos.61872256 and 62102205)Key-Area Research and Development Program of Guangdong Province,China(No.2021B0101400002)+1 种基金Peng Cheng Laboratory Key Research Project,China(No.PCL 2021A07)Multi-source Cross-platform Video Analysis and Understanding for Intelligent Perception in Smart City,China(No.U20B2052).
文摘With the urgent demand for generalized deep models,many pre-trained big models are proposed,such as bidirectional encoder representations(BERT),vision transformer(ViT),generative pre-trained transformers(GPT),etc.Inspired by the success of these models in single domains(like computer vision and natural language processing),the multi-modal pre-trained big models have also drawn more and more attention in recent years.In this work,we give a comprehensive survey of these models and hope this paper could provide new insights and helps fresh researchers to track the most cutting-edge works.Specifically,we firstly introduce the background of multi-modal pre-training by reviewing the conventional deep learning,pre-training works in natural language process,computer vision,and speech.Then,we introduce the task definition,key challenges,and advantages of multi-modal pre-training models(MM-PTMs),and discuss the MM-PTMs with a focus on data,objectives,network architectures,and knowledge enhanced pre-training.After that,we introduce the downstream tasks used for the validation of large-scale MM-PTMs,including generative,classification,and regression tasks.We also give visualization and analysis of the model parameters and results on representative downstream tasks.Finally,we point out possible research directions for this topic that may benefit future works.In addition,we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models:https://github.com/wangxiao5791509/MultiModal_BigModels_Survey.
基金supported by the National Key Research and Development Program of China(No.2020AAA0106500)the National Natural Science Foundation of China(NSFC No.62236004).
文摘The pre-training-then-fine-tuning paradigm has been widely used in deep learning.Due to the huge computation cost for pre-training,practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets,while the downloaded models may suffer backdoor attacks.Different from previous attacks aiming at a target task,we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information.Attackers can restrict the output representations(the values of output neurons)of trigger-embedded samples to arbitrary predefined values through additional training,namely neuron-level backdoor attack(NeuBA).Since fine-tuning has little effect on model parameters,the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger.To provoke multiple labels in a specific task,attackers can introduce several triggers with predefined contrastive values.In the experiments of both natural language processing(NLP)and computer vision(CV),we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs.Our findings sound a red alarm for the wide use of pre-trained models.Finally,we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
基金supported by the National Natural Science Foundation of China(Grant Nos.62272120,62106030,U20B2046,62272119,61972105)the Technology Innovation and Application Development Projects of Chongqing(Grant Nos.cstc2021jscx-gksbX0032,cstc2021jscxgksbX0029).
文摘In recent years,with the great success of pre-trained language models,the pre-trained BERT model has been gradually applied to the field of source code understanding.However,the time cost of training a language model from zero is very high,and how to transfer the pre-trained language model to the field of smart contract vulnerability detection is a hot research direction at present.In this paper,we propose a hybrid model to detect common vulnerabilities in smart contracts based on a lightweight pre-trained languagemodel BERT and connected to a bidirectional gate recurrent unitmodel.The downstream neural network adopts the bidirectional gate recurrent unit neural network model with a hierarchical attention mechanism to mine more semantic features contained in the source code of smart contracts by using their characteristics.Our experiments show that our proposed hybrid neural network model SolBERT-BiGRU-Attention is fitted by a large number of data samples with smart contract vulnerabilities,and it is found that compared with the existing methods,the accuracy of our model can reach 93.85%,and the Micro-F1 Score is 94.02%.
基金supported by the MSIT (Ministry of Science and ICT),Korea,under the ICAN (ICT Challenge and Advanced Network of HRD)Program (IITP-2023-2020-0-01832)supervised by the IITP (Institute of Information&Communications Technology Planning&Evaluation)and the Soonchunhyang University Research Fund.
文摘Black fungus is a rare and dangerous mycology that usually affects the brain and lungs and could be life-threatening in diabetic cases.Recently,some COVID-19 survivors,especially those with co-morbid diseases,have been susceptible to black fungus.Therefore,recovered COVID-19 patients should seek medical support when they notice mucormycosis symptoms.This paper proposes a novel ensemble deep-learning model that includes three pre-trained models:reset(50),VGG(19),and Inception.Our approach is medically intuitive and efficient compared to the traditional deep learning models.An image dataset was aggregated from various resources and divided into two classes:a black fungus class and a skin infection class.To the best of our knowledge,our study is the first that is concerned with building black fungus detection models based on deep learning algorithms.The proposed approach can significantly improve the performance of the classification task and increase the generalization ability of such a binary classification task.According to the reported results,it has empirically achieved a sensitivity value of 0.9907,a specificity value of 0.9938,a precision value of 0.9938,and a negative predictive value of 0.9907.
基金supported by the Chongqing Natural Science Foundation of China (Grant No.CSTB2022NSCQ-MSX1417)the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No.KJZD-K202200513)Chongqing Normal University Fund (Grant No.22XLB003).
文摘As an essential category of public event management and control,sentiment analysis of online public opinion text plays a vital role in public opinion early warning,network rumor management,and netizens’person-ality portraits under massive public opinion data.The traditional sentiment analysis model is not sensitive to the location information of words,it is difficult to solve the problem of polysemy,and the learning representation ability of long and short sentences is very different,which leads to the low accuracy of sentiment classification.This paper proposes a sentiment analysis model PERT-BiLSTM-Att for public opinion text based on the pre-training model of the disordered language model,bidirectional long-term and short-term memory network and attention mechanism.The model first uses the PERT model pre-trained from the lexical location information of a large amount of corpus to process the text data and obtain the dynamic feature representation of the text.Then the semantic features are input into BiLSTM to learn context sequence information and enhance the model’s ability to represent long sequences.Finally,the attention mechanism is used to focus on the words that contribute more to the overall emotional tendency to make up for the lack of short text representation ability of the traditional model,and then the classification results are output through the fully connected network.The experimental results show that the classification accuracy of the model on NLPCC14 and weibo_senti_100k public data sets reach 88.56%and 97.05%,respectively,and the accuracy reaches 95.95%on the data set MDC22 composed of Meituan,Dianping and Ctrip comment.It proves that the model has a good effect on sentiment analysis of online public opinion texts on different platforms.The experimental results on different datasets verify the model’s effectiveness in applying sentiment analysis of texts.At the same time,the model has a strong generalization ability and can achieve good results for sentiment analysis of datasets in different fields.
基金Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2023R191)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code:(22UQU4310373DSR61)This study is supported via funding from Prince Sattam bin Abdulaziz University project number(PSAU/2023/R/1444).
文摘Artificial Intelligence(AI)and Computer Vision(CV)advancements have led to many useful methodologies in recent years,particularly to help visually-challenged people.Object detection includes a variety of challenges,for example,handlingmultiple class images,images that get augmented when captured by a camera and so on.The test images include all these variants as well.These detection models alert them about their surroundings when they want to walk independently.This study compares four CNN-based pre-trainedmodels:ResidualNetwork(ResNet-50),Inception v3,DenseConvolutional Network(DenseNet-121),and SqueezeNet,predominantly used in image recognition applications.Based on the analysis performed on these test images,the study infers that Inception V3 outperformed other pre-trained models in terms of accuracy and speed.To further improve the performance of the Inception v3 model,the thermal exchange optimization(TEO)algorithm is applied to tune the hyperparameters(number of epochs,batch size,and learning rate)showing the novelty of the work.Better accuracy was achieved owing to the inclusion of an auxiliary classifier as a regularizer,hyperparameter optimizer,and factorization approach.Additionally,Inception V3 can handle images of different sizes.This makes Inception V3 the optimum model for assisting visually challenged people in real-world communication when integrated with Internet of Things(IoT)-based devices.
文摘Corona Virus(COVID-19)is a novel virus that crossed an animal-human barrier and emerged in Wuhan,China.Until now it has affected more than 119 million people.Detection of COVID-19 is a critical task and due to a large number of patients,a shortage of doctors has occurred for its detection.In this paper,a model has been suggested that not only detects the COVID-19 using X-ray and CT-Scan images but also shows the affected areas.Three classes have been defined;COVID-19,normal,and Pneumonia for X-ray images.For CT-Scan images,2 classes have been defined COVID-19 and non-COVID-19.For classi-fication purposes,pretrained models like ResNet50,VGG-16,and VGG19 have been used with some tuning.For detecting the affected areas Gradient-weighted Class Activation Mapping(GradCam)has been used.As the X-rays and ct images are taken at different intensities,so the contrast limited adaptive histogram equalization(CLAHE)has been applied to see the effect on the training of the models.As a result of these experiments,we achieved a maximum validation accuracy of 88.10%with a training accuracy of 88.48%for CT-Scan images using the ResNet50 model.While for X-ray images we achieved a maximum validation accuracy of 97.31%with a training accuracy of 95.64%using the VGG16 model.