20 articles found
Learning Top-K Subtask Planning Tree Based on Discriminative Representation Pretraining for Decision-making
1
Authors: Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu. Machine Intelligence Research (EI, CSCD), 2024, No. 4, pp. 782-800 (19 pages).
Decomposing complex real-world tasks into simpler subtasks and devising a subtask execution plan is critical for humans to achieve effective decision-making. However, replicating this process remains challenging for AI agents and naturally raises two questions: (1) How to extract discriminative knowledge representation from priors? (2) How to develop a rational plan to decompose complex problems? To address these issues, we introduce a groundbreaking framework that incorporates two main contributions. First, our multiple-encoder and individual-predictor regime goes beyond traditional architectures to extract nuanced task-specific dynamics from datasets, enriching the feature space for subtasks. Second, we innovate in planning by introducing a top-K subtask planning tree generated through an attention mechanism, which allows for dynamic adaptability and forward-looking decision-making. Our framework is empirically validated against the challenging BabyAI benchmarks, including multiple combinatorially rich synthetic tasks (e.g., GoToSeq, SynthSeq, BossLevel), where it not only outperforms competitive baselines but also demonstrates superior adaptability and effectiveness in complex task decomposition.
Keywords: reinforcement learning; representation learning; subtask planning; task decomposition; pretraining
Multimodal Pretraining from Monolingual to Multilingual (cited by: 1)
2
Authors: Liang Zhang, Ludan Ruan, Anwen Hu, Qin Jin. Machine Intelligence Research (EI, CSCD), 2023, No. 2, pp. 220-232 (13 pages).
Multimodal pretraining has made convincing achievements in various downstream tasks in recent years. However, since the majority of the existing works construct models based on English, their applications are limited by language. In this work, we address this issue by developing models with multimodal and multilingual capabilities. We explore two types of methods to extend multimodal pretraining models from monolingual to multilingual. Specifically, we propose a pretraining-based model named multilingual multimodal pretraining (MLMM), and two generalization-based models named multilingual CLIP (M-CLIP) and multilingual acquisition (MLA). In addition, we further extend the generalization-based models to incorporate the audio modality and develop the multilingual CLIP for vision, language, and audio (CLIP4VLA). Our models achieve state-of-the-art performance on multilingual vision-text retrieval, visual question answering, and image captioning benchmarks. Based on the experimental results, we discuss the pros and cons of the two types of models and their potential practical applications.
Keywords: multilingual pretraining; multimodal pretraining; cross-lingual transfer; multilingual generation; cross-modal retrieval
SRS-Net: Training object detectors from scratch for remote sensing images without pretraining (cited by: 1)
3
Authors: Haining WANG, Yang LI, Yuqiang FANG, Yurong LIAO, Bitao JIANG, Xitao ZHANG, Shuyan NI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 8, pp. 269-283 (15 pages).
Most of the current object detection algorithms use pretrained models that are trained on ImageNet and then fine-tuned in the network, which can achieve good performance in terms of general object detectors. However, in the field of remote sensing image object detection, as pretrained models are significantly different from remote sensing data, it is meaningful to explore a train-from-scratch technique for remote sensing images. This paper proposes an object detection framework trained from scratch, SRS-Net, and describes the design of a densely connected backbone network to provide integrated hidden layer supervision for the convolution module. Then, two necessary improvement principles are proposed: studying the role of normalization in the network structure, and improving data augmentation methods for remote sensing images. To evaluate the proposed framework, we performed many ablation experiments on the DIOR, DOTA, and AS datasets. The results show that whether using the improved backbone network, the normalization method or the training data enhancement strategy, the performance of the object detection network trained from scratch increased. These principles compensate for the lack of pretrained models. Furthermore, we found that SRS-Net could achieve performance similar to or slightly better than baseline methods, and surpassed most advanced general detectors.
Keywords: dense connection; object detection; pretraining; remote sensing image; train from scratch
MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition
4
Authors: Luequan Wang, Hongbin Xu, Wenxiong Kang. Machine Intelligence Research (EI, CSCD), 2023, No. 6, pp. 872-883 (12 pages).
3D shape recognition has drawn much attention in recent years. The view-based approach performs best of all. However, the current multi-view methods are almost all fully supervised, and the pretraining models are almost all based on ImageNet. Although the pretraining results of ImageNet are quite impressive, there is still a significant discrepancy between multi-view datasets and ImageNet. Multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so it is difficult to regenerate a second dataset. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representations into three fine-grained representations. Data augmentation is utilized to obtain pixel-level representations within each view. We then boost the spatially invariant features at the view level. Finally, we exploit global information at the shape level through a novel extract-and-swap module. Experimental results demonstrate that the proposed method gains significantly in 3D object classification and retrieval tasks, and shows generalization to cross-dataset tasks.
Keywords: multi-view; unsupervised pretraining; contrastive learning; 3D vision; shape recognition
Classification of Conversational Sentences Using an Ensemble Pre-Trained Language Model with the Fine-Tuned Parameter
5
Authors: R. Sujatha, K. Nimala. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1669-1686 (18 pages).
Sentence classification is the process of categorizing a sentence based on the context of the sentence. Sentence categorization requires more semantic highlights than other tasks, such as dependency parsing, which requires more syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress and comparing impacts. An ensemble pre-trained language model was taken up here to classify the conversation sentences from the conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are for analyzing the conversation progress and predicting the pecking order of the conversation. An ensemble of Bidirectional Encoder for Representation of Transformer (BERT), Robustly Optimized BERT pretraining Approach (RoBERTa), Generative Pre-Trained Transformer (GPT), DistilBERT and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus with hyperparameters. A hyperparameter tuning approach is carried out for better performance on sentence classification. This Ensemble of Pre-trained Language Models with a Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT and XLNet transformer models. The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.
Keywords: bidirectional encoder for representation of transformer; conversation; ensemble model; fine-tuning; generalized autoregressive pretraining for language understanding; generative pre-trained transformer; hyperparameter tuning; natural language processing; robustly optimized BERT pretraining approach; sentence classification; transformer models
PAL-BERT: An Improved Question Answering Model
6
Authors: Wenfeng Zheng, Siyu Lu, Zhuohang Cai, Ruiyang Wang, Lei Wang, Lirong Yin. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 6, pp. 2729-2745 (17 pages).
In the field of natural language processing (NLP), there have been various pre-training language models in recent years, with question answering systems gaining significant attention. However, as algorithms, data, and computing power advance, the issue of increasingly larger models and a growing number of parameters has surfaced. Consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of the training process while reducing the model volume, this paper proposes a first-order pruning model, PAL-BERT, based on the ALBERT model according to the characteristics of question-answering (QA) systems and language models. Firstly, a first-order network pruning method based on the ALBERT model is designed, and the PAL-BERT model is formed. Then, the parameter optimization strategy of the PAL-BERT model is formulated, and the Mish function is used as an activation function instead of ReLU to improve the performance. Finally, after comparison experiments with traditional deep learning models TextCNN and BiLSTM, it is confirmed that PAL-BERT is a pruning model compression method that can significantly reduce training time and optimize training efficiency. Compared with traditional models, PAL-BERT significantly improves the NLP task’s performance.
Keywords: PAL-BERT; question answering model; pretraining language models; ALBERT; pruning model; network pruning; TextCNN; BiLSTM
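The PAL-BERT abstract mentions using Mish in place of ReLU as the activation function. As a rough illustration only (this is not the authors' code, and the function names are chosen here for the example), Mish is commonly defined as x * tanh(softplus(x)); a minimal scalar sketch in Python:

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation in its common form: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU activation, included only for comparison."""
    return max(x, 0.0)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}")
```

Unlike ReLU, Mish is smooth and lets small negative values through, which is the usual motivation given for the swap; whether it helps in a particular model is an empirical question.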
Unlocking the Potential: A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
7
Authors: Ebtesam Ahmad Alomari. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 10, pp. 43-85 (43 pages).
As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT’s adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT’s flexibility in annotation tasks reduces manual efforts and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as a complementary mechanism, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. Subsequently, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
Keywords: generative AI; large language model (LLM); natural language processing (NLP); ChatGPT; GPT (generative pretraining transformer); GPT-4; sentiment analysis; NER; information extraction; annotation; text classification
RoBGP: A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer
8
Authors: Xiaohui Cui, Chao Song, Dongmei Li, Xiaolong Qu, Jiao Long, Yu Yang, Hanchao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3603-3618 (16 pages).
Named Entity Recognition (NER) stands as a fundamental task within the field of biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
Keywords: biomedicine; knowledge base; named entity recognition; pretrained language model; global pointer
An Efficient and Robust Hand Gesture Recognition System of Sign Language Employing Finetuned Inception-V3 and Efficientnet-B0 Network
9
Authors: Adnan Hussain, Sareer Ul Amin, Muhammad Fayaz, Sanghyun Seo. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 9, pp. 3509-3525 (17 pages).
Hand Gesture Recognition (HGR) is a promising research area with an extensive range of applications, such as surgery, video game techniques, and sign language translation, where sign language is a complicated structured form of hand gestures. The fundamental building blocks of structured expressions in sign language are the arrangement of the fingers, the orientation of the hand, and the hand’s position concerning the body. The importance of HGR has increased due to the increasing number of touchless applications and the rapid growth of the hearing-impaired population. Therefore, real-time HGR is one of the most effective interaction methods between computers and humans. Developing a user-free interface with good recognition performance should be the goal of real-time HGR systems. Nowadays, Convolutional Neural Networks (CNNs) show great recognition rates for different image-level classification tasks. It is challenging to train deep CNN networks like VGG-16, VGG-19, Inception-v3, and EfficientNet-B0 from scratch because only some significant labeled image datasets are available for static hand gesture images. However, an efficient and robust hand gesture recognition system of sign language employing fine-tuned Inception-v3 and EfficientNet-B0 networks is proposed to identify hand gestures using a comparatively small HGR dataset. Experiments show that Inception-v3 achieved 90% accuracy and 0.93% precision, 0.91% recall, and 0.90% f1-score, respectively, while EfficientNet-B0 achieved 99% accuracy and 0.98%, 0.97%, 0.98% precision, recall, and f1-score, respectively.
Keywords: pretrained CNN; hand gesture recognition; transfer learning
Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling (cited by: 4)
10
Authors: Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li, Gaihong Yu. Journal of Data and Information Science (CSCD), 2021, No. 3, pp. 35-57 (23 pages).
Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of the text. In this paper, we aim to combine the benefits of sequence labeling formulation and pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We regard AKE from Chinese text as a character-level sequence labeling task to avoid segmentation errors of Chinese tokenizer and initialize our model with pretrained language model BERT, which was released by Google in 2018. We collect data from Chinese Science Citation Database and construct a large-scale dataset from medical domain, which contains 100,000 abstracts as training set, 6,000 abstracts as development set and 3,094 abstracts as test set. We use unsupervised keyphrase extraction methods including term frequency (TF), TF-IDF, TextRank and supervised machine learning methods including Conditional Random Field (CRF), Bidirectional Long Short Term Memory Network (BiLSTM), and BiLSTM-CRF as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains F1 score of 59.80%, a 9.64% absolute improvement. Research limitations: We just consider automatic keyphrase extraction task rather than keyphrase generation task, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefits of research community, which is available at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that character-level formulation is more suitable for Chinese automatic keyphrase extraction task under the general trend of pretrained language models. And our proposed dataset provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
Keywords: automatic keyphrase extraction; character-level sequence labeling; pretrained language model; scientific Chinese medical abstracts
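To make the character-level IOB formulation in this abstract concrete, here is a small Python sketch of how gold keyphrases could be turned into per-character B/I/O tags. It is an illustration under assumptions, not the CAKE dataset's actual preprocessing: the function name, the longest-phrase-first rule, and the example sentence are all hypothetical.

```python
from typing import List


def char_iob_labels(text: str, keyphrases: List[str]) -> List[str]:
    """Tag every character of `text` as B (begin), I (inside) or O (outside)."""
    labels = ["O"] * len(text)
    # Longer phrases first, so a shorter phrase never overwrites part of a
    # longer one (a flat tagging scheme cannot represent nested keyphrases).
    for phrase in sorted(keyphrases, key=len, reverse=True):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Only tag spans that are still completely untagged.
            if all(label == "O" for label in labels[start:end]):
                labels[start] = "B"
                for i in range(start + 1, end):
                    labels[i] = "I"
            start = text.find(phrase, start + 1)
    return labels


if __name__ == "__main__":
    sentence = "肺癌患者的治疗"  # hypothetical example sentence
    for char, tag in zip(sentence, char_iob_labels(sentence, ["肺癌", "治疗"])):
        print(char, tag)
```

Because each character carries exactly one tag, no Chinese word segmenter is needed, which is the point the abstract makes about avoiding segmentation errors; the same one-tag-per-character property is why nested keyphrases cannot be expressed.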
DI-VTR: Dual inter-modal interaction model for video-text retrieval
11
Authors: Jie Guo, Mengying Wang, Wenwei Wang, Yan Zhou, Bin Song. Journal of Information and Intelligence, 2024, No. 5, pp. 388-403 (16 pages).
Video-text retrieval is a challenging task for multimodal information processing due to the semantic gap between different modalities. However, most existing methods do not fully mine the intra-modal interactions, as with the temporal correlation of video frames, which results in poor matching performance. Additionally, the imbalanced semantic information between videos and texts also leads to difficulty in the alignment of the two modalities. To this end, we propose a dual inter-modal interaction network for video-text retrieval, i.e., DI-VTR. To learn the intra-modal interaction of video frames, we design a contextual-related video encoder to obtain more fine-grained content-oriented video representations. We also propose a dual inter-modal interaction module to accomplish accurate multilingual alignment between the video and text modalities by introducing multilingual text to improve the representation ability of text semantic features. Extensive experimental results on commonly-used video-text retrieval datasets, including MSR-VTT, MSVD and VATEX, show that the proposed method achieves significantly improved performance compared with state-of-the-art methods.
Keywords: video-text retrieval; multilingual text; dual interaction; contrastive language-image pretraining (CLIP); cross-modal retrieval
Generative pretrained transformer 4: an innovative approach to facilitate value-based healthcare
12
Authors: Han Lyu, Zhixiang Wang, Jia Li, Jing Sun, Xinghao Wang, Pengling Ren, Linkun Cai, Zhenchang Wang, Max Wintermark. Intelligent Medicine (EI, CSCD), 2024, No. 1, pp. 10-15 (6 pages).
Objective: Appropriate medical imaging is important for value-based care. We aim to evaluate the performance of generative pretrained transformer 4 (GPT-4), an innovative natural language processing model, providing appropriate medical imaging automatically in different clinical scenarios. Methods: Institutional Review Board (IRB) approval was not required due to the use of nonidentifiable data. Instead, we used 112 questions from the American College of Radiology (ACR) Radiology-TEACHES Program as prompts, which is an open-sourced question and answer program to guide appropriate medical imaging. We included 69 free-text case vignettes and 43 simplified cases. For the performance evaluation of GPT-4 and GPT-3.5, we considered the recommendations of ACR guidelines as the gold standard, and then three radiologists analyzed the consistency of the responses from the GPT models with those of the ACR. We set a five-score criterion for the evaluation of the consistency. A paired t-test was applied to assess the statistical significance of the findings. Results: For the performance of the GPT models in free-text case vignettes, the accuracy of GPT-4 was 92.9%, whereas the accuracy of GPT-3.5 was just 78.3%. GPT-4 can provide more appropriate suggestions to reduce the overutilization of medical imaging than GPT-3.5 (t=3.429, P=0.001). For the performance of the GPT models in simplified scenarios, the accuracy of GPT-4 and GPT-3.5 was 66.5% and 60.0%, respectively. The differences were not statistically significant (t=1.858, P=0.070). GPT-4 was characterized by longer reaction times (27.1 s on average) and more extensive responses (137.1 words on average) than GPT-3.5. Conclusion: As an advanced tool for improving value-based healthcare in clinics, GPT-4 may guide appropriate medical imaging accurately and efficiently.
Keywords: generative pretrained transformer 4 model; natural language processing; medical imaging; appropriateness
LKMT: Linguistics Knowledge-Driven Multi-Task Neural Machine Translation for Urdu and English
13
Authors: Muhammad Naeem Ul Hassan, Zhengtao Yu, Jian Wang, Ying Li, Shengxiang Gao, Shuwan Yang, Cunli Mao. Computers, Materials & Continua (SCIE, EI), 2024, No. 10, pp. 951-969 (19 pages).
Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performances of these models drop sharply when the scale of the parallel training corpus is limited. Considering the pre-trained language model has a strong ability for monolingual representation, it is the key challenge for machine translation to construct the in-depth relationship between the source and target language by injecting the lexical and syntactic information into pre-trained language models. To alleviate the dependence on the parallel corpus, we propose a Linguistics Knowledge-Driven Multi-Task (LKMT) approach to inject part-of-speech and syntactic knowledge into pre-trained models, thus enhancing the machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit large-scale monolingual corpus to update all parameters of pre-trained language models, thus ensuring the updated language model contains potential lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the pre-trained language model-enhanced machine translation model. Experiments on the benchmark dataset show that our proposed LKMT approach improves the Urdu-English translation accuracy by 1.97 points and the English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of our LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation.
Keywords: Urdu NMT (neural machine translation); Urdu natural language processing; Urdu linguistic features; low resources language; linguistic features; pretrain model
Computer-aided Detection of Tuberculosis from Microbiological and Radiographic Images (cited by: 1)
14
Authors: Abdullahi Umar Ibrahim, Ayse Gunnay Kibarer, Fadi Al-Turjman. Data Intelligence (EI), 2023, No. 4, pp. 1008-1032 (25 pages).
Tuberculosis caused by Mycobacterium tuberculosis has been a major challenge for medical and healthcare sectors in many underdeveloped countries with limited diagnosis tools. Tuberculosis can be detected from microscopic slides and chest X-ray but as a result of the high cases of tuberculosis, this method can be tedious for both Microbiologists and Radiologists and can lead to misdiagnosis. These challenges can be solved by employing Computer-Aided Detection (CAD) via AI-driven models which learn features based on convolution and result in an output with high accuracy. In this paper, we described automated discrimination of X-ray and microscope slide images into tuberculosis and non-tuberculosis cases using pretrained AlexNet Models. The study employed Chest X-ray dataset made available on Kaggle repository and microscopic slide images from both Near East University Hospital and Kaggle repository. For classification of tuberculosis using microscopic slide images, the model achieved 90.56% accuracy, 97.78% sensitivity and 83.33% specificity for 70:30 splits. For classification of tuberculosis using X-ray images, the model achieved 93.89% accuracy, 96.67% sensitivity and 91.11% specificity for 70:30 splits. Our result is in line with the notion that CNN models can be used for classifying medical images with higher accuracy and precision.
Keywords: tuberculosis; deep learning; pretrained AlexNet; chest X-ray; microscopic slide
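The tuberculosis abstract reports accuracy, sensitivity and specificity. As a reminder of how these three figures relate to a binary confusion matrix, here is a short Python sketch. The counts in the usage example are hypothetical, picked only so that the outputs round to the percentages quoted for the microscopic-slide experiment; the study's actual confusion matrix is not given in the abstract.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        # Share of true positive cases that were detected (recall).
        "sensitivity": tp / (tp + fn),
        # Share of true negative cases that were correctly rejected.
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration, not taken from the paper.
    metrics = binary_metrics(tp=88, fn=2, tn=75, fp=15)
    for name, value in metrics.items():
        print(f"{name}: {value * 100:.2f}%")
```

When the two classes are the same size, as in these illustrative counts, accuracy is simply the mean of sensitivity and specificity.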
Compositional Prompting Video-language Models to Understand Procedure in Instructional Videos
15
Authors: Guyue Hu, Bin He, Hanwang Zhang. Machine Intelligence Research (EI, CSCD), 2023, No. 2, pp. 249-262 (14 pages).
Instructional videos are very useful for completing complex daily tasks, which naturally contain abundant clip-narration pairs. Existing works for procedure understanding are keen on pretraining various video-language models with these pairs and then finetuning downstream classifiers and localizers in predetermined category space. These video-language models are proficient at representing short-term actions, basic objects, and their combinations, but they are still far from understanding long-term procedures. In addition, the predetermined procedure category faces the problem of combination disaster and is inherently inapt to unseen procedures. Therefore, we propose a novel compositional prompt learning (CPL) framework to understand long-term procedures by prompting short-term video-language models and reformulating several classical procedure understanding tasks into general video-text matching problems. Specifically, the proposed CPL consists of one visual prompt and three compositional textual prompts (including the action prompt, object prompt, and procedure prompt), which could compositionally distill knowledge from short-term video-language models to facilitate long-term procedure understanding. Besides, the task reformulation enables our CPL to perform well in all zero-shot, few-shot, and fully-supervised settings. Extensive experiments on two widely-used datasets for procedure understanding demonstrate the effectiveness of the proposed approach.
Keywords: prompt learning; video-language pretrained models; instructional videos; procedure understanding; knowledge distilling
An approach based on deep learning for Indian sign language translation
16
作者 Kinjal Bhargavkumar Mistree Devendra Thakor Brijesh Bhatt 《International Journal of Intelligent Computing and Cybernetics》 EI 2023年第3期397-419,共23页
Purpose: According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This paper aims to address the issue of Indian Sign Language (ISL) sentence recognition and translation into semantically equivalent English text in a signer-independent mode. Design/methodology/approach: This study presents an approach that translates ISL sentences into English text using the MobileNetV2 model and Neural Machine Translation (NMT). The authors have created an ISL corpus from the Brown corpus using ISL grammar rules to perform machine translation. The approach converts ISL videos of the newly created dataset into ISL gloss sequences using the MobileNetV2 model; the recognized ISL gloss sequence is then fed to a machine translation module that generates an English sentence for each ISL sentence. Findings: In the experiments, the pretrained MobileNetV2 model proved the best-suited model for recognizing ISL sentences, and NMT gave better results than Statistical Machine Translation (SMT) for converting ISL text into English text. The automatic and human evaluations of the proposed approach yielded accuracies of 83.3% and 86.1%, respectively. Research limitations/implications: The NMT systems sometimes produced translations that repeated other translated words, strange translations when the number of words per sentence increased, and occasionally one or more unexpected terms unrelated to the source text. The most common type of error is the mistranslation of places, numbers and dates. Although this has little effect on the overall structure of the translated sentence, it indicates that the embeddings learned for these few words could be improved. Originality/value: Sign language recognition and translation is a crucial step toward improving communication between the deaf and the rest of society. Because of the shortage of human interpreters, an alternative approach is desired to help people achieve smooth communication with the Deaf. To motivate research in this field, the authors generated an ISL corpus of 13,720 sentences and a video dataset of 47,880 ISL videos. As no public dataset of ISL videos incorporating the signs released by ISLRTC is available, the authors created a new video dataset and ISL corpus.
Keywords: Indian sign language; Neural machine translation; ISL corpus; Pretrained models; Sign language recognition; Sign language translation. Paper type: Research paper
Optimization of deep network models through fine tuning
17
Authors: M. Arif Wani, Saduf Afzal. International Journal of Intelligent Computing and Cybernetics (EI), 2018, Issue 3, pp. 386-403 (18 pages)
Purpose: Many strategies have been put forward for training deep network models; however, stacking several layers of non-linearities typically results in poor propagation of gradients and activations. The purpose of this paper is to explore a two-step strategy in which an initial deep learning model is first obtained by unsupervised learning and then optimized by fine tuning. A number of fine tuning algorithms are explored for optimizing deep learning models. These include a new algorithm that combines the Backpropagation with adaptive gain algorithm and the Dropout technique; the authors evaluate its performance in fine tuning the pretrained deep network. Design/methodology/approach: The parameters of the deep neural networks are first learnt using greedy layer-wise unsupervised pretraining. The proposed technique is then used to perform supervised fine tuning of the deep neural network model. An extensive experimental study evaluates the proposed fine tuning technique on three benchmark data sets: USPS, Gisette and MNIST. The authors tested the approach on data sets of varying size, using randomly chosen training samples of 20, 50, 70 and 100 percent of the original data set. Findings: The experimental study concludes that the two-step strategy and the proposed fine tuning technique yield promising results in the optimization of deep network models. Originality/value: This paper proposes employing several algorithms for fine tuning of deep network models. A new approach that integrates the adaptive gain Backpropagation (BP) algorithm with the Dropout technique is proposed for fine tuning of deep networks. An evaluation and comparison of the fine tuning algorithms on three benchmark data sets is presented in the paper.
Keywords: Dropout; Deep neural network; Contrastive divergence; Fine tuning of deep neural network; Restricted Boltzmann machine; Unsupervised pretraining; Backpropagation
Overview of SMP-CAIL2020-Argmine: The Interactive Argument-Pair Extraction in Judgement Document Challenge (cited by: 4)
18
Authors: Jian Yuan, Zhongyu Wei, Yixu Gao, Wei Chen, Yun Song, Donghua Zhao, Jinglei Ma, Zhen Hu, Shaokun Zou, Donghai Li, Xuanjing Huang. Data Intelligence, 2021, Issue 2, pp. 287-307 (21 pages)
In this paper we present the results of the Interactive Argument-Pair Extraction in Judgement Document Challenge held by both the Chinese AI and Law Challenge (CAIL) and the Chinese National Social Media Processing Conference (SMP), and introduce the related data set, SMP-CAIL2020-Argmine. The task challenged participants to choose the correct argument among five candidates proposed by the defense to refute or acknowledge the given argument made by the plaintiff, providing the full context recorded in the judgement documents of both parties. We received entries from 63 competing teams, 38 of which scored higher than the provided baseline model (BERT) in the first phase and entered the second phase. The best performing system in the two phases achieved accuracy of 0.856 and 0.905, respectively. We also summarize the systems, highlighting commonalities and innovations among participating systems. The SMP-CAIL2020-Argmine data set and baseline models have already been released.
Keywords: Argumentation mining; Judgement documents; Natural language understanding; Pretrained language model
Intelligent Prescription-Generating Models of Traditional Chinese Medicine Based on Deep Learning (cited by: 2)
19
Authors: Qing-Yang Shi, Li-Zi Tan, Lim Lian Seng, Hui-Jun Wang. World Journal of Traditional Chinese Medicine, 2021, Issue 3, pp. 361-369 (9 pages)
Objective: This study aimed to construct an intelligent prescription-generating (IPG) model based on deep-learning natural language processing (NLP) technology for multiple prescriptions in Chinese medicine. Materials and Methods: We selected the Treatise on Febrile Diseases and the Synopsis of Golden Chamber as basic datasets with EDA data augmentation, and the Yellow Emperor’s Canon of Internal Medicine, the Classic of the Miraculous Pivot, and the Classic on Medical Problems as supplementary datasets for fine-tuning. We selected the word-embedding model based on the Imperial Collection of Four, the bidirectional encoder representations from transformers (BERT) model based on the Chinese Wikipedia, and the robustly optimized BERT approach (RoBERTa) model based on the Chinese Wikipedia and a general database. In addition, the BERT model was fine-tuned using the supplementary datasets to generate a Traditional Chinese Medicine-BERT model. Multiple IPG models were constructed based on the pretraining strategy and experiments were performed. Precision, recall, and F1-score were used to assess model performance. Based on the trained models, we extracted and visualized the semantic features of some typical texts from the Treatise on Febrile Diseases and investigated the patterns. Results: Among all the trained models, the RoBERTa-large model performed the best, with a test set precision of 92.22%, recall of 86.71%, and F1-score of 89.38%, and 10-fold cross-validation precision of 94.5% ± 2.5%, recall of 90.47% ± 4.1%, and F1-score of 92.38% ± 2.8%. The semantic feature extraction results based on this model showed that the model was intelligently stratified based on different meanings, such that the within-layer patterns showed the associations of symptom–symptoms, disease–symptoms, and symptom–punctuations, while the between-layer patterns showed a progressive or dynamic symptom and disease transformation. Conclusions: Deep-learning-based NLP technology significantly improves the performance of the IPG model. In addition, NLP-based semantic feature extraction may be vital to further investigate the ancient Chinese medicine texts.
Keywords: Ancient books of Chinese medicine; bidirectional encoder representations from transformers; deep learning; intelligent prescription-generating models; pretrained models
Pretrained Models and Evaluation Data for the Khmer Language
20
Authors: Shengyi Jiang, Sihui Fu, Nankai Lin, Yingwen Fu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 4, pp. 709-718 (10 pages)
Trained on a large corpus, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages such as English and Chinese. However, scarce resources have held back the progress of PTMs for low-resource languages. Transformer-based PTMs for the Khmer language are presented in this work for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization. The dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that the current Khmer word segmentation technology does not aid performance improvement. We aim to release our models and datasets to the community in hopes of facilitating the future development of Khmer NLP applications.
Keywords: pretrained models; Khmer language; word segmentation; part-of-speech (POS) tagging; news categorization