Funding: Supported by the National Key Research and Development Program of China [grant number 2019YFE0126400].
Abstract: Earth observations, especially satellite data, have produced a wealth of methods and results for meeting global challenges, but these are often presented in unstructured texts such as papers or reports. Accurately extracting satellite and instrument entities from these texts can help to link and reuse Earth observation resources. Extracting these entities directly with an existing dictionary suffers from poor matching, which leads to low recall. In this study, we present a named entity recognition model that automatically extracts satellite and instrument entities from unstructured texts. Because manually labeled data are lacking, we apply distant supervision to generate labeled training data automatically. We then fine-tune a pre-trained language model with early stopping and a weighted cross-entropy loss function. We also propose a dictionary-based self-training method to correct the incomplete annotations produced by distant supervision. Experiments demonstrate that our method achieves significant improvements in both precision and recall compared with dictionary matching or standard adaptation of pre-trained language models.
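The distant-supervision and weighted-loss steps described above can be illustrated with a minimal Python sketch. It assumes a greedy longest-match labeler over a lowercased phrase dictionary, BIO tags with hypothetical entity types `SAT` and `INS`, and a token-level weighted cross-entropy normalized by the total weight of the gold labels. The dictionary, tag names, and weights are illustrative assumptions rather than values from the paper, and the dictionary-based self-training step is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

def distant_label(tokens: Sequence[str], dictionary: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedy longest-match BIO labeling against a phrase dictionary.

    Dictionary keys are lowercased token tuples; values are entity types.
    Entities missing from the dictionary stay 'O' (incomplete annotation).
    """
    max_len = max((len(key) for key in dictionary), default=0)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible span first, so a longer entry wins over a shorter one.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in dictionary:
                etype = dictionary[key]
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels

def weighted_cross_entropy(probs: Sequence[Dict[str, float]], gold: Sequence[str],
                           weights: Dict[str, float]) -> float:
    """Each gold label y contributes -w[y] * log p(y); the sum is divided by
    the total weight of the gold labels. Labels absent from `weights` get 1.0.
    """
    total_loss = 0.0
    total_weight = 0.0
    for p, y in zip(probs, gold):
        w = weights.get(y, 1.0)
        total_loss += -w * math.log(p[y])
        total_weight += w
    return total_loss / total_weight

tokens = ["Landsat", "8", "carries", "the", "OLI", "sensor"]
sat_dict = {("landsat", "8"): "SAT", ("oli",): "INS"}  # hypothetical dictionary
print(distant_label(tokens, sat_dict))
# ['B-SAT', 'I-SAT', 'O', 'O', 'B-INS', 'O']
```

Tokens the dictionary misses are tagged `O`, which is the incomplete-annotation problem the abstract mentions. Giving `O` a weight below 1.0 (for example `{"O": 0.5}`, an assumed value) is one way to reduce the penalty when the model predicts an entity where the distant labels say `O`.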
Abstract: Text classification, the automatic categorization of texts, is one of the foundational elements of natural language processing (NLP) applications. This study investigates how text classification performance can be improved by integrating entity-relation information obtained from the Wikidata knowledge base and BERT-based pre-trained Named Entity Recognition (NER) models. Addressing a significant challenge in NLP, the research evaluates the potential of entity and relational information for extracting deeper meaning from texts. The methodology comprises text preprocessing, entity detection, and the integration of relational information. Experiments on Turkish and English text datasets assess the performance of several classification algorithms: Support Vector Machine, Logistic Regression, Deep Neural Network, and Convolutional Neural Network. The results indicate that integrating entity-relation information can significantly improve algorithm performance in text classification tasks and offers new perspectives for information extraction and semantic analysis in NLP applications. The contributions of this work include the use of distantly supervised entity-relation information in Turkish text classification, the development of a Turkish relational text classification approach, and the creation of a relational database. By demonstrating potential performance improvements from integrating distantly supervised entity-relation information into Turkish text classification, this research aims to support the effectiveness of text-based artificial intelligence (AI) tools. Additionally, it contributes to the development of multilingual text classification systems by adding deeper meaning to text content, thereby adding to current NLP studies and setting a reference point for future research.
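As a rough illustration of the enrichment step (entity detection followed by relation lookup), the sketch below appends entity-type and relation tokens to a document before building bag-of-words features. The gazetteer stands in for a BERT-based NER model and the relation table stands in for Wikidata queries. Both, along with the `ENT:`/`REL:`/`VAL:` token format, are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def detect_entities(tokens: Sequence[str], gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Stand-in for a BERT-based NER model: exact lookup of single tokens."""
    return [(t, gazetteer[t]) for t in tokens if t in gazetteer]

def enrich_tokens(tokens: Sequence[str], gazetteer: Dict[str, str],
                  relations: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Append an entity-type token and (property, value) tokens for each entity.

    `relations` is a stand-in for facts retrieved from Wikidata.
    """
    extra: List[str] = []
    for surface, etype in detect_entities(tokens, gazetteer):
        extra.append(f"ENT:{etype}")
        for prop, value in relations.get(surface, []):
            extra.append(f"REL:{prop}")
            extra.append(f"VAL:{value}")
    return list(tokens) + extra

def bag_of_words(tokens: Sequence[str]) -> Counter:
    """Sparse count features usable by a linear classifier such as an SVM."""
    return Counter(tokens)

doc = ["Ankara", "hosted", "the", "summit"]
gazetteer = {"Ankara": "LOC"}                       # hypothetical
relations = {"Ankara": [("capital_of", "Turkey")]}  # hypothetical
print(enrich_tokens(doc, gazetteer, relations))
# ['Ankara', 'hosted', 'the', 'summit', 'ENT:LOC', 'REL:capital_of', 'VAL:Turkey']
```

Appending relation tokens keeps the feature space compatible with linear classifiers such as SVM or logistic regression. The neural models in the study may integrate this information differently.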
Funding: This research work is supported by the National Natural Science Foundation of China (Nos. 61772454, 6171101570, 61602059), the Hunan Provincial Natural Science Foundation of China (No. 2017JJ3334), the Research Foundation of Education Bureau of Hunan Province, China (No. 16C0045), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR). Professor Jin Wang is the corresponding author.
Abstract: Recently, many researchers have concentrated on using neural networks to learn features for Distant Supervised Relation Extraction (DSRE). These approaches generally use a softmax classifier with a cross-entropy loss, which inevitably introduces noise from the artificial class NA into the classification process. To address this shortcoming, a classifier with a ranking loss is applied to DSRE. Two common methods for generating the negative class in the ranking loss function are selecting a relation uniformly at random and heuristically selecting the highest-scoring incorrect relation. However, most negative classes generated this way are easily discriminated from the positive class and contribute little to training. Inspired by Generative Adversarial Networks (GANs), we use a neural network as a negative class generator to assist the training of our target model, which acts as the discriminator in the GAN. Through alternating optimization of the generator and the discriminator, the generator learns to produce increasingly challenging negative classes, and the discriminator must improve accordingly. This framework is independent of the concrete form of the generator and the discriminator. In this paper, we use a two-layer fully connected neural network as the generator and Piecewise Convolutional Neural Networks (PCNNs) as the discriminator. Experimental results show that the proposed GAN-based method is effective and outperforms state-of-the-art methods.
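To make the negative-sampling discussion concrete, here is a minimal Python sketch of a margin-based pairwise ranking loss together with the two baseline negative-class strategies and sampling from a generator's distribution. The loss form, the scale `gamma`, the margins `m_pos` and `m_neg`, and all function names are assumptions; the abstract does not give the exact loss, and the "generator" here is just a fixed probability vector rather than the paper's two-layer network.

```python
import math
import random
from typing import Sequence

def ranking_loss(scores: Sequence[float], pos: int, neg: int,
                 gamma: float = 2.0, m_pos: float = 2.5, m_neg: float = 0.5) -> float:
    """Assumed pairwise ranking loss:
    log(1 + exp(gamma * (m_pos - s_pos))) + log(1 + exp(gamma * (m_neg + s_neg))).
    Both terms are small when s_pos > m_pos and s_neg < -m_neg.
    """
    pos_term = math.log1p(math.exp(gamma * (m_pos - scores[pos])))
    neg_term = math.log1p(math.exp(gamma * (m_neg + scores[neg])))
    return pos_term + neg_term

def random_negative(num_classes: int, pos: int, rng: random.Random) -> int:
    """Baseline 1: pick an incorrect relation uniformly at random."""
    return rng.choice([c for c in range(num_classes) if c != pos])

def hardest_negative(scores: Sequence[float], pos: int) -> int:
    """Baseline 2: pick the highest-scoring incorrect relation."""
    return max((c for c in range(len(scores)) if c != pos), key=lambda c: scores[c])

def generator_negative(gen_probs: Sequence[float], pos: int, rng: random.Random) -> int:
    """GAN-style: sample an incorrect relation from a generator's distribution.

    `gen_probs` is a hypothetical stand-in for the generator's softmax output.
    """
    classes = [c for c in range(len(gen_probs)) if c != pos]
    weights = [gen_probs[c] for c in classes]
    return rng.choices(classes, weights=weights, k=1)[0]

scores = [0.1, 2.0, 1.5, -0.3]  # hypothetical discriminator scores; class 1 is gold
print(hardest_negative(scores, pos=1))  # 2
```

Because the negative term grows with `scores[neg]`, a negative class with a very low score contributes almost nothing to the loss or its gradient, which is the weakness a learned generator is meant to address.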