Purpose: This study serves as a comprehensive review of the existing annotated corpora for event extraction, which are limited in number but essential for training and improving event extraction algorithms. In addition to this primary goal, it provides guidelines for preparing an annotated corpus and suggests suitable tools for the annotation task. Design/methodology/approach: This study employs an analytical approach to examine available corpora that are suitable for event extraction tasks. It offers an in-depth analysis of existing event extraction corpora and provides systematic guidelines for researchers to develop accurate, high-quality corpora. This ensures the reliability of the created corpus and its suitability for training machine learning algorithms. Findings: Our exploration reveals a scarcity of annotated corpora for event extraction tasks. In particular, the English corpora are mainly focused on the biomedical and general domains. Despite this scarcity, several high-quality corpora are available and widely used as benchmark datasets. However, access to some of these corpora may be limited owing to closed-access policies or to discontinued maintenance after the initial release, which leaves them inaccessible through broken links. Therefore, this study documents the available corpora for event extraction tasks. Research limitations: Our study focuses only on well-known corpora available in English and Chinese. It places a strong emphasis on the English corpora because English is a global lingua franca and is more widely understood than other languages. Practical implications: We believe that this study provides valuable knowledge that can serve as a guiding framework for preparing and accurately annotating events from text corpora. It provides comprehensive guidelines for researchers to improve the quality of corpus annotations, especially for event extraction tasks across various domains. Originality/value: This study compiles information on the existing annotated corpora for event extraction tasks and provides preparation guidelines.
Supervised machine learning approaches are effective in text mining, but their success relies heavily on manually annotated corpora. However, annotated biomedical event corpora are few, and the available datasets contain too few examples for training classifiers. The common remedy is to draw large amounts of training samples from unlabeled data, but such datasets often contain many mislabeled samples, which degrade classifier performance. Therefore, this study proposes a novel error data detection approach suitable for reducing noise in unlabeled biomedical event data. First, we construct the mislabeled dataset through error data analysis on the development dataset. The vector representations of sample pairs are then obtained by means of sequence patterns and a joint model of a convolutional neural network and a long short-term memory recurrent neural network. Following this, a sample identification strategy is proposed that uses error detection based on pair representation for unlabeled data. The selected samples are then added to enrich the training dataset and improve classification performance. On the BioNLP Shared Task GENIA, the experimental results indicate that the proposed approach is competent in extracting biomedical events from the biomedical literature. Our approach can effectively filter some noisy examples and build a satisfactory prediction model.
Event extraction is one of the most challenging tasks in information extraction. Multiple events commonly occur in the same sentence, and extracting multiple events is more difficult than extracting a single event. Existing event extraction methods based on sequence models ignore the interrelated information between events because the sequence is too long. In addition, current argument extraction relies on the results of syntactic dependency analysis, which is complicated and prone to error transmission. To solve these problems, this work proposes a joint event extraction method based on global event-type guidance and attention enhancement. Specifically, for multiple event detection, we propose a global-type guidance method that detects event types in the candidate sequence in advance to enhance the correlation information between events. For argument extraction, we convert the task into a table-filling problem and propose an attention-based table-filling method that is simple and can enhance the correlation between trigger words and arguments. The experimental results on the ACE 2005 dataset show that the proposed method achieves a 1.6% improvement in event detection and obtains state-of-the-art results in argument extraction, which demonstrates the effectiveness of the method.
Event extraction is a significant task in information extraction that aims to automatically extract structured event information from large volumes of unstructured text. Extracting event elements from multi-modal data remains challenging owing to the large number of images and overlapping event elements in the data. Although researchers have proposed various methods for this task, most existing event extraction models cannot address these challenges because they apply only to text scenarios. To solve these issues, this paper proposes a multi-modal event extraction method based on knowledge fusion. Specifically, for event-type recognition, we use a pipeline approach that integrates multiple pre-trained models. This approach captures more comprehensively the multidimensional event semantic features present in military texts, thereby enhancing the interconnectedness of information between trigger words and events. For event element extraction, we propose a method for constructing a priori templates that combine event types with the corresponding trigger words. This approach facilitates the acquisition of fine-grained input samples containing event trigger words, enabling the model to understand the semantic relationships between elements in greater depth. Furthermore, a fusion method for the spatial mapping of textual event elements and image elements is proposed to reduce the category number overload and effectively achieve multi-modal knowledge fusion. The experimental results on the CCKS 2022 dataset show that our method achieves competitive results, with an overall F1-score of 53.4%. These results validate the effectiveness of our method in extracting event elements from multi-modal data.
Event extraction (EE) is a key task in information extraction, and it requires high-quality annotated data that are often costly to obtain. Traditional classification-based methods suffer in low-resource scenarios owing to the lack of label semantics and fine-grained annotations. While recent approaches have endeavored to address EE through a more data-efficient generative process, they often overlook event keywords, which are vital for EE. To tackle these challenges, we introduce KeyEE, a multi-prompt learning strategy that improves low-resource event extraction through Event Keywords Extraction (EKE). We suggest employing an auxiliary EKE sub-prompt and concurrently training both EE and EKE with a shared pre-trained language model. With the auxiliary sub-prompt, KeyEE learns event keyword knowledge implicitly, thereby reducing the dependence on annotated data. Furthermore, we investigate and analyze various EKE sub-prompt strategies to encourage further research in this area. Our experiments on the benchmark datasets ACE2005 and ERE show that KeyEE achieves significant improvement in low-resource settings and sets new state-of-the-art results.
Traditional event extraction systems focus mainly on event type identification and event participant extraction based on pre-specified event type paradigms and manually annotated corpora. However, different domains have different event type paradigms. When transferring to a new domain, we have to build a new event type paradigm and annotate a new corpus from scratch. This kind of conventional event extraction system requires massive human effort, which prevents event extraction from being widely applicable. In this paper, we present BUEES, a bottom-up event extraction system, which extracts events from the web in a completely unsupervised way. The system automatically builds an event type paradigm from the input corpus and then extracts a large number of instance patterns of these events. Subsequently, the system extracts event arguments according to these patterns. Through a series of experiments, we demonstrate the good performance of BUEES and compare it to a state-of-the-art Chinese event extraction system, i.e., a supervised event extraction system. Experimental results show that BUEES performs comparably (5% higher F-measure in event type identification and 3% higher F-measure in event argument extraction), but without any human effort.
We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method that uses a generative adversarial network (GAN). We assume that instances and labels vary in difficulty, so the gains and penalties (rewards) are expected to be diverse. We use discriminators to estimate proper rewards according to the difference between the labels committed by the ground truth (expert) and by the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
The China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020 Evaluation Task 3 covered clinical named entity recognition and event extraction for Chinese electronic medical records. Two annotated datasets and some additional resources for these two subtasks were provided to participants. The evaluation attracted 354 teams, 46 of which submitted valid results. Pre-trained language models were widely applied in this evaluation task. Data augmentation and external resources were also helpful.
This paper presents a winning solution for the CCKS-2020 financial event extraction task, where the goal is to identify event types, triggers, and arguments in sentences across multiple event types. In this task, we focus on resolving two challenging problems (i.e., low resources and element overlapping) by proposing a joint learning framework named SaltyFishes. We first formulate the event extraction task as a joint probability model. By sharing parameters in the model across different types, we can learn to adapt to low-resource events based on high-resource events. We further address the element overlapping problem with a Conditional Layer Normalization mechanism, achieving even better extraction accuracy. The overall approach achieves an F1-score of 87.8%, which ranked first in the competition.
In this paper, we present a new challenging task for emotion analysis, namely emotion cause extraction. In this task, we focus on detecting the emotion cause, i.e., the reason or the stimulus of an emotion, rather than on regular emotion classification or emotion component extraction. Since no open dataset is available for this task, we first designed and annotated an emotion cause dataset that follows the scheme of the W3C Emotion Markup Language. We then present an emotion cause detection method that uses an event extraction framework, in which a tree structure-based representation is used to represent the events. Since the distribution of events is imbalanced in the training data, we propose an under-sampling-based bagging algorithm to solve this problem. Even with a limited training set, the proposed approach can still extract sufficient features for analysis through bagging of multi-kernel SVMs. Evaluations show that our approach achieves an F-measure 7.04% higher than the state-of-the-art methods.
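The under-sampling-based bagging idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `undersampled_bags` and `majority_vote`, the bag count, and the seeding are all assumptions made for illustration, and the base learner (multi-kernel SVMs in the paper) is left out entirely.

```python
import random
from collections import Counter


def undersampled_bags(samples, labels, n_bags=5, seed=0):
    """Build class-balanced training bags by under-sampling every class
    down to the size of the rarest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    minority_size = min(len(items) for items in by_class.values())

    bags = []
    for _ in range(n_bags):
        bag = []
        for label, items in by_class.items():
            # Draw without replacement so each bag stays balanced.
            bag.extend((s, label) for s in rng.sample(items, minority_size))
        rng.shuffle(bag)
        bags.append(bag)
    return bags


def majority_vote(predictions):
    """Combine the predictions of the per-bag classifiers for one instance."""
    return Counter(predictions).most_common(1)[0][0]
```

In use, one classifier would be trained per bag and `majority_vote` would combine their outputs for each test instance; because every bag is balanced, no single classifier is dominated by the majority class.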
Document-level financial event extraction (DFEE) is the task of detecting events and extracting the corresponding event arguments in financial documents, and it plays an important role in information extraction in the financial domain. This task is challenging because financial documents are generally long and the arguments of one event may be scattered across different sentences. To address this issue, we propose a novel Prior Information Enhanced Extraction framework (PIEE) for DFEE, which leverages prior information from both event types and pre-trained language models. Specifically, PIEE consists of three components: event detection, event argument extraction, and event table filling. In event detection, we identify the event type. The event type is then used explicitly for event argument extraction. Meanwhile, the implicit information within language models also provides considerable cues for localizing event arguments. Finally, all the event arguments are filled into an event table by a set of predefined heuristic rules. To demonstrate the effectiveness of our proposed framework, we participated in the shared task of CCKS2020 Task 4-2: Document-level Event Arguments Extraction. On both Leaderboard A and Leaderboard B, PIEE took first place and significantly outperformed the other systems.
Nursing records contain information on patients’ treatment processes; they reflect the changes in patients’ conditions and have legal effect. However, according to our observations, some of the written records of intensive care unit (ICU) nurses are incomplete. This paper proposes an approach that extracts structured nursing events from unstructured nursing records to detect the missing items automatically. Following the PIO (problem, intervention, outcome) principle in the field of medical care, we propose event schemas for nursing records and annotate a Chinese nursing event extraction dataset (CNEED) on ICU nursing records. We find that several events may occur in one nursing record. Therefore, we present a multi-event extraction model for nursing records. The experimental results demonstrate that our model achieves good results on CNEED and outperforms competitive methods on the multi-event argument attribution problem. By observing the results of automatic event extraction by our model, we detect missing items in the existing nursing records. This shows that our model can be used to help nurses check and improve the way they record nursing processes.
Relation extraction is a key task for knowledge graph construction and natural language processing, which aims to extract meaningful relational information between entities from plain text. With the development of deep learning, many neural relation extraction models have been proposed recently. This paper surveys the task of neural relation extraction, including the task description, widely used evaluation datasets, metrics, typical methods, challenges, and recent research progress. We mainly focus on four recent research problems: (1) how to learn the semantic representations from the given sentences for the target relation, (2) how to train a neural relation extraction model based on insufficient labeled instances, (3) how to extract relations across sentences or in a document, and (4) how to jointly extract relations and corresponding entities. Finally, we give our conclusions and future research issues.
Event extraction is an important research topic in information extraction, which includes two important sub-tasks: event type recognition and event argument recognition. This paper describes a method for event type recognition based on automatic expansion of the event triggers. The event triggers are first extended through a thesaurus to enable the extraction of the candidate events and their candidate types. Then, a binary classification method is used to recognize the candidate event types. This method effectively alleviates the unbalanced data problem in training models and the data sparseness problem with a small corpus. Evaluations on the ACE2005 dataset give a final F-score of 61.24%, which outperforms traditional methods based on pure machine learning.
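The trigger-expansion step described in the abstract above can be sketched roughly as follows. The thesaurus contents, the event type names, and the helper names `expand_triggers` and `candidate_events` are hypothetical; the abstract does not specify the thesaurus or the classifier, so the binary accept/reject classification of each candidate is not shown.

```python
from typing import Dict, List, Set, Tuple


def expand_triggers(seed_triggers: Dict[str, Set[str]],
                    thesaurus: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Extend each event type's seed triggers with their thesaurus synonyms."""
    expanded = {}
    for event_type, triggers in seed_triggers.items():
        words = set(triggers)
        for trigger in triggers:
            words |= thesaurus.get(trigger, set())
        expanded[event_type] = words
    return expanded


def candidate_events(tokens: List[str],
                     expanded: Dict[str, Set[str]]) -> List[Tuple[int, str, str]]:
    """Return (position, token, candidate type) for every token that
    matches an expanded trigger; a binary classifier would then accept
    or reject each candidate (not shown here)."""
    candidates = []
    for position, token in enumerate(tokens):
        for event_type, words in expanded.items():
            if token in words:
                candidates.append((position, token, event_type))
    return candidates
```

Expanding triggers first raises recall on a small corpus, and framing the final decision as accept/reject per candidate turns a sparse multi-class problem into a denser binary one, which is in line with the imbalance and sparseness benefits the abstract reports.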
We describe a gold standard corpus of protest events that comprises various local and international English-language sources from various countries. The corpus contains document-, sentence-, and token-level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information, and constructing knowledge bases that enable comparative social and political science studies. For each news source, the annotation starts with random samples of news articles and continues with samples drawn using active learning. Each batch of samples is annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus possesses the variety and quality necessary to develop and benchmark text classification and event extraction systems in a cross-context setting, contributing to the generalizability and robustness of automated text processing systems. This corpus and the reported results will establish a common foundation in automated protest event collection studies, which is currently lacking in the literature.
Constructing a case event logic graph for a judgment document makes it possible to trace the development of the case more intuitively. This paper proposes a joint model of event extraction and relationship recognition for judgment documents. By extracting the case information in the judgment document, a case event logic graph is constructed. It shows the development process of the case and provides a reference for analyzing the context of the case. The experimental results show that the proposed method can extract events and identify the relationships between events, and the F1 value reaches 0.809. The case event logic graph reveals the development context of the case accurately and vividly.
Event-based surveillance systems are at the crossroads of human and animal (and plant and ecosystem) health, epidemiology, statistics, and informatics. Thus, their deployment faces many challenges specific to each domain and their intersections, such as relations among automation, artificial intelligence, and expertise. In this context, our work pertains to the extraction of epidemiological events in textual data (i.e., news) by unsupervised methods. We define the event extraction task as detecting pairs of epidemiological entities (e.g., a disease name and a location). The quality of the ranked lists of pairs was evaluated using specific ranking evaluation metrics. We used a publicly available annotated corpus of 438 documents (i.e., news articles) related to animal disease events. The statistical approach was able to detect event-related pairs of epidemiological features with a good trade-off between precision and recall. Our results showed that using a window of words outperformed document-based and sentence-based approaches while reducing the probability of detecting false pairs. Our results indicated that Mutual Information was less suited than the Dice coefficient for ranking pairs of features in the event extraction framework. We believe that Mutual Information would be more relevant for rare pair detection (i.e., weak signals), but it requires more manual curation to avoid false positive extraction pairs. Moreover, generalising the spatial features to the country level enabled better discrimination (i.e., ranking) of relevant disease-location pairs for event extraction.
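To make the comparison between the Dice coefficient and Mutual Information more concrete, the following is a rough sketch of window-based pair ranking. It rests on several assumptions that are not taken from the abstract: entities are single tokens, frequencies are document frequencies, Mutual Information is computed in its pointwise form (PMI), and the names `window_pairs` and `rank_pairs` and the default window size are invented for illustration.

```python
import math
from collections import Counter


def window_pairs(tokens, diseases, locations, window=5):
    """Return the (disease, location) pairs whose mentions fall within
    `window` tokens of each other in one document."""
    pairs = set()
    for i, token in enumerate(tokens):
        if token in diseases:
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for other in tokens[start:end]:
                if other in locations:
                    pairs.add((token, other))
    return pairs


def rank_pairs(documents, diseases, locations, window=5):
    """Score each co-occurring pair with the Dice coefficient and with
    pointwise mutual information, both based on document frequencies."""
    n_docs = len(documents)
    entity_df = Counter()
    pair_df = Counter()
    for tokens in documents:
        entity_df.update(set(tokens) & (diseases | locations))
        pair_df.update(window_pairs(tokens, diseases, locations, window))

    scores = {}
    for (disease, location), n_pair in pair_df.items():
        n_d, n_l = entity_df[disease], entity_df[location]
        dice = 2 * n_pair / (n_d + n_l)
        pmi = math.log2(n_pair * n_docs / (n_d * n_l))
        scores[(disease, location)] = {"dice": dice, "pmi": pmi}
    return scores
```

On a toy corpus, a pair seen only once between two entities that each occur once receives the highest PMI, which illustrates why a Mutual Information ranking tends to favour rare pairs and may need more manual curation, as the abstract notes.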
Funding (noise reduction in unlabeled biomedical event data): This work was supported by the National Natural Science Foundation of China (No. 61672301), Jilin Provincial Science & Technology Development (20180101054JC), the Science and Technology Innovation Guide Project of Inner Mongolia Autonomous Region of China (2017), and the Talent Development Fund of Jilin Province (2018).
Funding (joint event extraction with global event-type guidance and attention enhancement): This work was supported by the Hunan Provincial Natural Science Foundation of China (Grant Nos. 2020JJ4624 and 2019JJ50655), the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 19A020), and the National Social Science Fund of China (Grant No. 20&ZD047).
Funding (multi-modal event extraction based on knowledge fusion): This work was supported by the National Natural Science Foundation of China (Grant No. 81973695) and the Discipline with Strong Characteristics of Liaocheng University-Intelligent Science and Technology (Grant No. 319462208).
Funding (KeyEE): This work was supported by the National Key Research and Development Program of China (No. 2021YFF1201200), the Science and Technology Major Project of Changsha (No. kh2202004), and the Natural Science Foundation of China (No. 62006251).
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61133012 and 61472107) and the National Basic Research Program (973) of China (No. 2014CB340503).
Abstract: Traditional event extraction systems focus mainly on event type identification and event participant extraction based on pre-specified event type paradigms and manually annotated corpora. However, different domains have different event type paradigms. When transferring to a new domain, we have to build a new event type paradigm and annotate a new corpus from scratch. This kind of conventional event extraction system requires massive human effort, which prevents event extraction from being widely applicable. In this paper, we present BUEES, a bottom-up event extraction system, which extracts events from the web in a completely unsupervised way. The system automatically builds an event type paradigm from the input corpus, and then proceeds to extract a large number of instance patterns of these events. Subsequently, the system extracts event arguments according to these patterns. Through a series of experiments, we demonstrate the good performance of BUEES and compare it to a state-of-the-art supervised Chinese event extraction system. Experimental results show that BUEES performs comparably (5% higher F-measure in event type identification and 3-0 higher F-measure in event argument extraction), but without any human effort.
Abstract: We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method using a generative adversarial network (GAN). We assume that instances and labels present varying degrees of difficulty, so the gains and penalties (rewards) are expected to be diverse. We utilize discriminators to estimate proper rewards according to the difference between the labels committed by the ground truth (expert) and the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
Abstract: The China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for Chinese electronic medical records. Two annotated data sets and some additional resources for these two subtasks were provided to participants. This evaluation competition attracted 354 teams, 46 of which successfully submitted valid results. Pre-trained language models were widely applied in this evaluation task. Data augmentation and external resources were also helpful.
Funding: This work is supported by the National Key Research and Development Program of China (No. 2016YFB1000105) and the National Natural Science Foundation of China (No. 61772151). This work’s computing device is also supported by the Beijing Advanced Innovation Center of Big Data and Brain Computing, Beihang University. The author Shu Guo is supported by the “Zhizi Program”.
Abstract: This paper presents a winning solution for the CCKS-2020 financial event extraction task, where the goal is to identify event types, triggers and arguments in sentences across multiple event types. In this task, we focus on resolving two challenging problems (i.e., low resources and element overlapping) by proposing a joint learning framework, named SaltyFishes. We first formulate the event extraction task as a joint probability model. By sharing parameters in the model across different types, we can learn to adapt to low-resource events based on high-resource events. We further address the element overlapping problem with a Conditional Layer Normalization mechanism, achieving even better extraction accuracy. The overall approach achieves an F1-score of 87.8%, which ranked first in the competition.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61370165, U1636103, and 61632011), Shenzhen Foundational Research Funding (Nos. JCYJ20150625142543470 and JCYJ20170307150024907), and the Guangdong Provincial Engineering Technology Research Center for Data Science (No. 2016KF09).
Abstract: In this paper, we present a new challenging task for emotion analysis, namely emotion cause extraction. In this task, we focus on the detection of the emotion cause, i.e., the reason or stimulant of an emotion, rather than on regular emotion classification or emotion component extraction. Since no open dataset is available for this task, we first designed and annotated an emotion cause dataset that follows the scheme of the W3C Emotion Markup Language. We then present an emotion cause detection method using an event extraction framework, where a tree structure-based representation method is used to represent the events. Since the distribution of events is imbalanced in the training data, we propose an under-sampling-based bagging algorithm to solve this problem. Even with a limited training set, the proposed approach may still extract sufficient features for analysis through bagging of multi-kernel SVMs. Evaluations show that our approach achieves an F-measure 7.04% higher than the state-of-the-art methods.
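The under-sampling-based bagging idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `balanced_bags` and `majority_vote` are hypothetical, the base learner (multi-kernel SVMs in the abstract) is left out, and the sketch assumes that each bag keeps every minority-class sample and draws an equally sized random subset of each larger class.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags=5, seed=0):
    """Build index lists for class-balanced training bags (hypothetical helper).

    Each bag samples, without replacement, as many items from every class
    as the smallest class contains, so the minority class is kept whole.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    minority_size = min(len(idxs) for idxs in by_class.values())

    bags = []
    for _ in range(n_bags):
        bag = []
        for idxs in by_class.values():
            bag.extend(rng.sample(idxs, minority_size))
        bags.append(sorted(bag))
    return bags


def majority_vote(votes):
    """Combine the predictions of the per-bag classifiers for one instance."""
    return Counter(votes).most_common(1)[0][0]


# Toy imbalanced label set: 10 negative instances, 3 positive ones.
labels = [0] * 10 + [1] * 3
bags = balanced_bags(labels, n_bags=4)
```

One classifier would then be trained per bag, and `majority_vote` would merge their predictions at test time.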
Funding: The research is supported by the National Natural Science Foundation of China (Nos. 61936010 and 61876115). This work was partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Abstract: Document-level financial event extraction (DFEE) is the task of detecting events and extracting the corresponding event arguments in financial documents, which plays an important role in information extraction in the financial domain. This task is challenging because financial documents are generally long and the arguments of one event may be scattered across different sentences. To address this issue, we propose a novel Prior Information Enhanced Extraction framework (PIEE) for DFEE, leveraging prior information from both event types and pre-trained language models. Specifically, PIEE consists of three components: event detection, event argument extraction, and event table filling. In event detection, we identify the event type. Then, the event type is explicitly used for event argument extraction. Meanwhile, the implicit information within language models also provides considerable cues for localizing event arguments. Finally, all the event arguments are filled into an event table by a set of predefined heuristic rules. To demonstrate the effectiveness of our proposed framework, we participated in the shared task of CCKS2020 Task 4-2: Document-level Event Arguments Extraction. On both Leaderboard A and Leaderboard B, PIEE took first place and significantly outperformed the other systems.
Funding: Supported by the National Key R&D Program of China (No. 2020AAA0106600).
Abstract: Nursing records contain information on patients’ treatment processes, which reflect the changes in patients’ conditions and have legal effects. However, according to our observations, some of the written records of intensive care unit (ICU) nurses are incomplete. This paper proposes an approach that extracts structured nursing events from unstructured nursing records in order to detect missing items automatically. Following the PIO (problem, intervention, outcome) principle in the field of medical care, we propose event schemas for nursing records and annotate a Chinese nursing event extraction dataset (CNEED) on ICU nursing records. We find that several events may occur in a single nursing record. Therefore, we present a multi-event extraction model for nursing records. The experimental results demonstrate that our model achieves good results on CNEED and outperforms competitive methods on the multi-event argument attribution problem. By observing the results of automatic event extraction by our model, we detect missing items in the existing nursing records. This shows that our model can be used to help nurses check and improve the way nursing processes are recorded.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61922085 and 61533018), the National Key R&D Program of China (Grant No. 2018YFC0830101), the Key Research Program of the Chinese Academy of Sciences (Grant No. ZDBS-SSW-JSC006), the Beijing Academy of Artificial Intelligence (BAAI2019QN0301), the Open Project of Beijing Key Laboratory of Mental Disorders (2019JSJB06), and the independent research project of the National Laboratory of Pattern Recognition.
Abstract: Relation extraction is a key task for knowledge graph construction and natural language processing, which aims to extract meaningful relational information between entities from plain texts. With the development of deep learning, many neural relation extraction models have been proposed recently. This paper presents a survey of neural relation extraction, including the task description, widely used evaluation datasets, metrics, typical methods, challenges and recent research progress. We mainly focus on four recent research problems: (1) how to learn semantic representations from the given sentences for the target relation, (2) how to train a neural relation extraction model based on insufficient labeled instances, (3) how to extract relations across sentences or in a document, and (4) how to jointly extract relations and corresponding entities. Finally, we present our conclusions and future research issues.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60975055 and 60803093) and the National High-Tech Research and Development (863) Program of China (No. 2008AA01Z144).
Abstract: Event extraction is an important research topic in information extraction, which includes two important sub-tasks: event type recognition and event argument recognition. This paper describes a method for event type recognition based on automatic expansion of the event triggers. The event triggers are first extended through a thesaurus to enable the extraction of the candidate events and their candidate types. Then, a binary classification method is used to recognize the candidate event types. This method effectively alleviates the unbalanced data problem in training models and the data sparseness problem with a small corpus. Evaluations on the ACE2005 dataset give a final F-score of 61.24%, which outperforms traditional methods based on pure machine learning.
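The trigger-expansion step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the thesaurus is a hypothetical in-memory dictionary, the seed triggers and the function names `expand_triggers` and `candidate_events` are invented for the example, and the subsequent binary classifier that accepts or rejects each candidate is omitted.

```python
def expand_triggers(seed_triggers, thesaurus):
    """Map each seed trigger and its thesaurus synonyms to candidate event types."""
    expanded = {}
    for word, event_type in seed_triggers.items():
        for variant in [word, *thesaurus.get(word, [])]:
            expanded.setdefault(variant, set()).add(event_type)
    return expanded


def candidate_events(tokens, expanded):
    """Return (position, token, candidate type) for every token that is a known trigger."""
    return [
        (i, tok, event_type)
        for i, tok in enumerate(tokens)
        for event_type in sorted(expanded.get(tok, ()))
    ]


# Hypothetical seeds labeled with ACE2005-style event types.
seed_triggers = {"attack": "Conflict.Attack", "die": "Life.Die"}
# Hypothetical thesaurus entries; a real system would load a lexical resource.
thesaurus = {"attack": ["assault", "strike"], "die": ["perish"]}

expanded = expand_triggers(seed_triggers, thesaurus)
candidates = candidate_events("rebels assault the town".split(), expanded)
```

Each candidate would then be passed to a binary classifier that decides whether the token really triggers an event of the proposed type.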
Funding: Funded by the European Research Council (ERC) Starting Grant 714868 awarded to Dr. Erdem Yörük for his project Emerging Welfare.
Abstract: We describe a gold standard corpus of protest events that comprises various local and international English-language sources from various countries. The corpus contains document-, sentence-, and token-level annotations. This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information, and constructing knowledge bases that enable comparative social and political science studies. For each news source, the annotation starts with random samples of news articles and continues with samples drawn using active learning. Each batch of samples is annotated by two social and political scientists, adjudicated by an annotation supervisor, and improved by identifying annotation errors semi-automatically. We found that the corpus possesses the variety and quality that are necessary to develop and benchmark text classification and event extraction systems in a cross-context setting, contributing to the generalizability and robustness of automated text processing systems. This corpus and the reported results will establish a common foundation in automated protest event collection studies, which is currently lacking in the literature.
Funding: This work was supported in part by the National Key R&D Program of China under Grant 2018YFC0830104.
Abstract: Constructing a case event logic graph for a judgment document makes it possible to trace the development of the case more intuitively. This paper proposes a joint model of event extraction and relationship recognition for judgment documents. By extracting the case information in the judgment document, a case event logic graph was constructed. The graph shows the development process of the case and provides a reference for analyzing the context of the case. The experimental results show that the proposed method can extract events and identify the relationships between events, and the F1 value reaches 0.809. The case event logic graph reveals the development context of the case accurately and vividly.
Funding: Supported by the French General Directorate for Food (DGAL), the French Agricultural Research Centre for International Development (CIRAD) and the SONGES Project (FEDER and Occitanie), and supported by the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004. Also supported by EU grant 874850 MOOD; catalogued as MOOD010.
Abstract: Event-based surveillance systems are at the crossroads of human and animal (and plant and ecosystem) health, epidemiology, statistics, and informatics. Thus, their deployment faces many challenges specific to each domain and their intersections, such as relations among automation, artificial intelligence, and expertise. In this context, our work pertains to the extraction of epidemiological events from textual data (i.e. news) by unsupervised methods. We define the event extraction task as detecting pairs of epidemiological entities (e.g. a disease name and location). The quality of the ranked lists of pairs was evaluated using specific ranking evaluation metrics. We used a publicly available annotated corpus of 438 documents (i.e. news articles) related to animal disease events. The statistical approach was able to detect event-related pairs of epidemiological features with a good trade-off between precision and recall. Our results showed that using a window of words outperformed document-based and sentence-based approaches, while reducing the probability of detecting false pairs. Our results indicated that Mutual Information was less suitable than the Dice coefficient for ranking pairs of features in the event extraction framework. We believe that Mutual Information would be more relevant for rare pair detection (i.e. weak signals), but requires more manual curation to avoid false positive extraction pairs. Moreover, generalising the country-level spatial features enabled better discrimination (i.e. ranking) of relevant disease-location pairs for event extraction.
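The contrast between the Dice coefficient and Mutual Information described in this abstract can be illustrated with a small sketch. It rests on several assumptions that are not from the source: the toy corpus and entity lists are invented, counts are taken per document, Mutual Information is approximated by pointwise mutual information, and the names `window_pairs` and `score_pairs` are hypothetical.

```python
import math
from collections import Counter


def window_pairs(tokens, diseases, locations, window=5):
    """Return the set of (disease, location) pairs at most `window` tokens apart."""
    d_pos = [(i, t) for i, t in enumerate(tokens) if t in diseases]
    l_pos = [(j, t) for j, t in enumerate(tokens) if t in locations]
    return {(d, l) for i, d in d_pos for j, l in l_pos if abs(i - j) <= window}


def score_pairs(docs, diseases, locations, window=5):
    """Score each co-occurring pair with Dice and pointwise mutual information."""
    n_docs = len(docs)
    entity_df = Counter()  # number of documents containing each entity
    pair_df = Counter()    # number of documents containing each windowed pair
    for tokens in docs:
        entity_df.update(set(tokens) & (diseases | locations))
        pair_df.update(window_pairs(tokens, diseases, locations, window))

    scores = {}
    for (d, l), n_dl in pair_df.items():
        dice = 2 * n_dl / (entity_df[d] + entity_df[l])
        pmi = math.log2(n_docs * n_dl / (entity_df[d] * entity_df[l]))
        scores[(d, l)] = {"dice": dice, "pmi": pmi}
    return scores


diseases = {"asf", "hpai"}
locations = {"china", "france"}
docs = [
    "asf reported in china".split(),
    "asf outbreak china".split(),
    "hpai detected in france".split(),
    "asf cases rise while officials met in paris france".split(),
]
scores = score_pairs(docs, diseases, locations)
by_dice = max(scores, key=lambda p: scores[p]["dice"])
by_pmi = max(scores, key=lambda p: scores[p]["pmi"])
```

On this toy corpus, Dice ranks the frequent pair `("asf", "china")` first, whereas pointwise mutual information ranks the pair seen only once, `("hpai", "france")`, first. The last document yields no pair because its two entities are more than five tokens apart.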