Journal articles
1,241 articles found
GeoNER: Geological Named Entity Recognition with Enriched Domain Pre-Training Model and Adversarial Training
1
Authors: MA Kai, HU Xinxin, TIAN Miao, TAN Yongjian, ZHENG Shuai, TAO Liufeng, QIU Qinjun. Acta Geologica Sinica (English Edition) [SCIE, CAS, CSCD], 2024, No. 5, pp. 1404-1417 (14 pages)
A geological report is an important type of geological data and contains rich expert and geological knowledge. Current research into geological knowledge extraction and mining faces a central challenge: how to achieve an accurate, domain-guided understanding of these reports. Generic named entity recognition models/tools can be used to process geoscience reports/documents, but they lack domain-specific knowledge, and their recognition accuracy drops sharply as a result. This study summarizes six types of typical geological entities, with reference to the ontological system of geological domains, and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-based BERT-adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the ambiguity, diversity, and nesting of geological entities. The model first uses the fine-tuned, word-granularity-based pre-training model GeoWoBERT (Geological Word-based BERT) and combines it with text features extracted by a BiLSTM (Bi-directional Long Short-Term Memory) network. An adversarial training algorithm then improves the model's robustness and resistance to interference, and decoding is finally performed with a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and can mine rich geological information.
Keywords: geological named entity recognition; geological report; adversarial training; confrontation training; global pointer; pre-training model
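The abstract says only that an adversarial training algorithm is applied to improve robustness. A common choice in NER work is the Fast Gradient Method (FGM). FGM perturbs the embeddings by r = ε·g/‖g‖, where g is the loss gradient with respect to the embeddings. The minimal sketch below shows only that perturbation step, on plain Python lists. Treating GeoNER's algorithm as FGM is an assumption, and the name `fgm_perturb` is hypothetical.

```python
import math


def fgm_perturb(embedding, grad, epsilon=1.0):
    """Return embedding + epsilon * grad / ||grad|| (the FGM perturbation step).

    In real training, the perturbed embedding feeds a second forward/backward pass.
    The adversarial loss is added to the clean loss, and the original embedding is
    restored before the optimizer step.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: return an unchanged copy.
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


emb = [0.5, -0.2]
grad = [3.0, 4.0]  # ||grad|| = 5, so the perturbation is epsilon * [0.6, 0.8]
adv = fgm_perturb(emb, grad, epsilon=0.5)
print(adv)  # approximately [0.8, 0.2]
```

Normalizing by ‖g‖ fixes the size of the perturbation at ε, whatever the gradient's magnitude. This keeps the attack strength a single tunable hyperparameter.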
RoBGP: A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer
2
Authors: Xiaohui Cui, Chao Song, Dongmei Li, Xiaolong Qu, Jiao Long, Yu Yang, Hanchao Zhang. Computers, Materials & Continua [SCIE, EI], 2024, No. 3, pp. 3603-3618 (16 pages)
Named Entity Recognition (NER) is a fundamental task in biomedical text mining. It aims to extract specific types of entities, such as genes, proteins, and diseases, from complex biomedical texts and to categorize them into predefined entity types. This process provides basic support for the automatic construction of knowledge bases. Unlike general texts, biomedical texts frequently contain many nested entities and local dependencies among them, which challenge prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). The model first uses the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then adds a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, which addresses long-distance dependencies. Finally, the Global Pointer model is used to recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results show that RoBGP outperforms several baseline models. This research confirms the effectiveness of RoBGP for Chinese biomedical NER and provides reliable technical support for biomedical information extraction and knowledge base construction.
Keywords: biomedicine; knowledge base; named entity recognition; pretrained language model; global pointer
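Global Pointer can return nested entities because it scores every (start, end) pair for every entity type independently. Any pair whose score exceeds a threshold becomes an entity, so "右肺癌" and the nested "右肺" can both be emitted. In GlobalPointer, the type-α score is an inner product of per-type query and key projections of tokens i and j, with rotary position embeddings. The minimal sketch below assumes those scores are already computed and shows only the decoding step. The hand-filled score tensor, threshold 0, and type labels are illustrative assumptions.

```python
def decode_global_pointer(scores, types, threshold=0.0):
    """Extract all (type, start, end) spans whose score exceeds the threshold.

    scores[t][i][j] is the score that tokens i..j (inclusive) form an entity
    of type types[t]. Only the upper triangle (i <= j) is read, so overlapping
    and nested spans can all be returned together.
    """
    entities = []
    for t, type_name in enumerate(types):
        n = len(scores[t])
        for i in range(n):
            for j in range(i, n):
                if scores[t][i][j] > threshold:
                    entities.append((type_name, i, j))
    return entities


NEG = -10.0
n = 3  # tokens: 右 肺 癌
scores = [[[NEG] * n for _ in range(n)] for _ in range(2)]
scores[0][0][2] = 5.0  # "右肺癌" as a disease ("dis")
scores[1][0][1] = 3.0  # nested "右肺" as a body part ("bod")
print(decode_global_pointer(scores, ["dis", "bod"]))
# [('dis', 0, 2), ('bod', 0, 1)]
```

A CRF-based sequence tagger assigns exactly one label per token, so it could not output both of these overlapping spans.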
SciCN: A Scientific Dataset for Chinese Named Entity Recognition
3
Authors: Jing Yang, Bin Ji, Shasha Li, Jun Ma, Jie Yu. Computers, Materials & Continua [SCIE, EI], 2024, No. 3, pp. 4303-4315 (13 pages)
Named entity recognition (NER) is a fundamental task of information extraction (IE) and has attracted considerable research attention in recent years. Abundant annotated English NER datasets have significantly advanced English NER research. By contrast, far less effort has gone into Chinese NER, especially in the scientific domain, because Chinese NER datasets are scarce. To alleviate this problem, we present SciCN, a Chinese scientific NER dataset that annotates entities in the titles and abstracts of 3,500 scientific papers. We manually annotate a total of 62,059 entities, classified into six types. Compared with English scientific NER datasets, SciCN is larger and more diverse: it contains more paper abstracts, and these abstracts come from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. Previous studies have also shown that lexicons can improve Chinese NER models, so we provide a lexicon specific to the scientific domain. Validation results show that our lexicon delivers larger performance gains than lexicons from other domains. We hope that the SciCN dataset and lexicon will serve as a benchmark for NER in the Chinese scientific domain and support future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Keywords: named entity recognition; dataset; scientific information extraction; lexicon
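Lexicon-enhanced Chinese NER models, such as lattice-style models, usually start by finding every lexicon word that matches a substring of the character sequence. The model then attaches those matches to the characters they cover. The minimal sketch below does the matching step with a plain Python set. The toy lexicon, the maximum word length of 4, and the function name are assumptions; they are not the SciCN lexicon or its tooling.

```python
def match_lexicon(chars, lexicon, max_len=4):
    """Return (start, end_inclusive, word) for every lexicon word in the character list.

    All matches are kept, including overlapping ones, because lattice-style
    models consume every candidate word rather than a single segmentation.
    """
    matches = []
    for i in range(len(chars)):
        for j in range(i + 1, min(i + max_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                matches.append((i, j - 1, word))
    return matches


lexicon = {"神经", "神经网络", "网络", "卷积"}
print(match_lexicon(list("卷积神经网络"), lexicon))
# [(0, 1, '卷积'), (2, 3, '神经'), (2, 5, '神经网络'), (4, 5, '网络')]
```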
A U-Shaped Network-Based Grid Tagging Model for Chinese Named Entity Recognition
4
Authors: Yan Xiang, Xuedong Zhao, Junjun Guo, Zhiliang Shi, Enbang Chen, Xiaobo Zhang. Computers, Materials & Continua [SCIE, EI], 2024, No. 6, pp. 4149-4167 (19 pages)
Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has studied flat, overlapped, or discontinuous CNER in isolation, but real-world scenarios often require a unified CNER. Recent studies have shown that grid-tagging methods based on character-pair relationship classification hold great potential for unified NER. Nevertheless, two questions remain open: how to enrich Chinese character-pair grid representations, and how to capture deeper dependencies between character pairs to improve entity recognition. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Notably, we treat the character-pair grid representation matrix as a specialized image, which turns the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network that extracts multi-scale, deeper semantic information from the grid image and so captures the associative features between character pairs more fully. This improves the accuracy of relationship prediction and ultimately enhances entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach: it improves the F1-score over the current state-of-the-art (SOTA) models by 7.29 and 1.64 percentage points, respectively.
Keywords: Chinese named entity recognition; character-pair relation classification; grid tagging; U-shaped segmentation network
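The abstract casts unified CNER as classifying every character pair in a grid and then recovering entities from the predicted relations. It does not spell out the label set. The minimal sketch below therefore assumes the W2NER-style scheme common in this line of work:

- A Next-Neighboring-Word (NNW) link (i, j) means character j directly follows character i inside an entity.
- A Tail-Head-Word (THW) label at (tail, head) marks an entity's boundaries and type.

The U-shaped network only changes how the grid is scored, so this decoding step does not depend on it. The function name and label names are illustrative assumptions.

```python
def decode_grid(nnw, thw):
    """Decode entities from a W2NER-style character-pair grid.

    nnw: set of (i, j) pairs meaning character j directly follows character i
         inside some entity (j > i; gaps allowed, so entities may be discontinuous).
    thw: set of (tail, head, type) triples marking an entity's boundaries and type.
    Returns a sorted list of (type, tuple_of_character_indices).
    """
    successors = {}
    for i, j in nnw:
        successors.setdefault(i, []).append(j)

    entities = set()
    for tail, head, etype in thw:
        # Depth-first search for every NNW path that leads from head to tail.
        stack = [(head, (head,))]
        while stack:
            node, path = stack.pop()
            if node == tail:
                entities.add((etype, path))
                continue
            for nxt in successors.get(node, []):
                if nxt <= tail:
                    stack.append((nxt, path + (nxt,)))
    return sorted(entities)


# Tokens: aching(0) in(1) legs(2) and(3) shoulders(4)
# Two discontinuous symptoms: "aching in legs" and "aching in shoulders".
nnw = {(0, 1), (1, 2), (1, 4)}
thw = {(2, 0, "sym"), (4, 0, "sym")}
print(decode_grid(nnw, thw))
# [('sym', (0, 1, 2)), ('sym', (0, 1, 4))]
```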
A Novel Optimization Scheme for Named Entity Recognition with Pre-trained Language Models
5
Authors: Shuanglong Li, Xulong Zhang, Jianzong Wang. Journal of Electronic Research and Application, 2024, No. 5, pp. 125-133 (9 pages)
Named Entity Recognition (NER) is crucial for extracting structured information from text. Traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, but the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, which may simplify many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates using the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that combines carefully crafted templates with context examples selected by semantic similarity. Our experimental results demonstrate the feasibility of this approach and suggest a promising direction for harnessing PLMs in NER.
Keywords: GPT-3; named entity recognition; Sentence-BERT model; in-context example
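The scheme has two steps:

1. Rank a pool of labeled sentences by semantic similarity to the query sentence.
2. Place the top-k examples into a prompt template.

The minimal sketch below assumes the sentence vectors are already computed; in the paper they come from Sentence-BERT, and here they are toy 3-d vectors. It makes no model call. The template wording, entity types, and function names are hypothetical, since the abstract does not give the paper's template.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_examples(query_vec, pool, k=2):
    """Return the k (text, answer, vec) examples most similar to the query vector."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return ranked[:k]


def build_prompt(query_text, examples):
    # Hypothetical template; the paper's actual template is not given in the abstract.
    parts = ["Extract the named entities (person, location, organization) from the sentence."]
    for text, answer, _ in examples:
        parts.append(f"Sentence: {text}\nEntities: {answer}")
    parts.append(f"Sentence: {query_text}\nEntities:")
    return "\n\n".join(parts)


pool = [
    ("Alice moved to Paris.", "Alice (person); Paris (location)", [1.0, 0.0, 0.1]),
    ("IBM opened a lab.", "IBM (organization)", [0.0, 1.0, 0.0]),
    ("Bob visited Berlin.", "Bob (person); Berlin (location)", [0.9, 0.1, 0.2]),
]
query = ("Carol flew to Rome.", [0.95, 0.05, 0.15])
chosen = select_examples(query[1], pool, k=2)
print(build_prompt(query[0], chosen))
```

The prompt string produced here would be sent to the language model. The model's completion after the final "Entities:" is then parsed as the prediction.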
RIB-NER: A Span-Based Chinese Named Entity Recognition Model
6
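Authors: Tian Hongpeng (田红鹏), Wu Jingwei (吴璟玮). Computer Engineering & Science (计算机工程与科学) [CSCD, Peking University Core Journal], 2024, No. 7, pp. 1311-1320 (10 pages)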
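Named entity recognition underpins many downstream tasks in natural language processing. Chinese, an important international language, is distinctive in many respects. Traditionally, Chinese named entity recognition models use a sequence labeling mechanism that needs a conditional random field to capture label dependencies, and this approach is prone to misclassifying labels. To address this problem, a span-based named entity recognition model, RIB-NER, is proposed. First, RoBERTa-wwm-ext serves as the model's embedding layer, providing character-level embeddings that carry more contextual semantic and lexical information. Second, the parallel convolution kernels of an IDCNN strengthen the positional information between words, which ties words more closely together. A BiLSTM network is also integrated into the model to capture contextual information. Finally, a biaffine model scores the start and end tokens in a sentence, and these scores are used to identify spans. On the MSRA and Weibo corpora, RIB-NER identifies entity boundaries fairly accurately, reaching F1 scores of 95.11% and 73.94%, respectively. It also recognizes entities better than traditional deep learning methods.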
Keywords: Chinese named entity recognition; biaffine model; iterated dilated convolutional neural network; pre-trained model; span
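The biaffine step gives every candidate span a score from the representations of its start token and its end token. The standard biaffine form is score(s, e) = h_sᵀ U h_e + w·[h_s; h_e] + b. The minimal sketch below is a single-label version on plain lists. A full model keeps one U slice per label and takes the highest-scoring label for each span. Whether RIB-NER uses exactly this parameterization is an assumption, and the toy vectors and weights are made up.

```python
def biaffine_score(h_start, h_end, U, w, b):
    """Biaffine score of one span: h_start^T U h_end + w . [h_start; h_end] + b."""
    bilinear = sum(h_start[i] * U[i][j] * h_end[j]
                   for i in range(len(h_start)) for j in range(len(h_end)))
    linear = sum(wi * xi for wi, xi in zip(w, h_start + h_end))
    return bilinear + linear + b


def score_all_spans(H_start, H_end, U, w, b):
    """Score every span (i, j) with i <= j; returns {(i, j): score}."""
    n = len(H_start)
    return {(i, j): biaffine_score(H_start[i], H_end[j], U, w, b)
            for i in range(n) for j in range(i, n)}


H_start = [[1, 0], [0, 1]]  # start-token representations (toy values)
H_end = [[1, 0], [0, 1]]    # end-token representations (toy values)
U = [[1, 0], [0, 2]]
w = [0, 0, 0, 0]
print(score_all_spans(H_start, H_end, U, w, b=0))
# {(0, 0): 1, (0, 1): 0, (1, 1): 2}
```

Only spans with i <= j are scored, which removes invalid spans in which the end comes before the start.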
A Federated Named Entity Recognition Model with Explicit Relation for Power Grid (cited by 2)
7
Authors: Jingtang Luo, Shiying Yao, Changming Zhao, Jie Xu, Jim Feng. Computers, Materials & Continua [SCIE, EI], 2023, No. 5, pp. 4207-4216 (10 pages)
Power grid operation is complex, and much of its process data involves national security, business secrets, and user privacy. Labeled datasets may exist on many different operation platforms, but they cannot be shared directly because power grid data is highly privacy-sensitive. An urgent problem for the smart grid is therefore how to make full use of these multi-source heterogeneous data to build a power grid knowledge graph while protecting privacy. This paper proposes a federated learning named entity recognition method for the power grid field. It aims to build a named entity recognition model that covers the entire power grid process from data with different security requirements. We decompose the named entity recognition (NER) model FLAT (Chinese NER Using Flat-Lattice Transformer) on each platform into a global part and a local part. The local part captures the characteristics of each platform's local data and is updated with locally labeled data. The global part is learned across operation platforms and captures shared NER knowledge. Its local gradients from the different platforms are aggregated to update the global model, which is then sent back to each platform to update that platform's global part. Experiments on two publicly available Chinese datasets and one power grid dataset validate the effectiveness of our method.
Keywords: power grid; named entity recognition; federated learning
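The abstract splits each platform's model into a shared global part and a private local part. Only gradients for the global part are aggregated across platforms. The minimal sketch below shows one aggregation round on plain Python lists. Weighting platforms by their labeled-sample counts, the learning rate, and the parameter name `encoder.w` are assumptions; the abstract does not state the aggregation weights.

```python
def federated_update(global_params, client_grads, client_sizes, lr=0.1):
    """Update the shared (global) parameters with a sample-weighted mean of client gradients.

    global_params: dict name -> list of floats (the global part only).
    client_grads:  one dict per platform, with the same keys and shapes as global_params.
    client_sizes:  number of labeled samples on each platform, used as weights.
    Local (platform-specific) parameters never leave their platform and are not passed in.
    """
    total = sum(client_sizes)
    updated = {}
    for name, values in global_params.items():
        avg_grad = [
            sum(g[name][k] * n for g, n in zip(client_grads, client_sizes)) / total
            for k in range(len(values))
        ]
        updated[name] = [v - lr * a for v, a in zip(values, avg_grad)]
    return updated


global_part = {"encoder.w": [1.0, 1.0]}
grads = [{"encoder.w": [1.0, 2.0]}, {"encoder.w": [3.0, 6.0]}]
new_global = federated_update(global_part, grads, client_sizes=[100, 300], lr=0.1)
print(new_global)  # approximately {'encoder.w': [0.75, 0.5]}
```

The updated global part is then sent back to every platform. Each platform recombines it with its own local part before the next round of local training.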
Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF (cited by 1)
8
Authors: Zhen Zhen, Jian Gao. Computers, Materials & Continua [SCIE, EI], 2023, No. 10, pp. 299-323 (25 pages)
In recent years, cyber attacks have intensified and caused great harm to individuals, companies, and countries. Mining cyber threat intelligence (CTI) can facilitate intelligence integration and help combat cyber attacks. Named Entity Recognition (NER), a crucial component of text mining, can structure complex CTI text and help cybersecurity professionals counter threats effectively. However, current CTI NER research has focused mainly on English CTI, and in the few studies on Chinese text, existing models have performed poorly. We aim to fully exploit Chinese pre-trained language models (PLMs) and to handle the long, infrequent English words mixed into Chinese CTI. To do so, we propose a residual dilated convolutional neural network (RDCNN) with a conditional random field (CRF). It is built on RoBERTa-wwm, a robustly optimized bidirectional encoder representation from transformers pre-training approach with whole word masking. We abbreviate the model as RoBERTa-wwm-RDCNN-CRF. We are the first to experiment on the relevant open-source dataset, achieving an F1-score of 82.35%. This exceeds the common baseline in this field, BERT-BiLSTM-CRF (bidirectional encoder representation from transformers with bidirectional long short-term memory and CRF), by about 19.52%. It also exceeds the current state-of-the-art model, BERT-RDCNN-CRF, by about 3.53%. In addition, we conducted an ablation study on the model's encoder and an in-depth investigation of the PLMs and the encoder to verify the effectiveness of the proposed model. The RoBERTa-wwm-RDCNN-CRF model and the shared pre-processing and augmentation methods can serve later fundamental tasks such as cybersecurity information extraction and knowledge graph construction. In turn, they contribute to downstream applications such as intrusion detection and advanced persistent threat (APT) attack detection.
Keywords: cybersecurity; cyber threat intelligence; named entity recognition
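A residual dilated convolution widens a token's context window by spacing the kernel taps `dilation` positions apart. A residual connection adds the layer's input back to its output so the original signal is preserved. The abstract does not give the RDCNN's layer count, dilation rates, or channel sizes. The minimal single-channel sketch below uses an assumed kernel of size 3 and ReLU, and the function names are hypothetical.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated convolution (cross-correlation) over a single channel."""
    k = len(kernel)
    half = (k - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for m in range(k):
            idx = t + (m - half) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero padding
                acc += kernel[m] * x[idx]
        out.append(acc)
    return out


def residual_dilated_block(x, kernel, dilation):
    """y = x + ReLU(dilated_conv(x)): the residual path keeps the original signal."""
    conv = dilated_conv1d(x, kernel, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


x = [0, 0, 1, 0, 0]  # an impulse at position 2
print(residual_dilated_block(x, kernel=[1, 1, 1], dilation=2))
# [1.0, 0.0, 2.0, 0.0, 1.0]
```

With dilation 2, the impulse reaches positions 0 and 4 but skips 1 and 3. Stacking size-3 kernels with dilations 1, 2, and 4 gives a receptive field of 1 + 2 × (1 + 2 + 4) = 15 tokens.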
Data Masking for Chinese Electronic Medical Records with Named Entity Recognition (cited by 1)
9
Authors: Tianyu He, Xiaolong Xu, Zhichen Hu, Qingzhan Zhao, Jianguo Dai, Fei Dai. Intelligent Automation & Soft Computing [SCIE], 2023, No. 6, pp. 3657-3673 (17 pages)
With the rapid development of information technology, converting medical records to electronic form has gradually become a trend. In China, the huge population and the large number of medical institutions drive the conversion of paper medical records into electronic medical records. Electronic medical records are the basis for building smart hospitals and an important guarantee of intelligent healthcare. The massive volume of electronic medical record data is also an important dataset for medical research. However, electronic medical records contain a large amount of private patient information, which must be desensitized before the records are released as open resources. To solve this problem, this paper proposes data masking for Chinese electronic medical records with named entity recognition. First, the text is vectorized to match the model's input format. Second, input sentences vary in length, and the contextual relationships between sentences cannot be ignored. A named entity recognition model based on a bidirectional long short-term memory (BiLSTM) network with conditional random fields (CRF) is therefore constructed. Finally, data masking is performed on the named entity recognition results. It mainly uses regular-expression filtering and encryption, plus principal component analysis (PCA) compression and replacement of word vectors. The method is also compared with the hidden Markov model (HMM), LSTM-CRF, and BiLSTM models. The experimental results show that it achieves 92.72% accuracy, 92.30% recall, and a 92.51% F1-score, higher than the other models.
Keywords: named entity recognition; Chinese electronic medical records; data masking; principal component analysis; regular expression
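The masking stage combines NER output with regular-expression filtering. The minimal sketch below first replaces the spans found by an NER model with placeholder labels. It then masks identifiers that a regex can catch on its own. The two patterns (an 11-digit mainland mobile number and an 18-character resident ID) and the placeholder format are illustrative assumptions. The paper's encryption and PCA word-vector replacement steps are not shown.

```python
import re

# Illustrative patterns (assumptions): 11-digit mainland mobile numbers and
# 18-character resident ID numbers. Lookarounds stop matches inside longer digit runs.
PATTERNS = {
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "ID": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def mask_text(text, entity_spans):
    """Mask NER-detected spans first, then identifiers caught by the regex patterns.

    entity_spans: list of (start, end_exclusive, label) produced by an NER model.
    """
    chars = list(text)
    # Replace from right to left so earlier character offsets stay valid.
    for start, end, label in sorted(entity_spans, reverse=True):
        chars[start:end] = f"[{label}]"
    masked = "".join(chars)
    for label, pattern in PATTERNS.items():
        masked = pattern.sub(f"[{label}]", masked)
    return masked


record = "患者张三，电话13812345678。"
print(mask_text(record, [(2, 4, "NAME")]))  # 患者[NAME]，电话[PHONE]。
```

Regexes handle rigidly formatted identifiers cheaply. Free-form entities such as names still need the NER model.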
Corpus of Carbonate Platforms with Lexical Annotations for Named Entity Recognition
10
Authors: Zhichen Hu, Huali Ren, Jielin Jiang, Yan Cui, Xiumian Hu, Xiaolong Xu. Computer Modeling in Engineering & Sciences [SCIE, EI], 2023, No. 4, pp. 91-108 (18 pages)
An obvious challenge in named entity recognition is constructing entity datasets. Some research has been conducted on building entity databases, but most of it targets Wikipedia, and a smaller share targets structured entities in the news such as people, locations, and organizations. This paper focuses on identifying scientific entities in English-language literature on carbonate platforms, a topic in sedimentology. First, literature in key disciplines is likely to be written by multidisciplinary experts, so this paper designs a content extraction method that can handle complex text structures. Second, on the extracted content, we formalize the entity extraction task as lexicon- and lexical-based entity extraction. Third, to test the accuracy of entity extraction, three currently popular recognition methods are used to perform entity detection. Experiments show that the entity dataset produced by the lexicon- and lexical-based extraction method significantly assists the named entity recognition task. This is a pilot study of entity extraction from complexly structured, specialized English literature on carbonate platforms.
Keywords: named entity recognition; carbonate platform; corpus; entity extraction; English literature; detection
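Lexicon-based entity extraction can produce training labels by matching domain terms against tokenized text. The minimal sketch below tags tokens with BIO labels using greedy longest-match lookup, so "carbonate platform" wins over the shorter "carbonate". The abstract does not describe its matching rules. The lexicon entries, entity type names, and maximum term length are hypothetical.

```python
def lexicon_bio_tags(tokens, lexicon, max_len=5):
    """Tag tokens with BIO labels by greedy longest match against a term lexicon.

    lexicon maps a lower-cased tuple of tokens -> entity type.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                match = (n, lexicon[key])
                break
        if match:
            n, etype = match
            tags[i] = f"B-{etype}"
            for k in range(i + 1, i + n):
                tags[k] = f"I-{etype}"
            i += n
        else:
            i += 1
    return tags


lexicon = {
    ("carbonate", "platform"): "FACIES",  # hypothetical type names
    ("carbonate",): "ROCK",
    ("cretaceous",): "AGE",
}
tokens = "The Cretaceous carbonate platform was drowned".split()
print(lexicon_bio_tags(tokens, lexicon))
# ['O', 'B-AGE', 'B-FACIES', 'I-FACIES', 'O', 'O']
```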
Dart Games Optimizer with Deep Learning-Based Computational Linguistics Named Entity Recognition
11
Authors: Mesfer Al Duhayyim, Hala J. Alshahrani, Khaled Tarmissi, Heyam H. Al-Baity, Abdullah Mohamed, Ishfaq Yaseen, Amgad Atta Abdelmageed, Mohamed I. Eldesouki. Intelligent Automation & Soft Computing [SCIE], 2023, No. 9, pp. 2549-2566 (18 pages)
Computational linguistics is an engineering-based scientific discipline that studies written and spoken language from a computational viewpoint. The field also helps build artefacts for processing and producing language, either in bulk or in a dialogue setting. Named Entity Recognition (NER) is a fundamental task in the data extraction process. It identifies and labels atomic components of texts and groups them under entity types such as organizations, people, places, and times. The NER mechanism can also identify and extract further entity types as required. The significance of NER is well established in Natural Language Processing (NLP), and many studies have developed novel NER methods. Conventional approaches range from rule-based and hand-crafted-feature Machine Learning (ML) techniques to Deep Learning (DL) techniques. In this context, the current study introduces a novel Dart Games Optimizer with Hybrid Deep Learning-Driven Computational Linguistics (DGOHDL-CL) model for NER. The DGOHDL-CL technique aims to identify and label the atomic components of texts as a collection of named entities. First, word embedding is performed with the word2vec model. For NER, the Convolutional Gated Recurrent Unit (CGRU) model is employed. Finally, the DGO technique tunes the hyperparameters of the CGRU algorithm to boost NER performance. No earlier study has integrated DGO with the CGRU model for NER. To demonstrate the superiority of DGOHDL-CL, an extensive simulation analysis was run on two datasets, CoNLL-2003 and OntoNotes 5.0. The experimental outcomes establish the promising performance of DGOHDL-CL over other models.
Keywords: named entity recognition; deep learning; natural language processing; computational linguistics; dart games optimizer
CMNER: A Weibo-Based Chinese Multimodal Named Entity Recognition Dataset
12
Authors: Ji Yuanze (季源泽), Li Fei (李霏). Computer Technology and Development (计算机技术与发展), 2024, No. 10, pp. 110-117 (8 pages)
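Multimodal named entity recognition (MNER) aims to locate and classify named entities in text with the help of associated images. Research on Chinese MNER currently lacks manually annotated data, which has held back its development. This paper builds a Chinese MNER dataset from a social media platform. It collects 5,000 Weibo posts and 18,326 corresponding images and manually annotates person, location, organization, and miscellaneous entities. Baseline experiments apply the ACN and UMT models to the dataset. They achieve F1 scores of 74.22% and 89.50%, respectively, demonstrating that the dataset is valid and usable. Cross-lingual transfer learning experiments further show that Chinese and English MNER data complement each other and improve entity recognition performance. To promote research on Chinese MNER, the CMNER dataset and related code have been made public.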
Keywords: multimodal named entity recognition; image; named entity; Chinese; cross-lingual
A Chinese NER Model Based on BERT with Multi-Knowledge-Graph Fusion Embedding (cited by 1)
13
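Authors: Zhang Fengli (张凤荔), Huang Xin (黄鑫), Wang Ruijin (王瑞锦), Zhou Zhiyuan (周志远), Han Yingjun (韩英军). Journal of University of Electronic Science and Technology of China (电子科技大学学报) [EI, CAS, CSCD, Peking University Core Journal], 2023, No. 3, pp. 390-397 (8 pages)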
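This work addresses three problems:

- Building domain-specific knowledge graphs is currently inefficient.
- Existing domain knowledge graphs are under-used.
- Traditional models have difficulty extracting entities whose meaning depends on specialized domain knowledge.

To address them, a Chinese NER model based on BERT with multi-knowledge-graph fusion embedding (BERT-FKG) is proposed. It fuses the semantics of multiple knowledge graphs so that entities share attributes across them, which enriches the knowledge carried by sentence embeddings. The model performs better on Chinese NER tasks in both the open domain and the medical domain. Experimental results show that identifying similar entities across domain knowledge graphs by semantic similarity, and sharing their attributes, lets the model absorb more domain knowledge. This improves its accuracy on NER tasks.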
Keywords: BERT; Chinese named entity recognition; medical domain; multi-knowledge-graph fusion embedding
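BERT-FKG shares attributes between entities in different knowledge graphs when those entities are semantically similar. The minimal sketch below illustrates one way to do that. It copies attribute sets both ways between any pair of entities whose embedding cosine similarity reaches a threshold. The graph layout, the toy 2-d embeddings, the 0.9 threshold, and the function names are assumptions; the abstract does not specify how similarity is computed or thresholded.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def share_attributes(graph_a, graph_b, threshold=0.9):
    """Share attributes between similar entities of two knowledge graphs.

    Each graph maps entity name -> {"vec": embedding, "attrs": set of attribute strings}.
    Returns new name -> attribute-set dicts; the input graphs are not modified.
    """
    attrs_a = {name: set(ent["attrs"]) for name, ent in graph_a.items()}
    attrs_b = {name: set(ent["attrs"]) for name, ent in graph_b.items()}
    for name_a, ent_a in graph_a.items():
        for name_b, ent_b in graph_b.items():
            if cosine(ent_a["vec"], ent_b["vec"]) >= threshold:
                attrs_a[name_a] |= set(ent_b["attrs"])
                attrs_b[name_b] |= set(ent_a["attrs"])
    return attrs_a, attrs_b


open_kg = {"cold": {"vec": [1.0, 0.0], "attrs": {"symptom: fever"}}}
medical_kg = {
    "common cold": {"vec": [0.99, 0.1], "attrs": {"department: respiratory"}},
    "fracture": {"vec": [0.0, 1.0], "attrs": {"department: orthopedics"}},
}
a, b = share_attributes(open_kg, medical_kg)
print(sorted(a["cold"]))  # ['department: respiratory', 'symptom: fever']
```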
FCG-NNER: A Chinese Nested Named Entity Recognition Method Fusing Glyph Information
14
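Authors: Chen Peng (陈鹏), Ma Hongbin (马洪彬), Zhou Jialun (周佳伦), Li Linyu (李琳宇), Yu Xiaosheng (余肖生). Journal of Chongqing University of Technology (Natural Science) (重庆理工大学学报(自然科学)) [CAS, Peking University Core Journal], 2023, No. 12, pp. 222-231 (10 pages)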
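Span-based models are the mainstream approach to nested named entity recognition; their core idea is to recast entity recognition as span classification. In Chinese datasets, however, words have no explicit delimiters, so semantic and boundary information is unclear, and Chinese nested NER performs poorly as a result. To solve this problem, FCG-NNER is proposed: a span-based Chinese nested NER algorithm that fuses glyph information. It works in four steps:

1. A convolutional neural network extracts the glyph information of Chinese characters.
2. A cross-biaffine decoding layer fuses the original text information with the glyph information.
3. A diagonal-fusion CNN layer captures local interactions between different spans.
4. The outputs of the cross-biaffine decoding layer and the diagonal-fusion CNN layer are summed and fed into a fully connected layer to produce the final predictions.

Two representative Chinese nested NER datasets, CMeEE and CLUENER2020, are used for experimental validation. On CMeEE, FCG-NNER achieves a precision of 65.02%, a recall of 67.93%, and an F1 score of 0.6644. On CLUENER2020, it achieves a precision of 79.45%, a recall of 82.33%, and an F1 score of 0.8086. FCG-NNER clearly outperforms the baselines on both datasets.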
Keywords: Chinese nested named entity recognition; glyph features; span classification; feature fusion
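The reported F1 values follow from the precision and recall figures through F1 = 2PR / (P + R). The minimal sketch below computes entity-level precision, recall, and F1 over sets of (type, start, end) spans. It uses exact-match scoring, the usual convention, which is assumed here for this paper. It also rechecks the two reported F1 scores from their P and R.

```python
def f1_from_pr(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def span_prf(gold, pred):
    """Exact-match entity-level precision, recall and F1 over (type, start, end) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f1_from_pr(p, r)


gold = {("dis", 0, 2), ("bod", 0, 1)}
pred = {("dis", 0, 2), ("sym", 3, 4)}
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)

# Consistency check against the reported figures.
print(round(f1_from_pr(0.6502, 0.6793), 4))  # 0.6644 (CMeEE)
print(round(f1_from_pr(0.7945, 0.8233), 4))  # 0.8086 (CLUENER2020)
```

For nested NER, the span sets may contain overlapping spans. Each span is still matched exactly, so a prediction that gets only the inner entity of a nested pair earns partial recall.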
Arabic Named Entity Recognition: A BERT-BGRU Approach (cited by 5)
15
Authors: Norah Alsaaran, Maha Alrabiah. Computers, Materials & Continua [SCIE, EI], 2021, No. 7, pp. 471-485 (15 pages)
Named Entity Recognition (NER) is one of the fundamental tasks in Natural Language Processing (NLP). It aims to locate, extract, and classify named entities into predefined categories such as person, organization, and location. Most earlier research on identifying named entities relied on handcrafted features and very large knowledge resources. This approach is time-consuming and inadequate for resource-scarce languages such as Arabic. Recently, deep learning has achieved state-of-the-art performance on many NLP tasks, including NER, without hand-crafted features. Transfer learning has also proven efficient in several NLP tasks: pretrained language models transfer knowledge learned from large-scale datasets to domain-specific tasks. Bidirectional Encoder Representation from Transformer (BERT) is a contextual language model that generates semantic vectors dynamically according to each word's context. The BERT architecture relies on multi-head attention, which lets it capture global dependencies between words. In this paper, we propose a deep learning-based model that fine-tunes BERT to recognize and classify Arabic named entities. The pre-trained BERT context embeddings were used as input features to a Bidirectional Gated Recurrent Unit (BGRU). The model was fine-tuned on two annotated Arabic Named Entity Recognition (ANER) datasets. Experimental results show that the proposed model outperformed state-of-the-art ANER models. It achieved F-measure values of 92.28% on the ANERCorp dataset and 90.68% on the merged ANERCorp and AQMAR dataset.
Keywords: named entity recognition; Arabic; deep learning; BGRU; BERT
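A BERT-BGRU tagger outputs one label per token, and entities must then be recovered from the label sequence. The minimal sketch below assumes BIO-style tags, which the abstract does not state. It converts a tag sequence into (type, start, end) spans, treating an I- tag that does not continue an entity of the same type as the start of a new entity. That lenient convention is an assumption; stricter decoders drop such tags instead.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (type, start, end_inclusive) spans."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" flushes the last open span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.append((etype, start, i - 1))
                etype = None
            if tag != "O":
                start, etype = i, tag[2:]
    return spans


tags = ["B-PER", "I-PER", "O", "B-LOC", "I-ORG"]
print(bio_to_spans(tags))  # [('PER', 0, 1), ('LOC', 3, 3), ('ORG', 4, 4)]
```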
Adversarial Active Learning for Named Entity Recognition in Cybersecurity (cited by 4)
16
Authors: Tao Li, Yongjin Hu, Ankang Ju, Zhuoran Hu. Computers, Materials & Continua [SCIE, EI], 2021, No. 1, pp. 407-420 (14 pages)
The continuous barrage of cyber threats produces a massive amount of cyber threat intelligence, much of which comes from textual sources. To analyze it, many security analysts rely on cumbersome, time-consuming manual effort. A cybersecurity knowledge graph plays a significant role in the automatic analysis of cyber threat intelligence. Named entity recognition (NER) is the foundation for building such a graph: it identifies critical threat-related elements in textual cyber threat intelligence. Recently, deep neural network-based models have achieved very good NER results, but their performance depends heavily on the amount of labeled data. Since labeled data in cybersecurity is scarce, we propose an adversarial active learning framework that selects informative samples for further annotation. We also propose a novel NER model that adds a dynamic attention mechanism to a BiLSTM-LSTM encoder-decoder, built from long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) networks. Once the selected informative samples are annotated, the proposed NER model is retrained. As a result, the NER model's performance improves incrementally at low labeling cost. Experimental results show the effectiveness of the proposed method.
Keywords: adversarial learning; active learning; named entity recognition; dynamic attention mechanism
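In adversarial active learning, a discriminator is trained to tell labeled samples from unlabeled ones. The samples it is most confident are unlabeled are treated as the most informative and sent for annotation. The abstract does not give the paper's exact selection rule. The minimal sketch below assumes the discriminator outputs P(labeled | sample) and picks the k lowest-scoring samples. The sample IDs and scores are made up.

```python
import heapq


def select_for_annotation(pool_scores, k):
    """Pick the k unlabeled samples that look least like the labeled set.

    pool_scores maps sample id -> discriminator probability that the sample
    comes from the labeled pool; low scores mark informative samples.
    """
    lowest = heapq.nsmallest(k, pool_scores.items(), key=lambda kv: kv[1])
    return [sample_id for sample_id, _ in lowest]


scores = {"s1": 0.91, "s2": 0.12, "s3": 0.55, "s4": 0.07}
print(select_for_annotation(scores, 2))  # ['s4', 's2']
```

Each round annotates the selected samples, moves them into the labeled pool, and retrains both the NER model and the discriminator. This matches the incremental retraining loop described in the abstract.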
Named Entity Recognition for Nepali Text Using Support Vector Machines (cited by 3)
17
Authors: Surya Bahadur Bam, Tej Bahadur Shahi. Intelligent Information Management, 2014, No. 2, pp. 21-29 (9 pages)
Named Entity Recognition aims to identify rigid designators in text, such as proper names, biological species, and temporal expressions, and to classify them into predefined categories. Interest in this research field has grown since the early 1990s. Named Entity Recognition plays a vital role in many areas of natural language processing, such as Machine Translation, Information Extraction, and Question Answering Systems. This paper presents Named Entity Recognition for Nepali text based on the Support Vector Machine (SVM), a machine learning approach to classification. A set of features is extracted from the training dataset. The accuracy and efficiency of the SVM classifier are analyzed with three training set sizes. The recognition systems are tested on ten Nepali text datasets. The strengths of this work are its efficient feature extraction and comprehensive recognition techniques. The SVM-based system is limited to a fixed set of features and uses a small dictionary, which affects its performance. The learning behavior of the system was also observed. It learns well from a small training set, and its learning rate increases as the training set grows.
Keywords: support vector machine; named entity recognition; machine learning; classification; Nepali language text
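An SVM-based NER system classifies each token from a vector of hand-designed features. The abstract does not list the paper's features. The minimal sketch below builds a typical sparse feature dictionary for one token: the word itself, its affixes, its neighbors, a digit flag, and a gazetteer lookup. All feature names, the Nepali example, and the gazetteer are illustrative assumptions.

```python
def token_features(tokens, i, gazetteer):
    """Hypothetical sparse feature dict for token i, for use with a linear SVM."""
    word = tokens[i]
    return {
        "word=" + word: 1,
        "prefix2=" + word[:2]: 1,
        "suffix2=" + word[-2:]: 1,
        "prev=" + (tokens[i - 1] if i > 0 else "<S>"): 1,
        "next=" + (tokens[i + 1] if i + 1 < len(tokens) else "</S>"): 1,
        "is_digit": int(word.isdigit()),
        "in_gazetteer": int(word in gazetteer),  # small place-name dictionary
    }


tokens = ["राम", "काठमाडौं", "गए"]  # "Ram went to Kathmandu"
gazetteer = {"काठमाडौं"}
print(token_features(tokens, 1, gazetteer))
```

In practice, dictionaries like this are converted to sparse vectors before training. For example, scikit-learn's `DictVectorizer` can do the conversion, and a linear SVM such as `LinearSVC` can then be fit on the result. Vectorizing and training are outside this sketch.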
A Conditional Random Fields Approach to Biomedical Named Entity Recognition (cited by 4)
18
Authors: Wang Haochang, Zhao Tiejun, Li Sheng, Yu Hao. Journal of Electronics (China), 2007, No. 6, pp. 838-844 (7 pages)
Named entity recognition is a fundamental task in biomedical data mining. This letter presents a named entity recognition system for biomedical texts based on CRFs (Conditional Random Fields). The system uses a diverse set of features extensively, including local features, full-text features, and external resource features. All of its features are described in detail, and the impact of each feature set on system performance is evaluated. To improve performance further, post-processing modules handle abbreviations, cascaded named entities, and the identification of boundary errors. Evaluation shows that feature selection strongly affects system performance and that the post-processing contributes substantially to better results.
Keywords: Conditional Random Fields (CRFs); named entity recognition; feature selection; post-processing
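A linear-chain CRF labels a sentence by finding the label sequence with the highest total of per-position (emission) scores and label-to-label (transition) scores. The Viterbi algorithm does this by dynamic programming. In the paper's system, the emission scores would come from its weighted features. The minimal sketch below takes hand-filled emission and transition scores, which are assumptions, and shows how a transition score overrides a locally better label.

```python
def viterbi(emissions, transitions, labels):
    """Most likely label sequence under a linear-chain CRF (additive, log-space scores).

    emissions[t][y]: score of label y at position t.
    transitions[a][b]: score of moving from label a to label b.
    """
    n, L = len(emissions), len(labels)
    best = [emissions[0][:]]  # best[t][y]: best score of any path ending in y at t
    back = []                 # back[t-1][y]: previous label on that best path
    for t in range(1, n):
        row, ptr = [], []
        for y in range(L):
            prev = max(range(L), key=lambda a: best[-1][a] + transitions[a][y])
            row.append(best[-1][prev] + transitions[prev][y] + emissions[t][y])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    y = max(range(L), key=lambda k: best[-1][k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return [labels[k] for k in reversed(path)]


labels = ["O", "B", "I"]
NEG = -10.0
transitions = [
    [0.0, 0.0, NEG],   # from O: O -> I is effectively forbidden
    [0.0, -1.0, 1.0],  # from B: B -> I is rewarded
    [0.0, 0.0, 0.5],   # from I
]
emissions = [[0.0, 2.0, 0.0], [1.2, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(viterbi(emissions, transitions, labels))  # ['B', 'I', 'O']
```

Token 1 slightly prefers O on its own emission score (1.2 vs 1.0). The B -> I transition bonus makes B, I, O the best overall sequence instead.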
Chinese Named Entity Recognition with Character-Level BLSTM and Soft Attention Model (cited by 1)
19
Authors: Jize Yin, Senlin Luo, Zhouting Wu, Limin Pan. Journal of Beijing Institute of Technology [EI, CAS], 2020, No. 1, pp. 60-71 (12 pages)
Unlike named entity recognition (NER) for English, Chinese NER loses final accuracy because Chinese text has no word boundaries. To avoid the accumulated errors introduced by word segmentation, this paper builds a deep model that extracts character-level features. The model is the basis of a new Chinese NER method. The method converts the raw text into a sequence of character vectors. It extracts global text features with a bidirectional long short-term memory network and local text features with a soft attention model. A linear-chain conditional random field then labels every character using the global and local features. Experiments were designed and run on the Microsoft Research Asia (MSRA) dataset. The proposed method performs well compared with other methods, which shows that the extracted global and local text features benefit Chinese NER. To cover more test domains, a resume dataset from Sina Finance is also used to demonstrate the method's effectiveness.
Keywords: Chinese named entity recognition (NER); character-level; bidirectional long short-term memory; soft attention model
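Soft attention turns a set of hidden states into a single context vector. It scores each state, normalizes the scores with a softmax, and takes the weighted sum. The abstract does not give the scoring function or the window used for local features. The minimal sketch below assumes dot-product scoring against a query vector, and the toy hidden states stand in for BiLSTM outputs.

```python
import math


def soft_attention(hidden_states, query):
    """Softmax-weighted sum of hidden states, scored by dot product with a query.

    Returns (context_vector, weights).
    """
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    m = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights


hidden = [[1.0, 0.0], [0.0, 1.0]]   # toy stand-ins for BiLSTM outputs
ctx, w = soft_attention(hidden, [math.log(3.0), 0.0])
print(w)    # approximately [0.75, 0.25]
print(ctx)  # approximately [0.75, 0.25]
```

Because the weights sum to 1, the context vector is a convex mix of the hidden states. Here the first state scores ln 3 higher, so it receives three times the weight of the second.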
Number Entities Recognition in Multiple Rounds of Dialogue Systems (cited by 1)
20
Authors: Shan Zhang, Bin Cao, Yueshen Xu, Jing Fan. Computer Modeling in Engineering & Sciences [SCIE, EI], 2021, No. 4, pp. 309-323 (15 pages)
Named entity recognition is a representative natural language processing (NLP) technique used in many tasks, such as dialogue systems, machine translation, and information extraction. In dialogue systems, many entities are composed of numbers and are split across different places. In multi-round dialogue systems, for example, a phone number is often given in several parts, because it is long and speakers emphasize it. In this paper, an entity consisting of numbers is called a number entity. Number entities end up in discontinuous positions for many reasons, and we identify two in real-world dialogue systems. The first is repeated confirmation of different parts of a number entity; the second is interruption by mood (filler) words. Extracting number entities is useful in many tasks, such as completing user information and correcting service requests. However, existing entity extraction methods cannot extract entities made of discontinuous entity blocks. To address these problems, we propose a comprehensive method for number entity recognition that can extract number entities in multi-round dialogue systems. We conduct extensive experiments on a real-world dataset, and the results demonstrate the high performance of our method.
Keywords: natural language processing; dialogue systems; named entity recognition; number entity; discontinuous entity blocks
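The core difficulty is that one number entity arrives in pieces across turns, mixed with filler words. The minimal sketch below shows only the reassembly idea. It drops filler tokens, concatenates digit runs from successive user turns, and accepts the result only if it has the expected length (11 digits for a mainland mobile number). The filler list, whitespace tokenization, and length check are all assumptions, and the sketch does not handle the repeated-confirmation case. It is not the paper's method.

```python
import re

FILLERS = {"um", "uh", "er"}  # hypothetical mood/filler words


def merge_number_fragments(turns, expected_len=11):
    """Join digit fragments spread over several user turns into one number entity.

    Filler tokens are dropped, digit runs are concatenated in order, and the result
    is returned only if it has the expected length; otherwise None.
    """
    digits = ""
    for turn in turns:
        for token in turn.split():
            if token.lower() in FILLERS:
                continue
            digits += "".join(re.findall(r"\d+", token))
    return digits if len(digits) == expected_len else None


turns = ["my number is 138", "um 1234", "5678"]
print(merge_number_fragments(turns))  # 13812345678
```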