Funding: Supported by the Outstanding Youth Team Project of Central Universities (QNTD202308) and the Ant Group through the CCF-Ant Research Fund (CCF-AFSG 769498 RF20220214).
Abstract: Named Entity Recognition (NER) is a fundamental task in biomedical text mining. It aims to extract specific types of entities, such as genes, proteins, and diseases, from complex biomedical texts and to categorize them into predefined entity types, which provides basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model first uses the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
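The span-based decoding that lets a Global Pointer head return nested entities can be sketched as follows. This is a minimal illustration, not the RoBGP implementation: it assumes the head has already produced one start-by-end score matrix per entity type, and that a span is accepted when its score exceeds a threshold of zero (a common convention for Global Pointer, but an assumption here). The function name `decode_spans`, the labels, and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token), end inclusive


def decode_spans(scores: Dict[str, List[List[float]]],
                 threshold: float = 0.0) -> List[Span]:
    """Return every (type, start, end) whose span score exceeds the threshold."""
    spans: List[Span] = []
    for label, matrix in scores.items():
        n = len(matrix)
        for start in range(n):
            # Only the upper triangle is valid: a span cannot end before it starts.
            for end in range(start, n):
                if matrix[start][end] > threshold:
                    spans.append((label, start, end))
    # Sort by position so the output order does not depend on dict order.
    return sorted(spans, key=lambda s: (s[1], s[2], s[0]))


# Toy scores for a 4-token sentence; the numbers are made up for illustration.
NEG = -10.0
toy_scores = {
    "disease": [
        [NEG, NEG, NEG, 2.5],  # tokens 0..3 form one disease mention
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
    ],
    "body_part": [
        [NEG, 1.2, NEG, NEG],  # tokens 0..1 form a mention nested inside it
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
        [NEG, NEG, NEG, NEG],
    ],
}

print(decode_spans(toy_scores))
```

Because every (start, end) pair is scored independently, overlapping spans such as the two above can both be returned, which a single tag-per-token sequence labeller cannot express directly.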
Funding: Financially supported by the Natural Science Foundation of China (Grant No. 42301492), the National Key R&D Program of China (Grant Nos. 2022YFF0711600, 2022YFF0801201, 2022YFF0801200), the Major Special Project of Xinjiang (Grant No. 2022A03009-3), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (Grant No. KF-2022-07014), the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education (Grant No. GLAB 2023ZR01), and the Fundamental Research Funds for the Central Universities.
Abstract: As important geological data, a geological report contains rich expert and geological knowledge, but the challenge facing current research into geological knowledge extraction and mining is how to understand geological reports accurately under the guidance of domain knowledge. While generic named entity recognition models and tools can be used to process geoscience reports and documents, their effectiveness is hampered by a lack of domain-specific knowledge, which leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities with reference to the ontological system of geological domains and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT, adversarial training, Bi-directional Long Short-Term Memory, Global Pointer) is proposed to address the ambiguity, diversity, and nesting of geological entities. The model first uses the fine-tuned, word-granularity-based pre-training model GeoWoBERT (Geological Word-base BERT) and combines it with the text features extracted by a BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm is then applied to improve the robustness of the model and its resistance to interference, and decoding is finally performed with a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining rich geological information.
Abstract: Named Entity Recognition (NER) is crucial for extracting structured information from text. While traditional methods rely on rules, Conditional Random Fields (CRFs), or deep learning, the advent of large-scale Pre-trained Language Models (PLMs) offers new possibilities. PLMs excel at contextual learning, potentially simplifying many natural language processing tasks. However, their application to NER remains underexplored. This paper investigates leveraging the GPT-3 PLM for NER without fine-tuning. We propose a novel scheme that utilizes carefully crafted templates and context examples selected based on semantic similarity. Our experimental results demonstrate the feasibility of this approach, suggesting a promising direction for harnessing PLMs in NER.
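The idea of picking context examples by semantic similarity and placing them in a template can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the template wording, the embedding model, or the similarity measure, so the cosine similarity, the two-dimensional toy vectors, the prompt text, and the helper names `select_examples` and `build_prompt` are all hypothetical. No model is called; the sketch stops at the prompt string.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[str, str, Sequence[float]]  # (sentence, annotation, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query_vec: Sequence[float], pool: List[Example],
                    k: int = 2) -> List[Tuple[str, str]]:
    """Pick the k pool examples whose embeddings are closest to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return [(text, annotation) for text, annotation, _ in ranked[:k]]


def build_prompt(query: str, examples: List[Tuple[str, str]]) -> str:
    """Fill a simple few-shot template; the wording is an assumption."""
    lines = ["Mark every named entity in the sentence."]
    for text, annotation in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Entities: {annotation}")
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Toy pool with made-up 2-d embeddings standing in for real sentence vectors.
pool = [
    ("Alice works at Acme.", "Alice (PER); Acme (ORG)", [1.0, 0.0]),
    ("Paris is in France.", "Paris (LOC); France (LOC)", [0.0, 1.0]),
    ("Bob joined Initech.", "Bob (PER); Initech (ORG)", [0.9, 0.1]),
]
chosen = select_examples([1.0, 0.2], pool, k=2)
print(build_prompt("Carol left Globex.", chosen))
```

The resulting string would then be sent to the language model as-is; how the completion is parsed back into entities is a separate step not covered here.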
Funding: Supported by the National Natural Science Foundation of China.
Abstract: The turbulence mechanism plays an important part in the mixing process and momentum transfer of turbulence. A three-dimensional Prandtl mixing length tidal model has been developed to simulate tidal flows and water quality. The eddy viscosities and diffusivities are computed from the Prandtl mixing length model. To model the water quality of an estuary or coastal area, many interdependent processes need to be simulated. These may be conveniently separated into three main groups: transport and mixing processes, biochemical interaction of water quality variables, and the utilization and recycling of nutrients by living matter. The model simulates the full oxygen and nutrient balance, primary productivity, and the transport, reaction mechanism, and fate of pollutants over tidal time-scales. The model is applied to the numerical simulation of tidal flows and water quality in Dalian Bay. It has been calibrated against a limited data set of historical water quality observations and in general shows excellent agreement with all available data.
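As a point of reference, the mixing-length closure mentioned above is often written in the following form. This is the textbook version for a horizontal flow sheared in the vertical; the abstract does not state the exact three-dimensional form used in the model, so the symbols $l_m$ (mixing length), $u$ and $v$ (horizontal velocity components), $z$ (vertical coordinate), and $\nu_t$ (eddy viscosity) are assumptions for illustration.

```latex
\nu_t = l_m^{2}\,\sqrt{\left(\frac{\partial u}{\partial z}\right)^{2} + \left(\frac{\partial v}{\partial z}\right)^{2}}
```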
Funding: Supported by the National Natural Science Foundation of China (51075147).
Abstract: To study rock deformation with a three-dimensional model under the rolling forces of a disc cutter, a circular-grooving test is carried out in which the disc cutter rolls around on the rock. The mechanical behavior of rock under the rolling disc cutter is studied, a mechanical model of the disc cutter rolling around the groove is established, and the theory of single-point and double-angle variables is proposed. Based on this theory, the physics equations and geometric equations of rock mechanical behavior under the disc cutters of a tunnel boring machine (TBM) are studied, and the balance equations of the interactive forces between disc cutter and rock are then established. Accordingly, formulas for the normal force, rolling force, and side force of a disc cutter are derived, and their validity is examined by tests. A new method and theory is thus proposed for studying the rock-breaking mechanism of disc cutters.
Funding: The Beijing Municipal Science and Technology Program (Z231100001323004).
Abstract: With the help of pre-trained language models, the accuracy of the entity linking task has made great strides in recent years. However, most models with excellent performance require fine-tuning large pre-trained language models on a large amount of training data, which raises a hardware barrier to the task. Some researchers have achieved competitive results with less training data through ingenious methods, such as utilizing information provided by a named entity recognition model. This paper presents a novel semantic-enhancement-based entity linking approach, named semantically enhanced hardware-friendly entity linking (SHEL), which is designed to be hardware friendly and efficient while maintaining good performance. Specifically, SHEL's semantic enhancement approach consists of three aspects: (1) semantic compression of entity descriptions using a text summarization model; (2) maximizing the capture of mention contexts using asymmetric heuristics; (3) calculating a fixed-size mention representation through pooling operations. This series of semantic enhancement methods effectively improves the model's ability to capture semantic information while taking hardware constraints into account, and improves the model's convergence speed by more than 50% compared with the strong baseline model proposed in this paper. In terms of performance, SHEL is comparable to the previous method, with superior performance on six well-established datasets, even though SHEL is trained using a smaller pre-trained language model as the encoder.
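Step (3) above, turning a mention of any length into a fixed-size vector by pooling, can be illustrated with a small sketch. The abstract does not say which pooling operation SHEL uses, so mean pooling is an assumption here, and the function name `mean_pool` and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the token vectors of a mention into one fixed-size vector."""
    if not token_vectors:
        raise ValueError("a mention must contain at least one token")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    # The output length is always `dim`, regardless of how many tokens there are.
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# A three-token mention and a one-token mention, both with 2-d toy encodings.
long_mention = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
short_mention = [[7.0, 8.0]]

print(mean_pool(long_mention))   # same dimensionality ...
print(mean_pool(short_mention))  # ... for both mentions
```

A fixed-size output means the downstream scoring layer does not need to handle variable-length mentions, which is one plausible reason for the design; max pooling would give the same shape guarantee.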
Funding: Supported by the Joint Fund of Seismological Science (Grant No. U1839206), the National R&D Program on Monitoring, Early Warning and Prevention of Major Natural Disaster (Grant No. 2017YFC1500301), IGGCAS Research Start-up Funds (Grant No. E0515402), and the National Natural Science Foundation of China (Grant Nos. E1115401 and 11971258).
Abstract: The nearly analytic discrete (NAD) method is a kind of finite difference method with the advantages of high accuracy and stability. Previous studies have investigated the NAD method for simulating wave propagation in the time domain. This study applies the NAD method to solving three-dimensional (3D) acoustic wave equations in the frequency domain. This forward modeling approach is then used as the "engine" for implementing 3D frequency-domain full waveform inversion (FWI). In the numerical modeling experiments, synthetic examples are first given to show the superiority of the NAD method in forward modeling compared with traditional finite difference methods. Synthetic 3D frequency-domain FWI experiments are then carried out to examine the effectiveness of the proposed methods. The inversion results show that the NAD method is more suitable than traditional methods, in terms of computational cost and stability, for 3D frequency-domain FWI, and represents an effective approach for inversion of subsurface model structures.
Funding: Supported by 242 National Information Security Projects (2017A149).
Abstract: Unlike named entity recognition (NER) for English, Chinese NER suffers from the absence of word boundaries, which reduces the final accuracy. To avoid the accumulated error introduced by word segmentation, this paper builds a deep model that extracts character-level features and uses it as the basis for a new Chinese NER method. The method converts the raw text to a character vector sequence, extracts global text features with a bidirectional long short-term memory network, and extracts local text features with a soft attention model. A linear-chain conditional random field is then used to label all the characters with the help of the global and local text features. Experiments are designed and implemented on the Microsoft Research Asia (MSRA) dataset. The results show that the proposed method performs well compared with other methods, which indicates that the extracted global and local text features have a positive influence on Chinese NER. For more variety in the test domains, a resume dataset from Sina Finance is also used to demonstrate the effectiveness of the proposed method.
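The final labelling step of a linear-chain conditional random field is usually done with Viterbi decoding, sketched below. This is a generic illustration rather than the paper's code: it assumes per-character emission scores are already available (in the method above they would come from the BiLSTM and attention features), omits start and end transition scores for brevity, and uses made-up numbers and a hypothetical B/I/O label set.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain CRF."""
    labels = list(emissions[0])
    # best[label] = score of the best path that ends in `label` so far.
    best = {label: emissions[0][label] for label in labels}
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            # Best previous label given that the current label is `cur`.
            prev = max(labels, key=lambda p: best[p] + transitions[p][cur])
            new_best[cur] = best[prev] + transitions[prev][cur] + emit[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda label: best[label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for three characters; O -> I is heavily penalised because an
# inside tag should not directly follow an outside tag.
emissions = [
    {"B": 2.0, "I": 0.5, "O": 1.0},
    {"B": 0.2, "I": 1.5, "O": 1.0},
    {"B": 0.1, "I": 0.3, "O": 2.0},
]
transitions = {
    "B": {"B": -1.0, "I": 1.0, "O": 0.0},
    "I": {"B": -1.0, "I": 0.5, "O": 0.0},
    "O": {"B": 0.0, "I": -10.0, "O": 0.5},
}
print(viterbi(emissions, transitions))
```

The transition table is what lets the CRF reject label sequences that are locally plausible but structurally invalid, which independent per-character classification cannot do.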
Abstract: One of the key problems in collaborative geometric modeling systems is topological entity correspondence when the topological structure of geometry models on collaborative sites changes. In this article, we propose a solution for tracking topological entity alterations in a 3D collaborative modeling environment. We first analyze and categorize the alteration properties and causes for each type of topological entity, namely topological faces and topological edges. Based on a collaborative topological entity naming mechanism, a data structure called TEST (Topological Entity Structure Tree) is introduced to track the change history and current state of each topological entity and to embody the relationships among topological entities. Rules and algorithms are presented for identifying the topological entities referenced by operations, to ensure correct execution and model consistency. The algorithm has been verified within a prototype that we implemented with ACIS.
Funding: Supported by the Sichuan Science and Technology Program under Grant No. 2021YFQ0009.
Abstract: The extraction and understanding of text knowledge has become increasingly crucial in the age of big data. Because Chinese vocabulary is diverse and ambiguous, one of the current research areas in natural language processing (NLP) is how to understand text accurately and collect accurate linguistic information. This paper mainly studies the candidate entity generation module of an entity linking system. The module constructs an entity reference expansion algorithm to improve the recall rate of candidate entities. To improve the efficiency of the linking algorithm of the entire system while maintaining the recall rate of candidate entities, we design a graph model filtering algorithm that fuses shallow semantic information to filter the list of candidate entities, and we verify and analyze the efficiency of the algorithm through experiments. By analyzing the technology behind entity linking algorithms, we study candidate entity generation and entity disambiguation, improve the traditional entity linking algorithm, and present an innovative and practical entity linking model. The recall rate exceeds 82%, and the link accuracy rate exceeds 73%. Efficient and accurate entity linking can help machines better understand text semantics, further promoting the development of NLP and improving users' knowledge acquisition experience with text.
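The recall rate of a candidate generation module can be made concrete with a short sketch. It assumes the usual definition, the fraction of mentions whose correct entity appears anywhere in the generated candidate list; the abstract does not spell out its definition, and the function name `candidate_recall` and the toy mentions and entity identifiers are hypothetical.

```python
from typing import Dict, List


def candidate_recall(candidates: Dict[str, List[str]],
                     gold: Dict[str, str]) -> float:
    """Fraction of mentions whose gold entity is among the generated candidates."""
    if not gold:
        return 0.0
    hits = sum(1 for mention, entity in gold.items()
               if entity in candidates.get(mention, []))
    return hits / len(gold)


# Toy data: the correct entity for each mention, and the generated candidates.
gold = {
    "Apple": "Apple_Inc",
    "Jordan": "Michael_Jordan",
    "Java": "Java_(language)",
    "Mercury": "Mercury_(planet)",
}
candidates = {
    "Apple": ["Apple_Inc", "Apple_(fruit)"],
    "Jordan": ["Jordan_(country)"],          # gold entity missing: a recall miss
    "Java": ["Java_(island)", "Java_(language)"],
    "Mercury": ["Mercury_(planet)"],
}
print(candidate_recall(candidates, gold))
```

Expanding the surface forms of a mention tends to raise this number, while a later filtering step shortens the candidate lists; the trade-off between the two is what the module described above has to balance.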
Abstract: How to identify topological entities when features are rebuilt is a critical problem in Feature-Based Parametric Modeling Systems (FBPMS). In this article, the authors propose a new coding approach to distinguish different entities. The coding mechanism is explained in detail, and some typical examples are presented. Finally, a decoding algorithm based on set theory is put forward.
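Abstract: To address the insufficient use of pre-trained language models, the low utilization of injected external knowledge, and the low recognition rate for nested named entities in named entity recognition for agricultural diseases, this paper proposes CP-MRC (Continuous prompts for machine reading comprehension), a named entity recognition model based on continuous prompt injection and a pointer network. The model introduces the BERT (Bidirectional encoder representation from transformers) pre-trained model and freezes its original parameters to preserve the text representation ability acquired during pre-training. To make the model better suited to domain data, continuous trainable prompt vectors are inserted into each Transformer layer. To improve the accuracy of nested named entity recognition, a pointer network is used to extract entity sequences. Comparative experiments were carried out on a self-built agricultural disease dataset containing 2,933 text samples, 8 entity types, and 10,414 entities in total. The results show that CP-MRC reaches a precision, recall, and F1 score of 83.55%, 81.4%, and 82.4%, outperforming the other models. For the two nested entity types, pathogen and crop, its F1 score is 3 and 13 percentage points higher than that of the other models, a clear improvement in nested entity recognition. The proposed model achieves good recognition performance with only a small number of trainable parameters, offering an approach for applying larger pre-trained models to information extraction tasks.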
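Abstract: To raise the level of informatization in construction safety risk management, a construction safety knowledge graph is built with construction activities and accident risk types as the research objects. The knowledge graph is used to improve the LEC method for evaluating the hazard of working conditions, so that safety risk can be calculated quantitatively, and risk location identification and unsafe factor analysis are carried out on the basis of the knowledge graph. The study proposes the concept of turning virtual safety risks into entities, which gives safety risk information an entity-based representation in digital space. Based on Building Information Modeling (BIM) and knowledge graph technology, a Building Construction Safety Risk Information Model (BCSRIM) is established. The model effectively avoids the influence of subjective factors in the traditional LEC method and supports quantitative calculation of construction safety risk, risk location identification, risk analysis, and visual management. Using Revit secondary development, the knowledge-graph-based BCSRIM was implemented in Microsoft Visual Studio with C# connected to a Neo4j graph database. Tests show that the proposed BCSRIM has high practical value for improving construction site management.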
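For orientation, the classical LEC score that the study sets out to improve is the product of three ratings: likelihood of an accident (L), frequency of exposure (E), and severity of consequences (C). The sketch below shows only this baseline arithmetic; it does not reproduce the knowledge-graph-based improvement, which the abstract does not detail. The function name `lec_score` and the ratings used are hypothetical.

```python
def lec_score(likelihood: float, exposure: float, consequence: float) -> float:
    """Classical LEC risk score: D = L * E * C."""
    for name, value in (("likelihood", likelihood),
                        ("exposure", exposure),
                        ("consequence", consequence)):
        if value < 0:
            raise ValueError(f"{name} rating must be non-negative")
    return likelihood * exposure * consequence


# Illustrative ratings only; in the traditional method they are assigned by
# expert judgement, which is the subjective step the study aims to avoid.
d = lec_score(likelihood=3.0, exposure=6.0, consequence=15.0)
print(d)
```

Because all three inputs are multiplied, a small change in any one expert rating scales the whole score, which is one way to see why subjectivity in the ratings matters.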
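Abstract: Unstructured information posted on social media platforms suffers from inconsistent data and varying importance, which makes it challenging to automatically and accurately extract the required information and label the disaster level. A knowledge system for flood events is therefore constructed by combining formal concept analysis (FCA), word co-occurrence relations, and contextual semantic information. Using this knowledge system, a large-scale pre-trained language model (LLM) is instruction-tuned on the TencentPretrain framework to build the ChatFlowFlood information extraction model, which can accurately and automatically extract information such as the situation of trapped people and urgently needed supplies with only a small amount of manual labeling. On top of the extraction model, the fuzzy analytic hierarchy process (FAHP) and the CRITIC method (CRiteria Importance Through Intercriteria Correlation) are combined to rate the rescue priority of help requests from both subjective and objective perspectives, helping decision makers understand how urgent the situation is. Experimental results show that, on Chinese social media data, ChatFlowFlood improves the FBERT metric by 73.09% compared with the ChatFlow-7B model.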