The joint entity relation extraction model which integrates the semantic information of relation is favored by relevant researchers because of its effectiveness in solving the overlapping of entities,and the method of...The joint entity relation extraction model which integrates the semantic information of relation is favored by relevant researchers because of its effectiveness in solving the overlapping of entities,and the method of defining the semantic template of relation manually is particularly prominent in the extraction effect because it can obtain the deep semantic information of relation.However,this method has some problems,such as relying on expert experience and poor portability.Inspired by the rule-based entity relation extraction method,this paper proposes a joint entity relation extraction model based on a relation semantic template automatically constructed,which is abbreviated as RSTAC.This model refines the extraction rules of relation semantic templates from relation corpus through dependency parsing and realizes the automatic construction of relation semantic templates.Based on the relation semantic template,the process of relation classification and triplet extraction is constrained,and finally,the entity relation triplet is obtained.The experimental results on the three major Chinese datasets of DuIE,SanWen,and FinRE showthat the RSTAC model successfully obtains rich deep semantics of relation,improves the extraction effect of entity relation triples,and the F1 scores are increased by an average of 0.96% compared with classical joint extraction models such as CasRel,TPLinker,and RFBFN.展开更多
More web pages are widely applying AJAX (Asynchronous JavaScript XML) due to the rich interactivity and incremental communication. By observing, it is found that the AJAX contents, which could not be seen by traditi...More web pages are widely applying AJAX (Asynchronous JavaScript XML) due to the rich interactivity and incremental communication. By observing, it is found that the AJAX contents, which could not be seen by traditional crawler, are well-structured and belong to one specific domain generally. Extracting the structured data from AJAX contents and annotating its semantic are very significant for further applications. In this paper, a structured AJAX data extraction method for agricultural domain based on agricultural ontology was proposed. Firstly, Crawljax, an open AJAX crawling tool, was overridden to explore and retrieve the AJAX contents; secondly, the retrieved contents were partitioned into items and then classified by combining with agricultural ontology. HTML tags and punctuations were used to segment the retrieved contents into entity items. Finally, the entity items were clustered and the semantic annotation was assigned to clustering results according to agricultural ontology. By experimental evaluation, the proposed approach was proved effectively in resource exploring, entity extraction, and semantic annotation.展开更多
Spreadsheets contain a lot of valuable data and have many practical applications.The key technology of these practical applications is how to make machines understand the semantic structure of spreadsheets,e.g.,identi...Spreadsheets contain a lot of valuable data and have many practical applications.The key technology of these practical applications is how to make machines understand the semantic structure of spreadsheets,e.g.,identifying cell function types and discovering relationships between cell pairs.Most existing methods for understanding the semantic structure of spreadsheets do not make use of the semantic information of cells.A few studies do,but they ignore the layout structure information of spreadsheets,which affects the performance of cell function classification and the discovery of different relationship types of cell pairs.In this paper,we propose a Heuristic algorithm for Understanding the Semantic Structure of spreadsheets(HUSS).Specifically,for improving the cell function classification,we propose an error correction mechanism(ECM)based on an existing cell function classification model[11]and the layout features of spreadsheets.For improving the table structure analysis,we propose five types of heuristic rules to extract four different types of cell pairs,based on the cell style and spatial location information.Our experimental results on five real-world datasets demonstrate that HUSS can effectively understand the semantic structure of spreadsheets and outperforms corresponding baselines.展开更多
基金supported by the National Natural Science Foundation of China(Nos.U1804263,U1736214,62172435)the Zhongyuan Science and Technology Innovation Leading Talent Project(No.214200510019).
文摘The joint entity relation extraction model which integrates the semantic information of relation is favored by relevant researchers because of its effectiveness in solving the overlapping of entities,and the method of defining the semantic template of relation manually is particularly prominent in the extraction effect because it can obtain the deep semantic information of relation.However,this method has some problems,such as relying on expert experience and poor portability.Inspired by the rule-based entity relation extraction method,this paper proposes a joint entity relation extraction model based on a relation semantic template automatically constructed,which is abbreviated as RSTAC.This model refines the extraction rules of relation semantic templates from relation corpus through dependency parsing and realizes the automatic construction of relation semantic templates.Based on the relation semantic template,the process of relation classification and triplet extraction is constrained,and finally,the entity relation triplet is obtained.The experimental results on the three major Chinese datasets of DuIE,SanWen,and FinRE showthat the RSTAC model successfully obtains rich deep semantics of relation,improves the extraction effect of entity relation triples,and the F1 scores are increased by an average of 0.96% compared with classical joint extraction models such as CasRel,TPLinker,and RFBFN.
基金supported by the Knowledge Innovation Program of the Chinese Academy of Sciencesthe National High-Tech R&D Program of China(2008BAK49B05)
文摘More web pages are widely applying AJAX (Asynchronous JavaScript XML) due to the rich interactivity and incremental communication. By observing, it is found that the AJAX contents, which could not be seen by traditional crawler, are well-structured and belong to one specific domain generally. Extracting the structured data from AJAX contents and annotating its semantic are very significant for further applications. In this paper, a structured AJAX data extraction method for agricultural domain based on agricultural ontology was proposed. Firstly, Crawljax, an open AJAX crawling tool, was overridden to explore and retrieve the AJAX contents; secondly, the retrieved contents were partitioned into items and then classified by combining with agricultural ontology. HTML tags and punctuations were used to segment the retrieved contents into entity items. Finally, the entity items were clustered and the semantic annotation was assigned to clustering results according to agricultural ontology. By experimental evaluation, the proposed approach was proved effectively in resource exploring, entity extraction, and semantic annotation.
基金supported in part by the National Natural Science Foundation of China under Grants(Nos.62120106008,61806065,61906059,62076085,91746209 and 62076087)the Fundamental Research Funds for the Central Universities(No.JZ2020HGQA0186).
文摘Spreadsheets contain a lot of valuable data and have many practical applications.The key technology of these practical applications is how to make machines understand the semantic structure of spreadsheets,e.g.,identifying cell function types and discovering relationships between cell pairs.Most existing methods for understanding the semantic structure of spreadsheets do not make use of the semantic information of cells.A few studies do,but they ignore the layout structure information of spreadsheets,which affects the performance of cell function classification and the discovery of different relationship types of cell pairs.In this paper,we propose a Heuristic algorithm for Understanding the Semantic Structure of spreadsheets(HUSS).Specifically,for improving the cell function classification,we propose an error correction mechanism(ECM)based on an existing cell function classification model[11]and the layout features of spreadsheets.For improving the table structure analysis,we propose five types of heuristic rules to extract four different types of cell pairs,based on the cell style and spatial location information.Our experimental results on five real-world datasets demonstrate that HUSS can effectively understand the semantic structure of spreadsheets and outperforms corresponding baselines.