Identifying semantic types for attributes in relations, known as attribute semantic type (AST) identification, plays an important role in many data analysis tasks, such as data cleaning, schema matching, and keyword search in databases. However, due to a lack of unified naming standards across prevalent information systems (a.k.a. information islands), AST identification remains an open problem. To tackle this problem, we propose a context-aware method to identify the ASTs for relations. We cast AST identification as a multi-class classification problem and propose a schema context aware (SCA) model that learns a representation from a collection of relations associated with attribute values and schema context. Based on the learned representation, we predict the AST for a given attribute from an underlying relation, wherein the predicted AST is mapped to one of the labeled ASTs. To improve the performance of AST identification, especially for the case where the predicted semantic types of attributes are not included in the labeled ASTs, we then introduce knowledge base embeddings (a.k.a. KBVec) to enhance the above representation and construct a knowledge-base-enhanced schema context aware model (SCA-KB) to obtain a stable and robust model. Extensive experiments on real datasets demonstrate that our context-aware method outperforms the state-of-the-art approaches by a large margin, up to 6.14% and 25.17% in terms of macro average F1 score, and up to 0.28% and 9.56% in terms of weighted F1 score over high-quality and low-quality datasets respectively.
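The abstract above reports both macro average and weighted F1 scores. As a reminder of how the two averages differ, here is a minimal sketch in plain Python; the semantic type labels ("city", "name") and the function name `f1_scores` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (per-class F1, macro average F1, weighted F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class[label] = 2 * tp / denom if denom else 0.0
    # Macro average: every class counts equally
    macro = sum(per_class.values()) / len(labels)
    # Weighted average: each class counts in proportion to its support
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return per_class, macro, weighted

# Hypothetical attribute labels, for illustration only
y_true = ["city", "city", "city", "name"]
y_pred = ["city", "city", "name", "name"]
per_class, macro, weighted = f1_scores(y_true, y_pred)
```

Because the macro average treats rare semantic types the same as frequent ones, it is more sensitive to errors on rare classes than the weighted score, which may be one reason the two reported margins differ so much.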
The paper offers a three-dimensional linguosemiotic study of similes, which implies integral analysis of their semantic, syntactic, and pragmatic aspects. Such an approach to the study of similes is quite new, as they have hitherto been considered either from a literary viewpoint, as one of the stylistic expressive means of language, or in the philosophy of language, in correlation with metaphor. The three-dimensional linguosemiotic methodology of research has enabled us: (1) to reveal the cognitive, psychological, and metaphorical essence of similes and work out the invariant conceptual model which remains unchanged throughout their structural-semantic variation in the text; (2) to single out pragmatic features of similes, the set of which defines their linguistic status as a language-in-use construct, i.e., a textual phenomenon; (3) to study the denotational-cognitive aspect of similes, pointing out the parameters according to which similes have been differentiated into semantic types and subtypes; and (4) to generalize the syntactical aspect of similes and define the set of their structural modifications in the text, conditioned both by intralinguistic regularities and by pragmatic factors. Therefore, we have worked out an interdisciplinary theory of similes implying the synergy of the data of linguistic, literary, cognitive, and psychological studies.
In Geographic Information Systems (GIS), geoprocessing workflows allow analysts to organize their methods on spatial data in complex chains. We propose a method for expressing workflows as linked data, and for semi-automatically enriching them with semantics on the level of their operations and datasets. Linked workflows can be easily published on the Web and queried for types of inputs, results, or tools. Thus, GIS analysts can reuse their workflows in a modular way, selecting, adapting, and recommending resources based on compatible semantic types. Our typing approach starts from minimal annotations of workflow operations with classes of GIS tools, and then propagates data types and implicit semantic structures through the workflow using an OWL typing scheme and SPARQL rules by backtracking over GIS operations. The method is implemented in Python and is evaluated on two real-world geoprocessing workflows, generated with Esri's ArcGIS. To illustrate the potential applications of our typing method, we formulate and execute competency questions over these workflows.
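To give a rough feel for what propagating types through a workflow means, the sketch below annotates each operation with a tool class and derives the type of every intermediate dataset. It is a simplified forward pass in plain Python, not the OWL typing scheme or SPARQL rules described in the abstract; the tool signatures, type names, subtype table, and dataset names are all hypothetical.

```python
# Hypothetical tool signatures: tool class -> (expected input types, output type)
TOOL_SIGNATURES = {
    "Buffer": (("VectorLayer",), "PolygonLayer"),
    "Clip": (("VectorLayer", "PolygonLayer"), "VectorLayer"),
}

# Hypothetical subtype hierarchy: type -> direct supertype
SUBTYPES = {"PolygonLayer": "VectorLayer", "PointLayer": "VectorLayer"}

def is_a(type_name, expected):
    """True if type_name equals expected or is a (transitive) subtype of it."""
    while type_name is not None:
        if type_name == expected:
            return True
        type_name = SUBTYPES.get(type_name)
    return False

def propagate_types(workflow, source_types):
    """Assign a type to every dataset, given typed sources and ordered steps."""
    types = dict(source_types)
    for step in workflow:  # steps are assumed to be listed in execution order
        expected_inputs, output_type = TOOL_SIGNATURES[step["tool"]]
        for name, expected in zip(step["inputs"], expected_inputs):
            if not is_a(types[name], expected):
                raise TypeError(f"{name} is {types[name]}, expected {expected}")
        types[step["output"]] = output_type
    return types

# Hypothetical two-step workflow
workflow = [
    {"tool": "Buffer", "inputs": ["roads"], "output": "road_buffer"},
    {"tool": "Clip", "inputs": ["parcels", "road_buffer"], "output": "parcels_near_roads"},
]
types = propagate_types(workflow, {"roads": "VectorLayer", "parcels": "PolygonLayer"})
```

Once every dataset carries a type, a query such as "which steps produce a PolygonLayer?" becomes a simple lookup, which is the kind of reuse-oriented question the abstract has in mind.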
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFB2104100, the National Natural Science Foundation of China under Grant Nos. 61972403 and U1711261, the Fundamental Research Funds for the Central Universities of China, the Research Funds of Renmin University of China, and the Tencent Rhino-Bird Joint Research Program.