Funding: Science and Technology Innovation 2030 Major Project of “New Generation Artificial Intelligence”, granted by the Ministry of Science and Technology, Grant Number 2020AAA0109300.
Abstract: In the construction of domain-specific knowledge graphs, relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient use of the semantic interaction between entities and relations, difficulty in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to improve the model’s performance on domain-specific relational triple extraction tasks. We first propose an attention interaction module that significantly enhances the semantic interaction between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to reevaluate challenging samples, thereby improving the model’s adaptability in specific domains. Additionally, we explore the use of LLMs for data augmentation, generating domain-specific datasets to alleviate the scarcity of domain data. Experiments on three domain-specific datasets show that our model outperforms existing comparative models in several aspects, with F1 scores exceeding those of the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
Abstract: Building model data organization is often designed to solve a specific problem, which makes it impossible to organize indoor and outdoor 3D scenes in an integrated manner. In this paper, existing building spatial data models are studied, and the characteristics of the building information modeling standard (IFC), the city geographic modeling language (CityGML), the indoor modeling language (IndoorGML), and other models are compared and analyzed. The CityGML and IndoorGML models struggle to satisfy diverse application scenarios and requirements because of limitations in their expressive capabilities. We propose combining the semantic information of the model objects to partition and organize indoor and outdoor spatial 3D model data effectively, and constructing an indoor and outdoor data organization mechanism of “chunk-layer-subobject-entrances-area-detail object.” The approach is verified by proposing a 3D data organization method for indoor and outdoor space and building a 3D visualization system based on it.
Funding: Supported by the National Key Basic Research and Development Program of China under contract No. 2006CB701305, the National Natural Science Foundation of China under contract No. 40571129, and the National High-Technology Program of China under contract Nos. 2002AA639400, 2003AA604040 and 2003AA637030.
Abstract: Marine information has been increasing quickly. Traditional database technologies are poorly suited to manipulating large amounts of marine information that relates to position in 3-D and to time. Recently, greater emphasis has been placed on geographical information systems (GIS) for handling marine information. GIS has shown great success in terrestrial applications in recent decades, but its use in marine fields has been far more restricted. One of the main reasons is that most GIS systems and their data models are designed for land applications and cannot cope well with the nature of the marine environment and of marine information. This poses a fundamental challenge to traditional GIS and its data structures. This work designed a data model, the raster-based spatio-temporal hierarchical data model (RSHDM), for marine information systems and for knowledge discovery from spatio-temporal data. The model is based on the nature of marine data and overcomes the shortcomings of current spatio-temporal models when they are used in this field. As an experiment, a marine fishery data warehouse (FDW) for marine fishery management was set up based on the RSHDM. The experiment showed that the RSHDM handles the data well and can easily extract the aggregations that management needs at different levels.
Funding: Under the auspices of the National High Technology Research and Development Program of China (No. 2007AA12Z242).
Abstract: Incremental updating, which can better keep a navigational map up to date in real time, is the development direction for navigational road network updating. The data center of a vehicle navigation system is in charge of storing incremental data, and the spatio-temporal data model used to store them affects how efficiently the data center responds to incremental data requests from vehicle terminals. Based on an analysis of the shortcomings of several typical spatio-temporal data models used in data centers, and building on the base map with overlay model, the reverse map with overlay model (RMOM) is put forward to let the data center respond rapidly to incremental data requests. With RMOM, the data center stores not only the current complete road network data but also the overlays of incremental data from the time of each road network change to the current moment. Moreover, the storage mechanism and index structure of the incremental data were designed, and the implementation algorithm of RMOM was developed. Taking the navigational road network of Guangzhou City as an example, a simulation test was conducted to validate the efficiency of RMOM. Results show that, with RMOM, the navigation database in the data center can respond to an incremental data request with only one query and in less time. Compared with the base map with overlay model, the data center does not need to overlay incremental data temporarily, so the response time is significantly reduced. RMOM greatly improves response efficiency and provides strong support for keeping the navigational road network current in real time.
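The storage idea described in this abstract can be illustrated with a toy sketch. It is only one reading of the abstract, not the paper's storage mechanism or index structure, which the abstract does not detail. The class name `ReverseOverlayStore`, the integer version numbers, and the convention that `None` marks a deleted road are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReverseOverlayStore:
    """Toy store: current network plus one cumulative overlay per past version."""
    current: dict = field(default_factory=dict)   # road_id -> attributes
    version: int = 0
    overlays: dict = field(default_factory=dict)  # past version -> {road_id: attrs or None}

    def apply_change(self, changes: dict) -> None:
        """Apply a change set; a value of None marks a deleted road (assumption)."""
        # Fold the new change into every stored overlay so that each one
        # always spans "from that version up to the current moment".
        for overlay in self.overlays.values():
            overlay.update(changes)
        self.overlays[self.version] = dict(changes)
        for road_id, attrs in changes.items():
            if attrs is None:
                self.current.pop(road_id, None)
            else:
                self.current[road_id] = attrs
        self.version += 1

    def increment_since(self, terminal_version: int) -> dict:
        """Answer a terminal's request with a single dictionary lookup."""
        if terminal_version == self.version:
            return {}
        return self.overlays[terminal_version]


store = ReverseOverlayStore()
store.apply_change({"r1": {"speed": 60}})
store.apply_change({"r1": {"speed": 80}, "r2": {"speed": 40}})
store.apply_change({"r2": None})
print(store.increment_since(1))
```

The cost of merging is paid at write time, when each change is folded into the stored overlays, so a request needs no temporary overlaying. This mirrors the trade-off the abstract attributes to RMOM relative to the base map with overlay model.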
Abstract: The Semantic Web (SW) provides new opportunities for the study and application of big data, that is, massive ranges of data sets in varied formats from multiple sources. Related studies focus on potential SW technologies for resolving big data problems, such as the structurally and semantically heterogeneous data that result from the variety of data formats (structured, semi-structured, numeric, unstructured text data, email, video, audio, stock ticker). SW offers information semantically to both people and machines, helping to retain the vast volume of data and to provide meaningful output from unstructured data. In the current research, we implement a new semantic Extract-Transform-Load (ETL) model that uses SW technologies for aggregating, integrating, and representing data as linked data. First, geospatial data resources are aggregated from the internet; then the semantic ETL model converts the aggregated data to Resource Description Framework (RDF) format and stores it in a semantic model for successful integration and representation. The principal contribution of this research is the synthesis, aggregation, and semantic representation of geospatial data to solve problems. A case study of city data illustrates the semantic ETL model’s functionalities. The results show that the proposed model solves the structural and semantic heterogeneity problems in diverse data sources for successful data aggregation, integration, and representation.
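The "transform" step of such a pipeline, turning tabular records into RDF, might look roughly like the sketch below. It is not the authors' ETL model: the `http://example.org/` namespaces, the function names, and the sample row are invented for illustration. Identifier values are assumed to be safe to embed in an IRI, and every value is emitted as a plain string literal in N-Triples syntax.

```python
import csv
import io

BASE = "http://example.org/city/"     # hypothetical resource namespace
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary namespace


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Extract rows from CSV text and transform each non-empty cell into a triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty or missing cells
            triples.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return triples


sample = "id,name,population\nc1,Sampletown,12000\n"  # made-up sample data
for line in rows_to_ntriples(sample, "id"):
    print(line)
```

A real pipeline would normally map columns to an established vocabulary and typed literals rather than minting ad hoc predicates. The sketch only shows how heterogeneous rows collapse into one uniform subject-predicate-object shape.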
Funding: Under the auspices of the National Key Research and Development Program of China (No. 2017YFA0603002), the National Natural Science Foundation of China (No. 31800358, 31700369), the Jiangsu Agricultural Science and Technology Innovation Fund (No. CX(19)3099), and the Foundation of Jiangsu Vocational College of Agriculture and Forestry (No. 2019kj014).
Abstract: Detailed information on the spatio-temporal changes of cropland soil organic carbon (SOC) can contribute significantly to improving soil fertility and mitigating climate change. Nonetheless, information and knowledge on the national-scale spatio-temporal changes of SOC in Chinese upland soils, and on the corresponding uncertainties, remain limited. The CENTURY model was used to estimate SOC storage and its changes in Chinese uplands from 1980 to 2010. With the Monte Carlo method, the uncertainties of CENTURY-modelled SOC dynamics associated with spatially heterogeneous model inputs were quantified. Results revealed that SOC storage in Chinese uplands increased from 3.03 (1.59 to 4.78) Pg C in 1980 to 3.40 (2.39 to 4.62) Pg C in 2010. The increment of SOC storage during this period was 370 Tg C, with an uncertainty interval of –440 to 1110 Tg C. Regional disparities in SOC changes were significant, with considerable SOC accumulation in the Huang-Huai-Hai Plain of China and SOC loss in northeastern China. SOC loss was most severe in Meadow soils, Black soils and Chernozems, whilst SOC accumulation was most significant in Fluvo-aquic soils, Cinnamon soils and Purplish soils. In modelling large-scale SOC dynamics, the initial soil properties were major sources of uncertainty. Hence, more detailed information on soil properties must be collected. The SOC stock of Chinese uplands in 2010 was still relatively low, indicating that recommended agricultural management practices, in conjunction with effective economic and policy incentives for farmers to improve soil fertility, are indispensable for future carbon sequestration in these regions.
Abstract: The development of spatio-temporal data models is introduced. According to the soil characteristics of reclamation land, we adopt the base state with amendments model of multi-layer rasters to organize the spatio-temporal data, using a combined data structure of linear quadtree and linear octree for coding. The advantage of this model is that the information of a given layer can be obtained easily and the data can be analyzed in an integrated way with other methods. Then, the methods of data acquisition and analysis are introduced. The method can provide a tool for research on soil characteristic change and spatial distribution in reclamation land.
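For readers unfamiliar with linear quadtrees, the sketch below shows one common way to code raster cells as single integers, namely Morton (Z-order) bit interleaving. The abstract does not say which coding scheme the authors use, so this is an assumed stand-in. A linear octree extends the same idea by interleaving a third coordinate.

```python
def morton_encode(row: int, col: int, depth: int) -> int:
    """Interleave the low `depth` bits of row and col into one quadtree key."""
    key = 0
    for bit in range(depth):
        key |= ((col >> bit) & 1) << (2 * bit)       # column bits at even positions
        key |= ((row >> bit) & 1) << (2 * bit + 1)   # row bits at odd positions
    return key


def morton_decode(key: int, depth: int) -> tuple[int, int]:
    """Recover (row, col) from a key produced by morton_encode."""
    row = col = 0
    for bit in range(depth):
        col |= ((key >> (2 * bit)) & 1) << bit
        row |= ((key >> (2 * bit + 1)) & 1) << bit
    return row, col


# Cell (row=2, col=3) in a 4 x 4 raster (depth 2) maps to key 13.
print(morton_encode(2, 3, 2), morton_decode(13, 2))
```

Because cells that share a quadrant share a key prefix, sorting by key keeps spatial neighbours close together in storage, which is what makes the "linear" form convenient for layer-wise retrieval.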
Abstract: In modern workforce management, the demand for new ways to maximize worker satisfaction, productivity, and security levels is endless. Workforce movement data, such as source data from an access control system, can support this ongoing process through subsequent analysis. In this study, a solution for attaining this goal is proposed, based on the design and implementation of a data mart as part of a dimensional trajectory data warehouse (TDW) that acts as a repository for the management of movement data. A novel methodological approach is proposed for modeling multiple spatial and temporal dimensions in a logical model. The case study presented in this paper, which models and analyzes workforce movement data, supports human resource management decision-making, and the subsequent discussion provides a representative example of the contribution of a TDW to information management and decision support systems. The entire process of exporting, cleaning, consolidating, and transforming data is implemented to achieve an appropriate format for final import. Structured query language (SQL) queries demonstrate the convenience of dimensional design for data analysis, and valuable information can be extracted from the movements of employees on company premises to manage the workforce efficiently and effectively. Visual analytics through data visualization supports the analysis and facilitates decision-making and business intelligence.
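As a rough illustration of why a dimensional design makes such SQL queries convenient, the sketch below builds a tiny star schema in an in-memory SQLite database and aggregates movements per building. The table names, columns, and rows are invented for the example and are not taken from the paper's TDW.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a minimal star schema with made-up access-control events."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY, department TEXT);
        CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, building TEXT);
        CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
        CREATE TABLE fact_movement (
            employee_id INTEGER REFERENCES dim_employee(employee_id),
            location_id INTEGER REFERENCES dim_location(location_id),
            time_id     INTEGER REFERENCES dim_time(time_id)
        );
        INSERT INTO dim_employee VALUES (1, 'R&D'), (2, 'Sales');
        INSERT INTO dim_location VALUES (10, 'A'), (11, 'B');
        INSERT INTO dim_time     VALUES (100, '2024-01-08', 9), (101, '2024-01-08', 14);
        INSERT INTO fact_movement VALUES (1, 10, 100), (1, 11, 101), (2, 10, 100);
    """)
    return conn


def entries_per_building(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count recorded movements per building by joining fact and dimension."""
    return conn.execute("""
        SELECT l.building, COUNT(*) AS entries
        FROM fact_movement AS f
        JOIN dim_location AS l ON l.location_id = f.location_id
        GROUP BY l.building
        ORDER BY l.building
    """).fetchall()


print(entries_per_building(build_demo_db()))  # [('A', 2), ('B', 1)]
```

Swapping `dim_location` for `dim_time` or `dim_employee` in the join gives the same kind of aggregate by hour or department, which is the convenience a dimensional layout is meant to provide.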
Funding: Supported by the Science and Technology Foundation of the State Grid Corporation of China (XT71-14-043).
Abstract: Making use of electric power knowledge is very important for the development of electric power big data technology. A new electric power knowledge theory model is proposed here to solve the problem of modeling electric power knowledge in a normalized way for the management and analysis of electric power big data. Current modeling techniques for electric power knowledge are viewed as inadequate because of the complexity and variety of the relationships among electric power system data. Ontology theory and semantic web technologies, used in electric power systems and in many other industry domains, provide a new kind of knowledge modeling method. Based on this, this paper proposes the structure, elements, basic calculations and multidimensional reasoning method of the new knowledge model. A modeling example of the regulations defined in an electric power system operation standard is demonstrated. Different forms of the model and related technologies are also introduced, including electric power system standard modeling, multi-type data management, unstructured data searching, knowledge display and data analysis based on semantic expansion and reduction. Research shows that the new model developed here is powerful and can adapt to the various knowledge expression requirements of electric power big data. With the development of electric power big data technology, it is expected that the knowledge model will be improved and used in more applications.
Funding: This research was funded by the National Key Research and Development Plan (2018YFB0505300), the Guangxi Science and Technology Major Project (AA18118025), and the Opening Foundation of the Key Laboratory of Environment Change and Resources Use in Beibu Gulf, Ministry of Education (Nanning Normal University) and the Guangxi Key Laboratory of Earth Surface Processes and Intelligent Simulation (Nanning Normal University) (No. NNNU-KLOP-K1905).
Abstract: Marine big data are characterized by large volume and complex structures, which bring great challenges to data management and retrieval. Based on the GeoSOT Grid Code and the composite index structure of the MongoDB database, this paper proposes a spatio-temporal grid index model (STGI) for efficient, optimized querying of marine big data. A spatio-temporal secondary index is created on the spatial code and time code columns to build a composite index in the MongoDB database used to store massive marine data. Multiple comparative experiments demonstrate that retrieval efficiency with the STGI approach is more than two to three times higher than with other index models. Through theoretical analysis and experimental verification, it can be concluded that the STGI model is well suited to retrieving large-scale spatial data with low time frequency, such as marine big data.
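The effect of a composite index on a spatial code and a time code can be mimicked in plain Python, as in the sketch below. It does not reproduce GeoSOT encoding or the paper's MongoDB setup: the integer grid codes, the date-like time codes, and the class name `GridTimeIndex` are assumptions. In MongoDB itself, a comparable compound index would be declared on the two fields, for example with pymongo's `create_index`.

```python
import bisect


class GridTimeIndex:
    """Sorted (spatial_code, time_code, record_id) keys, mimicking a compound index."""

    def __init__(self) -> None:
        self._keys: list[tuple[int, int, str]] = []

    def insert(self, spatial_code: int, time_code: int, record_id: str) -> None:
        """Keep keys ordered by spatial code first, then by time code."""
        bisect.insort(self._keys, (spatial_code, time_code, record_id))

    def query(self, spatial_code: int, t_start: int, t_end: int) -> list[str]:
        """Return record ids in one grid cell with t_start <= time_code <= t_end."""
        # A 2-tuple sorts before any 3-tuple sharing the same prefix, so these
        # two binary searches bracket exactly the requested time range.
        lo = bisect.bisect_left(self._keys, (spatial_code, t_start))
        hi = bisect.bisect_left(self._keys, (spatial_code, t_end + 1))
        return [record_id for _, _, record_id in self._keys[lo:hi]]


index = GridTimeIndex()
index.insert(5, 20230101, "a")
index.insert(5, 20230201, "b")
index.insert(6, 20230101, "c")
index.insert(5, 20230301, "d")
print(index.query(5, 20230101, 20230201))  # ['a', 'b']
```

Putting the spatial code first means all records of one grid cell are contiguous and ordered by time, so a cell-plus-time-window query touches a single slice of the index instead of scanning the collection.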
Funding: The National Natural Science Foundation (No. 40271088) and the Research Fund of the International Institute of Geo-information Science and Earth Observation.
Abstract: This paper focuses on the issues of categorical database generalization and emphasizes the roles of the supporting data model, integrated data model, spatial analysis and semantic analysis in database generalization. The framework contents of categorical database generalization transformation are defined. This paper presents an integrated spatial supporting data structure, a semantic supporting model and a similarity model for categorical database generalization. The concept of the transformation unit is proposed in generalization.