The EU's Artificial Intelligence Act (AI Act) imposes requirements for the privacy compliance of AI systems. AI systems must comply with privacy laws such as the GDPR when providing services. These laws give users the right to issue a Data Subject Access Request (DSAR). Responding to such requests requires database administrators to accurately identify information related to an individual. However, manual compliance poses significant challenges and is error-prone: database administrators must write queries by hand, which is time-consuming. The demand of AI systems for large amounts of data has driven the development of NoSQL databases, and the flexible schema of NoSQL databases makes identifying personal information even more challenging. This paper develops an automated tool that identifies personal information to help organizations respond to DSARs. The tool combines several technologies, including schema extraction from NoSQL databases and relationship identification from query logs. We describe the algorithm used by the tool, detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs to help developers accurately identify personal data. We evaluate the tool on three datasets covering different database designs, achieving F1 scores of 0.77 to 1. Experimental results demonstrate that the tool successfully identifies information relevant to the data subject. The tool reduces manual effort and simplifies GDPR compliance, showing practical value in enhancing the privacy performance of NoSQL databases and AI systems.
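For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1 score can be computed. The abstract does not say which units were scored, so treating the ground truth and the tool output as sets of field names is an assumption, and all field names below are hypothetical.

```python
def f1_score(true_items: set, predicted_items: set) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    if not true_items and not predicted_items:
        return 1.0  # nothing to find and nothing reported
    true_positives = len(true_items & predicted_items)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_items)
    recall = true_positives / len(true_items)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: fields that really hold personal data
# versus fields flagged by an identification tool.
truth = {"users.email", "users.name", "orders.address", "logs.ip"}
flagged = {"users.email", "users.name", "orders.address", "orders.sku"}
score = f1_score(truth, flagged)
```

A score of 1 means that every personal-data field was found and nothing else was flagged.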
Discovery of materials using a "bottom-up" or "top-down" approach is of great interest in materials science. Layered materials consisting of two-dimensional (2D) building blocks provide a good platform for exploring new materials in this respect. In van der Waals (vdW) layered materials, these building blocks are charge neutral and can be isolated from their bulk phase (top-down), but usually grow on a substrate. In ionic layered materials, they are charged and usually cannot exist independently, but can serve as motifs to construct new materials (bottom-up). In this paper, we introduce our recently constructed databases for 2D material-substrate interfaces (2DMSI) and 2D charged building blocks. For the 2DMSI database, we systematically build a workflow to predict appropriate substrates and the geometries of the 2D materials on them. For the 2D charged building block database, 1208 entries are identified from a bulk material database. Information on crystal structure, valence state, source, dimension, and so on is provided for each entry in JSON format. We also show its application in designing and searching for new functional layered materials. The 2DMSI database, the building block database, and the designed layered materials are available in Science Data Bank at https://doi.org/10.57760/sciencedb.j00113.00188.
The necessity and feasibility of introducing attribute weights into a digital fingerprinting system are discussed. A weighted algorithm for fingerprinting relational databases for traitor tracing is proposed. Higher weights are assigned to more significant attributes, so important attributes are fingerprinted more frequently than others. Finally, the robustness of the proposed algorithm, such as its performance against collusion attacks, is analyzed. Experimental results demonstrate the superiority of the algorithm.
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces sampling inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
Data acquisition and modeling are the two important, difficult, and costly aspects of a Cybercity project. 2D-GIS is mature and can manage a large amount of spatial data, so 3D-GIS should make the best use of the data and technology of 2D-GIS. Constructing a useful synthetic environment requires integrating multiple types of information, such as DEM, texture images, and 3D representations of objects such as buildings. In this paper, a method for a 3D city landscape data model and visualization based on integrated databases is presented. Because raster data volumes are very large, special strategies (for example, the pyramid gridded method) must be adopted to manage raster data efficiently. Three different methods of data acquisition, a suitable data structure, and a simple modeling method are presented as well. Finally, a pilot project of Shanghai Cybercity is illustrated.
To solve the problems of sharing and reusing information in information systems, a rule-based approach to constructing ontologies from object-relational databases is proposed. A 3-tuple ontology construction model is proposed first. Then, four types of ontology construction rules, covering class, property, property characteristics, and property restrictions, are formalized according to the model. Experimental results described in the Web Ontology Language show that the proposed approach is feasible for the semantic objects project of the semantic computing laboratory at UC Irvine. Our approach reduces construction time by about twenty percent compared with ontology construction from relational databases.
In this study, seven isotopic databases are presented and analyzed to identify mantle and crustal episodes on a global scale by focusing on periodicity ranging from 70 to 200 million years (Myr). The databases are the largest, or among the largest, compiled for each type of data, with the objective of finding some samples from every region of every continent, to make each database as global as conceivably possible. The databases contain zircon Lu/Hf isotopic data, whole-rock Sm/Nd isotopic data, U/Pb detrital zircon ages, U/Pb igneous zircon ages, U/Pb non-zircon ages, whole-rock Re/Os isotopic data, and large igneous province ages. Part I of this study focuses on the periodicities of age histograms and geochemical averages developed from the seven databases, via spectral and cross-correlation analyses. Natural physical cycles often propagate in exact integer multiples of a fundamental cycle, referred to as harmonics. The tests show that harmonic geological cycles of ~93.5 and ~187 Myr have persisted throughout terrestrial history, and the cyclicities are statistically significant for U/Pb igneous zircon ages, U/Pb detrital zircon ages, U/Pb zircon-rim ages, large igneous province ages, mean εHf(t) for all samples, mean εHf(t) values for igneous-only samples, and relative abundance of mafic rocks. Equally important, cross-correlation analyses show these seven time series are nearly synchronous (±7 Myr) with a model consisting of periodicities of 93.5 and 187 Myr. Additionally, the similarities between peaks in the 93.5 and 187 Myr mantle cycles and terminal ages of established and suspected superchrons provide a framework for predicting and testing superchron periodicity.
A new theory on the construction of optimal truncated Low-Dimensional Dynamical Systems (LDDSs) with different physical meanings has been developed. The physical properties of the optimal bases are reflected in the user-defined optimal conditions. Through the analysis of linear and nonlinear examples, it is shown that the LDDSs constructed using the Proper Orthogonal Decomposition (POD) method are not optimal. After comparing the errors of LDDSs based on the new theory, POD, and Fourier methods, it is concluded that the LDDSs based on the new theory are optimally truncated and capture the desired physical properties of the systems.
Recent studies have shown that cache behavior is important in the design of main memory index structures. Cache-conscious indices such as the CSB+-tree have been shown to outperform conventional main memory indices such as the AVL-tree and the T-tree. This paper proposes a cache-conscious version of the T-tree, the CST-tree, defined according to the cache-conscious definition. By separating the keys within a node into two parts, the CST-tree achieves a higher cache hit ratio.
The future usage of heterogeneous databases will involve the WWW and CORBA environments. The integration of WWW databases and CORBA standards is discussed. These two techniques need to merge to make the distributed usage of heterogeneous databases user friendly. In an environment integrating WWW databases and CORBA technologies, CORBA can be used to access heterogeneous data sources on the Internet. Such applications can use distributed transactions to ensure data consistency and integrity. This technology has good application prospects.
Spatial objects have two types of attributes: geometrical attributes and non-geometrical attributes, which belong to two different attribute domains (geometrical and non-geometrical domains). Although geometrically scattered in a geometrical domain, spatial objects may be similar to each other in a non-geometrical domain. Most existing clustering algorithms group spatial datasets into different compact regions in a geometrical domain without considering the non-geometrical domain. However, many application scenarios require clustering results in which a cluster has not only high proximity in a geometrical domain, but also high similarity in a non-geometrical domain. This means constraints are imposed on the clustering goal from both geometrical and non-geometrical domains simultaneously. Such a clustering problem is called dual clustering. As distributed clustering applications become more and more popular, it is necessary to tackle the dual clustering problem in distributed databases. The DCAD algorithm is proposed to solve this problem. DCAD consists of two levels of clustering: local clustering and global clustering. First, clustering is conducted at each local site with a local clustering algorithm, and the features of local clusters are extracted. Second, local features from each site are sent to a central site, where global clustering is obtained based on those features. Experiments on both artificial and real spatial datasets show that DCAD is effective and efficient.
Most knowledgeable people agree that networking and routing technologies have been around for about 25 years. Routing is simultaneously the most complicated function of a network and the most important. In a similar vein, more than 70% of computer application fields are MIS applications. So the challenge in building and using an MIS in a network is developing the means to find, access, and communicate with large databases or multi-database systems. Because general databases are not time-continuous and cannot be streamed, reliable and secure quality of service cannot be obtained by dropping some unimportant datagrams during database transmission. In this article, we discuss which kind of routing protocol is best suited to transmitting large databases or multi-database systems over networks.
A weighted algorithm for watermarking relational databases for copyright protection is presented. The probability of watermarking an attribute is assigned according to its weight, which is decided by the owner of the database. A one-way hash function and a secret key known only to the owner of the data are used to select the tuples and bits to mark. By assigning high weights to significant attributes, the scheme ensures that important attributes are more likely to be marked than less important ones. Experimental results show that the proposed scheme is robust against various forms of attack, and has perfect immunity to subset attacks.
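The selection step described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the use of HMAC-SHA256 as the keyed one-way hash, the `gamma` selection ratio, the integer weights, and all names below are assumptions made for illustration only.

```python
import hashlib
import hmac


def keyed_hash(secret_key: bytes, *parts: str) -> int:
    """Keyed one-way hash of the given parts, returned as an integer."""
    message = "|".join(parts).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def choose_mark(secret_key, primary_key, weights, gamma=4, num_bits=2):
    """Return (attribute, bit_index, bit_value) if the tuple is selected, else None.

    weights maps attribute names to positive integer weights (assumed form).
    Roughly one tuple in `gamma` is selected; within a selected tuple the
    attribute is picked with probability proportional to its weight.
    """
    if keyed_hash(secret_key, "tuple", primary_key) % gamma != 0:
        return None
    point = keyed_hash(secret_key, "attr", primary_key) % sum(weights.values())
    for attribute, weight in weights.items():
        if point < weight:
            break
        point -= weight
    bit_index = keyed_hash(secret_key, "bit", primary_key) % num_bits
    bit_value = keyed_hash(secret_key, "value", primary_key) % 2
    return attribute, bit_index, bit_value


# Hypothetical table with three numeric attributes; "salary" is weighted highest.
key = b"owner-secret"
weights = {"salary": 8, "zip": 1, "age": 1}
marks = [choose_mark(key, f"emp-{i}", weights) for i in range(4000)]
selected = [m for m in marks if m is not None]
```

Because every choice is derived from the secret key and the primary key, the owner can recompute the same positions later during detection, while someone without the key cannot tell which tuples were marked.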
Geoanalytical data provide fundamental information according to which the Earth's resources can be known and exploited to support human life and development. Large amounts of manpower, material, and financial resources have been invested to acquire a wealth of geoanalytical data over the past 40 years. However, these data are usually managed by individual researchers and are preserved in an ad hoc manner without metadata that provide the necessary context for interpretation and data integration. In this scenario, few data other than published data can be reused by geological researchers. Many geoanalytical databases have been constructed to collect existing data and to facilitate their use. These databases are useful tools for preserving, managing, and sharing data for geological research, and provide various data repositories to support geological studies. Because these databases are dispersed and diverse, it is difficult for researchers to make full use of them. This contribution provides an introduction to the available geoanalytical databases. The database content that can be made accessible to researchers, the ways in which this can be done, and the functionalities that can be used are illustrated in detail. Moreover, constraints that have limited the reuse of geoanalytical data and the creation of more advanced geoanalytical databases are discussed.
As typical peer-to-peer distributed networks, blockchain systems require each node to copy a complete transaction database, so as to ensure that new transactions can be verified independently. In a blockchain system (e.g., the Bitcoin system), a node does not rely on any central organization, and every node keeps an entire copy of the transaction database. However, this feature means that the blockchain transaction database grows rapidly. Therefore, as the system keeps operating, node memory also needs to be expanded to keep the system running. Especially in the big data era, increasing network traffic will lead to a faster transaction growth rate. This paper analyzes blockchain transaction databases and proposes a storage optimization scheme. The proposed scheme divides the blockchain transaction database into a cold zone and a hot zone using an expiration recognition method based on the Least Recently Used (LRU) algorithm. It achieves storage optimization by moving unspent transaction outputs out of the in-memory transaction database. We present a theoretical analysis of the optimization method to validate its effectiveness. Extensive experiments show that the proposed method outperforms the current mechanism for blockchain transaction databases.
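The hot/cold split can be sketched with a small LRU structure. This is a minimal illustration under stated assumptions, not the paper's scheme: the `HotColdStore` class, its fixed `hot_capacity`, and the dictionary standing in for slower cold storage are all hypothetical.

```python
from collections import OrderedDict


class HotColdStore:
    """Keeps recently used entries in a bounded hot zone; the rest go cold."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # in-memory zone, least recently used first
        self.cold = {}            # stand-in for slower storage outside memory

    def put(self, key, value):
        self.cold.pop(key, None)   # an entry lives in exactly one zone
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.put(key, value)   # promote back to the hot zone on access
            return value
        return None

    def _evict(self):
        # Move least recently used entries to the cold zone when over capacity.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value


# Hypothetical unspent outputs keyed by "txid:index".
store = HotColdStore(hot_capacity=2)
store.put("tx1:0", 50)
store.put("tx2:0", 25)
store.get("tx1:0")       # tx1:0 becomes the most recently used entry
store.put("tx3:1", 10)   # tx2:0 is the least recently used and goes cold
```

Entries that go cold are still retrievable, so verification remains possible; only the memory footprint of the hot zone is bounded.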
In this paper, we design an ID caching technology (IDCT) for graph databases, with the purpose of accelerating queries on graph database data and avoiding redundant graph database query operations, which consume considerable computing resources. Traditional graph database caching technology (GDCT) needs a large amount of memory to store data and suffers from serious data consistency problems and low cache utilization. To address these issues, we propose a new technology that focuses on an ID allocation mechanism and high-speed ID queries on graph databases. Specifically, the IDs of query results are cached in memory, and data consistency is achieved through real-time synchronization and cache memory adaptation. In addition, we support both complex and simple queries to satisfy all query requirements, and design a cache replacement mechanism based on query action time, query count, and memory capacity, thus further improving performance. Extensive experiments show the superiority of our techniques over the traditional query approach for graph databases.
This is a period of information explosion. In spatial information science especially, information can be acquired in many ways, such as artificial satellites, aeroplanes, lasers, digital photogrammetry, and so on. Spatial data sources are usually distributed and heterogeneous. A federated database is the best solution for the sharing and interoperation of spatial databases. In this paper, the concepts of federated database and interoperability are introduced. Three heterogeneous kinds of spatial data (vector, image, and DEM) are used to create an integrated database. A data model for federated spatial databases is given.
The technique of Knowledge Discovery in Databases (KDD) is introduced to learn valuable knowledge hidden in network alarm databases. To obtain such knowledge, we propose an efficient method based on sliding windows (named Slidwin) to discover different episode rules from time-sequential alarm data. The experimental results show that, given different threshold parameters, a large number of different rules can be discovered quickly.
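The idea of scoring an episode rule over sliding windows can be sketched as follows. This is a generic window-counting illustration, not the Slidwin method itself, whose details the abstract does not give; the function name, the integer timestamps, and the alarm types are assumptions.

```python
def episode_rule_confidence(events, antecedent, consequent, width):
    """Confidence of the rule "antecedent is followed by consequent".

    events is a list of (timestamp, alarm_type) pairs with integer timestamps.
    A window covers [start, start + width) and slides one time unit at a time.
    Confidence = windows containing the antecedent followed by the consequent,
    divided by windows containing the antecedent.
    """
    if not events:
        return 0.0
    times = [t for t, _ in events]
    first, last = min(times), max(times)
    with_antecedent = 0
    with_both = 0
    for start in range(first - width + 1, last + 1):
        window = [(t, a) for t, a in events if start <= t < start + width]
        antecedent_times = [t for t, a in window if a == antecedent]
        if not antecedent_times:
            continue
        with_antecedent += 1
        earliest = min(antecedent_times)
        if any(a == consequent and t > earliest for t, a in window):
            with_both += 1
    return with_both / with_antecedent if with_antecedent else 0.0


# Hypothetical alarm stream: alarm "A" is usually followed by alarm "B".
alarms = [(0, "A"), (1, "B"), (5, "A"), (6, "B"), (10, "A")]
confidence = episode_rule_confidence(alarms, "A", "B", width=3)
```

A rule would be reported when its confidence and its window count exceed user-chosen thresholds, which is where threshold parameters like those mentioned above come in.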
Until recently, many computational materials scientists have shown little interest in materials databases. This is now changing because the amount of computational data is rapidly increasing and the potential for data mining provides unique opportunities for discovery and optimization. Here, a few examples of such opportunities are discussed relating to structural analysis and classification, discovery of correlations between materials properties, and discovery of unsuspected compounds.
This paper presents a development of the extended Cellular Automata (CA), based on relational databases (RDB), to model dynamic interactions among spatial objects. The integration of Geographical Information System (GIS) and CA has the great advantage of simulating geographical processes. But standard CA has some restrictions in cellular shape, neighbourhood, and neighbour rules, which restrict the CA's ability to simulate complex, real-world environments. This paper discusses a cell's spatial relation based on the spatial object's geometrical and non-geometrical characteristics, extends the cell's neighbour definition, and considers that the cell's neighbour lies in the forms of not only spatial adjacency but also attribute correlation. This paper then puts forward that spatial relations between two different cells can be divided into three types, including spatial adjacency, neighbourhood, and complicated separation. Based on traditional ideas, it is impossible to settle CA's restrictions completely. RDB-based CA is an academic experiment, in which some fields are designed to describe the essential information needed to define and select a cell's neighbour. The culture innovation diffusion system has multiple forms of space diffusion and inherited characteristics that the RDB-based CA is capable of simulating more effectively. Finally, this paper details a successful case study on the diffusion of fashion wear trends. Compared to the original CA, the RDB-based CA is a more natural and efficient representation of human knowledge over space, and is an effective tool for simulating complex systems that have multiple forms of spatial diffusion.
Funding (NoSQL personal-data identification tool): supported by the National Natural Science Foundation of China (No. 62302242) and the China Postdoctoral Science Foundation (No. 2023M731802).
Funding (2DMSI and 2D charged building block databases): supported by the National Natural Science Foundation of China (Grant Nos. 61888102, 52272172, and 52102193), the Major Program of the National Natural Science Foundation of China (Grant No. 92163206), the National Key Research and Development Program of China (Grant Nos. 2021YFA1201501 and 2022YFA1204100), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB30000000), and the Fundamental Research Funds for the Central Universities.
Funding (sampling-based DBSCAN): supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
Funding (ontology construction from object-relational databases): supported by the National Natural Science Foundation of China (60471055) and the National "863" High Technology Research and Development Program of China (2007AA01Z443).
Funding (optimal truncated LDDSs): supported by the National Natural Science Foundation of China and LNM, Institute of Mechanics, CAS.
Funding: Supported by the National High Technology 863 Project (2002AA1Z2308, 2002AA118030)
Abstract: Recent studies have shown that cache behavior is important in the design of main memory index structures. Cache-conscious indices such as the CSB+-tree have been shown to outperform conventional main memory indices such as the AVL-tree and the T-tree. This paper proposes a cache-conscious version of the T-tree, the CST-tree, defined according to the cache-conscious definition. By separating the keys within a node into two parts, the CST-tree achieves a higher cache hit ratio.
Abstract: Future use of heterogeneous databases will take place in WWW and CORBA environments. The integration of WWW databases and the CORBA standards is discussed. These two techniques need to be merged to make distributed use of heterogeneous databases user friendly. In an environment integrating WWW databases and CORBA technologies, CORBA can be used to access heterogeneous data sources on the Internet. Applications of this kind can use distributed transactions to ensure data consistency and integrity. The technology has good prospects for application.
Funding: Funded by the National 973 Program of China (No. 2003CB415205), the National Natural Science Foundation of China (No. 40523005, No. 60573183, No. 60373019), and the Open Research Fund Program of LIESMARS (No. WKL(04)0303).
Abstract: Spatial objects have two types of attributes: geometrical attributes and non-geometrical attributes, which belong to two different attribute domains (geometrical and non-geometrical domains). Although geometrically scattered in a geometrical domain, spatial objects may be similar to each other in a non-geometrical domain. Most existing clustering algorithms group spatial datasets into different compact regions in a geometrical domain without considering the non-geometrical domain. However, many application scenarios require clustering results in which a cluster has not only high proximity in a geometrical domain, but also high similarity in a non-geometrical domain. This means constraints are imposed on the clustering goal from both geometrical and non-geometrical domains simultaneously. Such a clustering problem is called dual clustering. As distributed clustering applications become more and more popular, it is necessary to tackle the dual clustering problem in distributed databases. The DCAD algorithm is proposed to solve this problem. DCAD consists of two levels of clustering: local clustering and global clustering. First, clustering is conducted at each local site with a local clustering algorithm, and the features of local clusters are extracted. Second, local features from each site are sent to a central site, where global clustering is obtained based on those features. Experiments on both artificial and real spatial datasets show that DCAD is effective and efficient.
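The two-level structure described in this abstract can be illustrated with a minimal Python sketch. It is not the DCAD algorithm itself, whose local clustering method and feature definition are not given here. The greedy single-pass clustering, the feature tuple (centroid, mean attribute, size), and the thresholds `eps` and `delta` are all assumptions made for illustration.

```python
import math

def summarize(cluster):
    """Feature of a local cluster: (centroid x, centroid y, mean attribute, size)."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n,
            sum(p[1] for p in cluster) / n,
            sum(p[2] for p in cluster) / n,
            n)

def local_clusters(points, eps, delta):
    """Hypothetical local step: a point (x, y, attr) joins the first cluster that is
    close in BOTH the geometrical domain (eps) and the attribute domain (delta)."""
    clusters = []
    for p in points:
        for c in clusters:
            cx, cy, ca, _ = summarize(c)
            if math.dist((p[0], p[1]), (cx, cy)) <= eps and abs(p[2] - ca) <= delta:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def global_clusters(features, eps, delta):
    """Hypothetical global step at the central site: merge local features with the
    same dual constraint, weighting centroids by cluster size."""
    merged = []  # each entry is [x, y, attr, size]
    for x, y, a, n in features:
        for g in merged:
            if math.dist((x, y), (g[0], g[1])) <= eps and abs(a - g[2]) <= delta:
                total = g[3] + n
                g[0] = (g[0] * g[3] + x * n) / total
                g[1] = (g[1] * g[3] + y * n) / total
                g[2] = (g[2] * g[3] + a * n) / total
                g[3] = total
                break
        else:
            merged.append([x, y, a, n])
    return merged

def dual_cluster(sites, eps, delta):
    """Only cluster features, not raw points, are sent to the central site."""
    features = [summarize(c) for site in sites for c in local_clusters(site, eps, delta)]
    return global_clusters(features, eps, delta)
```

In this sketch, two points that sit at almost the same location but have very different attribute values end up in different clusters, which is the behaviour the dual clustering constraint asks for.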
Funding: Supported by the National Natural Science Foundation of China (69873027)
Abstract: Most knowledgeable people agree that networking and routing technologies have been around for about 25 years. Routing is simultaneously the most complicated and the most important function of a network. Similarly, more than 70% of computer applications are MIS applications. The challenge in building and using an MIS over a network is therefore to develop the means to find, access, and communicate with large databases or multi-database systems. Because general database traffic is not time-continuous and cannot be streamed, reliable and secure quality of service cannot be obtained by dropping less important datagrams during database transmission. This article discusses which kind of routing protocol is best suited to transmitting large databases or multi-database systems over networks.
Funding: Supported by the Aeronautics Science Foundation of China (02F52033), the High-Technology Research Project of Jiangsu Province (BG2004005), and the Youth Research Foundation of Qufu Normal University (XJ02057)
Abstract: A weighted algorithm for watermarking relational databases for copyright protection is presented. The probability of watermarking an attribute is assigned according to its weight, which is decided by the owner of the database. A one-way hash function and a secret key known only to the owner of the data are used to select tuples and bits to mark. By assigning high weights to significant attributes, the scheme ensures that important attributes are more likely to be marked than less important ones. Experimental results show that the proposed scheme is robust against various forms of attacks, and has perfect immunity to subset attack.
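The selection mechanism described in this abstract (a keyed one-way hash choosing tuples, attributes by weight, and bits) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's scheme. The use of HMAC-SHA256 as the keyed hash, the parameters `fraction` and `num_bits`, the `id` primary-key column, and all function names are hypothetical, and attribute values are assumed to be integers.

```python
import hashlib
import hmac

def keyed_hash(secret_key, *parts):
    """One-way keyed hash of the given strings, returned as a large integer."""
    message = "|".join(parts).encode("utf-8")
    return int.from_bytes(hmac.new(secret_key, message, hashlib.sha256).digest(), "big")

def pick_attribute(h, weights):
    """Map a hash value to an attribute with probability proportional to its weight."""
    total = sum(weights.values())
    point = (h % 10**6) / 10**6 * total
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return name
    return name  # guard against floating-point rounding

def mark_plan(pk, secret_key, weights, fraction, num_bits):
    """Return (attribute, bit index, bit value) for a selected tuple, else None."""
    if keyed_hash(secret_key, "select", pk) % fraction != 0:
        return None  # roughly one tuple in `fraction` is marked
    attr = pick_attribute(keyed_hash(secret_key, "attr", pk), weights)
    bit_index = keyed_hash(secret_key, "bit", pk) % num_bits
    bit_value = keyed_hash(secret_key, "value", pk) % 2
    return attr, bit_index, bit_value

def embed(rows, secret_key, weights, fraction=2, num_bits=2):
    """Return copies of rows with one low-order bit forced in each selected tuple."""
    marked = []
    for row in rows:
        out = dict(row)
        plan = mark_plan(str(row["id"]), secret_key, weights, fraction, num_bits)
        if plan is not None:
            attr, bit_index, bit_value = plan
            out[attr] = (out[attr] & ~(1 << bit_index)) | (bit_value << bit_index)
        marked.append(out)
    return marked

def match_rate(rows, secret_key, weights, fraction=2, num_bits=2):
    """Detection: fraction of selected tuples whose planned bit has the expected value."""
    hits = total = 0
    for row in rows:
        plan = mark_plan(str(row["id"]), secret_key, weights, fraction, num_bits)
        if plan is not None:
            attr, bit_index, bit_value = plan
            total += 1
            hits += ((row[attr] >> bit_index) & 1) == bit_value
    return hits / total if total else 0.0
```

Because every decision is derived from the secret key and the primary key, only the owner can recompute which bits were set. A match rate close to 1.0 on a suspect table suggests the mark is present, while unmarked data would be expected to match about half the time.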
Funding: Supported by the "Instrument Equipment and Superior Resources Sharing of High School" program of China ("211" program, Grant No. CERS-2-9), the CGS research fund (JYYWF20181702), and the National Major Scientific Instruments and Equipment Development Special Funds (No. 2016YFF0103303)
Abstract: Geoanalytical data provide fundamental information according to which the Earth's resources can be known and exploited to support human life and development. Large amounts of manpower and material and financial resources have been invested to acquire a wealth of geoanalytical data over the past 40 years. However, these data are usually managed by individual researchers and are preserved in an ad hoc manner without metadata that provide the necessary context for interpretation and data integration requirements. In this scenario, few data, except for published data, can be reutilized by geological researchers. Many geoanalytical databases have been constructed to collect existing data and to facilitate their use. These databases are useful tools for preserving, managing, and sharing data for geological research, and provide various data repositories to support geological studies. Since these databases are dispersed and diverse, it is difficult for researchers to make full use of them. This contribution provides an introduction to available geoanalytical databases. The database content that can be made accessible to researchers, the ways in which this can be done, and the functionalities that can be used are illustrated in detail. Moreover, constraints that have limited the reutilization of geoanalytical data and the creation of more advanced geoanalytical databases are discussed.
Funding: Supported by the Researchers Supporting Project (No. RSP-2020/102), King Saud University, Riyadh, Saudi Arabia; the National Natural Science Foundation of China (Nos. 61802031, 61772454, 61811530332, 61811540410); the Natural Science Foundation of Hunan Province, China (No. 2019JGYB177); the Research Foundation of Education Bureau of Hunan Province, China (No. 18C0216); the "Practical Innovation and Entrepreneurial Ability Improvement Plan" for Professional Degree Graduate Students of Changsha University of Science and Technology (No. SJCX201971); and the Hunan Graduate Scientific Research Innovation Project, China (No. CX2019694). This work is also supported by the Programs of Transformation and Upgrading of Industries and Information Technologies of Jiangsu Province (No. JITC-1900AX2038/01).
Abstract: As typical peer-to-peer distributed networks, blockchain systems require each node to copy a complete transaction database, so as to ensure new transactions can be verified independently. In a blockchain system (e.g., the Bitcoin system), a node does not rely on any central organization, and every node keeps an entire copy of the transaction database. However, this feature means that the size of the blockchain transaction database grows rapidly. Therefore, as the system continues to operate, node memory also needs to be expanded to keep the system running. Especially in the big data era, increasing network traffic will lead to a faster transaction growth rate. This paper analyzes blockchain transaction databases and proposes a storage optimization scheme. The proposed scheme divides the blockchain transaction database into a cold zone and a hot zone using an expiration recognition method based on the Least Recently Used (LRU) algorithm. It achieves storage optimization by moving unspent transaction outputs outside the in-memory transaction databases. We present a theoretical analysis of the optimization method to validate its effectiveness. Extensive experiments show that our proposed method outperforms the current mechanism for blockchain transaction databases.
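The hot-zone and cold-zone split described in this abstract can be sketched in Python with a plain LRU policy. This is only an illustration under assumptions: the paper's expiration recognition method is not specified here, so ordinary LRU eviction stands in for it. The class name `TieredUtxoStore`, its methods, and the dictionary used in place of on-disk storage are hypothetical.

```python
from collections import OrderedDict

class TieredUtxoStore:
    """Keeps recently used unspent outputs in memory and moves the rest to a cold zone."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # in-memory zone, most recently used entry last
        self.cold = {}            # stand-in for slower storage outside memory

    def put(self, key, value):
        """Insert or refresh an unspent output in the hot zone."""
        self.cold.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._evict()

    def get(self, key):
        """Look up an output; a cold hit is promoted back to the hot zone."""
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.put(key, value)
            return value
        return None

    def spend(self, key):
        """Remove a spent output from whichever zone holds it."""
        if key in self.hot:
            return self.hot.pop(key)
        return self.cold.pop(key, None)

    def _evict(self):
        """Move least recently used entries to the cold zone when memory is full."""
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value
```

With `hot_capacity=2`, inserting three outputs moves the oldest to the cold zone, and a later lookup of that output brings it back into memory while evicting the entry that was used least recently.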
Funding: Supported by the Research Fund of National Key Laboratory of Computer Architecture under Grant No. CARCH201501 and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2016A09
Abstract: In this paper, we present the design of an ID caching technology (IDCT) for graph databases, with the purpose of accelerating queries on graph database data and avoiding redundant graph database query operations that consume considerable computing resources. Traditional graph database caching technology (GDCT) needs a large memory to store data and suffers from serious data consistency problems and low cache utilization. To address these issues, we propose a new technology that focuses on an ID allocation mechanism and high-speed ID queries on graph databases. Specifically, the IDs of query results are cached in memory, and data consistency is achieved through real-time synchronization and cache memory adaptation. In addition, we support both complex queries and simple queries to satisfy all query requirements, and design a cache replacement mechanism based on query action time, query times, and memory capacity, thus further improving performance. Extensive experiments show the superiority of our techniques compared with the traditional query approach of graph databases.
Funding: Supported by the National Natural Science Foundation under "Outstanding Young Researchers" (49525101)
Abstract: This is a period of information explosion. In spatial information science especially, information can be acquired in many ways, such as by satellite, aeroplane, laser, and digital photogrammetry. Spatial data sources are usually distributed and heterogeneous. A federated database is the best solution for the sharing and interoperation of spatial databases. In this paper, the concepts of federated database and interoperability are introduced. Three heterogeneous kinds of spatial data (vector, image, and DEM) are used to create an integrated database. A data model of federated spatial databases is given.
Funding: Supported by the National 863 High-Tech Project (863-306-Z705-02) and the National Natural Science Foundation of China (69896240)
Abstract: The technique of Knowledge Discovery in Databases (KDD) is introduced to learn valuable knowledge hidden in network alarm databases. To obtain such knowledge, we propose an efficient method based on sliding windows (named Slidwin) to discover different episode rules from time-sequential alarm data. The experimental results show that, given different threshold parameters, a large number of different rules can be discovered quickly.
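The idea of discovering episode rules with sliding windows can be illustrated with a small Python sketch. It is not the Slidwin method, whose details are not given in this abstract. It counts, in the general style of window-based episode mining, the windows in which alarm `a` is followed by alarm `b`. The function name, the restriction to two-alarm rules, the integer timestamps, and the thresholds `min_freq` and `min_conf` are assumptions for illustration.

```python
from collections import Counter

def mine_pair_rules(events, window, min_freq, min_conf):
    """events: list of (integer timestamp, alarm type), sorted by timestamp.
    Returns sorted rules (a, b, frequency, confidence) meaning 'a is followed by b'."""
    if not events:
        return []
    start = events[0][0] - window + 1   # first window that still covers the first event
    end = events[-1][0]                 # last window starts at the last event
    n_windows = end - start + 1
    single = Counter()  # number of windows containing each alarm type
    pair = Counter()    # number of windows containing a followed by b
    for w_start in range(start, end + 1):
        in_window = [(t, a) for t, a in events if w_start <= t < w_start + window]
        types_seen, pairs_seen = set(), set()
        for i, (t1, a) in enumerate(in_window):
            types_seen.add(a)
            for t2, b in in_window[i + 1:]:
                if t2 > t1 and a != b:
                    pairs_seen.add((a, b))
        single.update(types_seen)
        pair.update(pairs_seen)
    rules = []
    for (a, b), count in pair.items():
        freq = count / n_windows   # share of all windows that contain the episode
        conf = count / single[a]   # share of windows with a that also show a then b
        if freq >= min_freq and conf >= min_conf:
            rules.append((a, b, freq, conf))
    return sorted(rules)
```

Raising either threshold prunes rules, which matches the abstract's observation that the set of discovered rules depends on the threshold parameters.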
Abstract: Until recently, many computational materials scientists have shown little interest in materials databases. This is now changing because the amount of computational data is rapidly increasing and the potential for data mining provides unique opportunities for discovery and optimization. Here, a few examples of such opportunities are discussed relating to structural analysis and classification, discovery of correlations between materials properties, and discovery of unsuspected compounds.
Abstract: This paper presents a development of the extended Cellular Automata (CA), based on relational databases (RDB), to model dynamic interactions among spatial objects. The integration of Geographical Information System (GIS) and CA has the great advantage of simulating geographical processes. But standard CA has some restrictions in cellular shape, neighbourhood, and neighbour rules, which restrict the CA's ability to simulate complex, real-world environments. This paper discusses a cell's spatial relation based on the spatial object's geometrical and non-geometrical characteristics, extends the cell's neighbour definition, and considers that the cell's neighbour lies in the forms of not only spatial adjacency but also attribute correlation. This paper then puts forward that spatial relations between two different cells can be divided into three types: spatial adjacency, neighbourhood, and complicated separation. Based on traditional ideas, it is impossible to settle CA's restrictions completely. RDB-based CA is an academic experiment, in which some fields are designed to describe the essential information needed to define and select a cell's neighbour. The culture innovation diffusion system has multiple forms of space diffusion and inherited characteristics that the RDB-based CA is capable of simulating more effectively. Finally, this paper details a successful case study on the diffusion of fashion wear trends. Compared to the original CA, the RDB-based CA is a more natural and efficient representation of human knowledge over space, and is an effective tool in simulating complex systems that have multiple forms of spatial diffusion.