The EU's Artificial Intelligence Act (AI Act) imposes requirements for the privacy compliance of AI systems. AI systems must comply with privacy laws such as the GDPR when providing services. These laws give users the right to issue a Data Subject Access Request (DSAR). Responding to such requests requires database administrators to accurately identify the information related to an individual. However, manual compliance is challenging and error-prone, because database administrators must write queries by hand, which is time-consuming. The demand of AI systems for large amounts of data has driven the development of NoSQL databases, and the flexible schema of NoSQL databases makes identifying personal information even more challenging. This paper develops an automated tool for identifying personal information that can help organizations respond to DSARs. Our tool combines several technologies, including schema extraction from NoSQL databases and relationship identification from query logs. We describe the algorithm used by our tool, detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs to help developers accurately identify personal data. We evaluate our tool on three datasets covering different database designs, achieving F1 scores from 0.77 to 1. Experimental results demonstrate that our tool successfully identifies information relevant to the data subject. Our tool reduces manual effort and simplifies GDPR compliance, showing practical value in enhancing the privacy performance of NoSQL databases and AI systems.
Discovery of materials using a "bottom-up" or "top-down" approach is of great interest in materials science. Layered materials consisting of two-dimensional (2D) building blocks provide a good platform for exploring new materials in this respect. In van der Waals (vdW) layered materials, these building blocks are charge neutral and can be isolated from their bulk phase (top-down), but they usually grow on a substrate. In ionic layered materials, they are charged and usually cannot exist independently, but they can serve as motifs for constructing new materials (bottom-up). In this paper, we introduce our recently constructed databases for the 2D material-substrate interface (2DMSI) and for 2D charged building blocks. For the 2DMSI database, we systematically build a workflow to predict appropriate substrates and their geometries at substrates, and construct the 2DMSI database. For the 2D charged building block database, 1208 entries are identified from a bulk material database. Information on crystal structure, valence state, source, dimension, and so on is provided for each entry in JSON format. We also show its application in designing and searching for new functional layered materials. The 2DMSI database, building block database, and designed layered materials are available in Science Data Bank at https://doi.org/10.57760/sciencedb.j00113.00188.
Electronic patient data offers many advantages but also introduces new difficulties. Deadlocks may delay procedures such as acquiring patient information. Distributed deadlock resolution solutions introduce uncertainty due to inaccurate transaction properties. Soft computing-based solutions have been developed to address this challenge. However, handling ambiguous, vague, incomplete, and inconsistent transaction attribute information within a single framework has received minimal attention. The work presented in this paper employs type-2 neutrosophic logic, an extension of type-1 neutrosophic logic, to handle uncertainty in real-time deadlock-resolving systems. The proposed method is structured to reflect multiple types of knowledge and relations among transaction features, including validation factor degree, slackness degree, and degree of deadline-missed transaction, based on the degree of membership of truthiness, the degree of membership of indeterminacy, and the degree of membership of falsity. Here, the footprint of uncertainty (FOU) for truth, indeterminacy, and falsity represents the level of uncertainty that exists in the value of a grade of membership. We employed a distributed real-time transaction processing simulator (DRTTPS) to conduct the simulations and conducted experiments using the benchmark Pima Indians diabetes dataset (PIDD). The results show an increase in detection rate and a large drop in rollback rate when the new strategy is used. The Type-2 neutrosophic-based resolution performs better than the Type-1 neutrosophic-based approach on the execution ratio scale. The improvement rate reaches 10% to 20%, depending on the number of arrived transactions.
In China, the vast majority of bibliographic databases are commercial, such as China National Knowledge Infrastructure (CNKI), Wanfang Database, Longyuan Journal Net, and CQVIP Company. However, there are also non-profit open access (OA) databases, such as the journal database jointly established by the Chinese Academy of Social Sciences (CASS) and the National Social Science Fund. Commercial bibliographic databases face many difficulties: intellectual property disputes, the distribution of benefits between the hardcopy periodical and the commercial bibliographic database, the lack of quality assessment of commercial bibliographic databases, the need to improve digital technology, and the lack of a unified database regulation, all of which restrict their development. This paper puts forward countermeasures from the following perspectives: how to enhance governmental management; how to protect intellectual property rights; how to improve the technical standards of commercial bibliographic databases; how to distribute interests between the hardcopy periodical and the commercial bibliographic database; how to improve the quality of commercial bibliographic databases; and how to improve the industrial chain of commercial bibliographic databases.
Objective: To investigate the variation, expression and clinical significance of the E2F3 gene in melanoma. Methods: First, the cBioportal, Oncomine and GEO databases were used to analyze the variation and expression level of the E2F3 gene in melanoma. The OSskcm and TISIDB databases were used to analyze the relationship between E2F3 and melanoma prognosis and tumor immune infiltrating cells. Then, the LinkedOmics database was used to identify the differential genes related to E2F3 expression in melanoma and analyze their biological functions. Finally, small molecule compounds for the treatment of melanoma were screened through the CMap database. Results: The mutation rate of the E2F3 gene in melanoma is about 4%, and there are 21 mutation sites. Compared with normal skin tissues, the expression of the E2F3 gene in melanoma was significantly increased (P<0.01). The mutation and increased expression of the E2F3 gene were associated with shortened overall survival (OS) of melanoma patients (P<0.05). The CNA level of E2F3 was negatively correlated with the expression levels of lymphocytes such as pDC, Neutrophil, Act DC and Th17, and negatively correlated with the expression levels of chemokines such as CXCL5, CCL13 and CCR1. The methylation level of E2F3 was positively correlated with the expression levels of Th1, Neutrophil, Act DC and other lymphocytes, and positively correlated with the expression levels of CXCL16, CXCL12, CCR1 and other chemokines. The expression level of E2F3 was negatively correlated with the expression levels of lymphocytes such as Th17, Tcm CD4 and Th1, and negatively correlated with the expression levels of chemokines such as CXCL16, CCL22 and CCL2. The expression of 96 genes such as UHRF1BP1 in melanoma was significantly correlated with the expression of E2F3 (|cor|0.5, P<0.05). These genes were mainly related to RNA transport, eukaryotic ribosome biogenesis, cell cycle and other pathways. Among them, WDR12, WDR43, RBM28, UTP18, DKC1, PAK1IP1, DDX31, TEX10, TRUB1 and TRMT61B were the top 10 hub genes. YC-1, simvastatin, cytochalasin-d, Deforolimus and cytochalasin-b may be five small molecule compounds for the treatment of melanoma. Conclusion: The mutation and increased expression level of the E2F3 gene are related to the poor prognosis of melanoma, and the gene participates in the occurrence and development of melanoma by affecting the expression of different tumor immune infiltrating cell subtypes; it may be a potential diagnostic marker and therapeutic target for melanoma.
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces the sampling technique inside DBSCAN, and the other applies the sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
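As a rough illustration of the "sampling outside DBSCAN" idea, the sketch below runs a plain DBSCAN on a random sample and then gives every point the label of its nearest clustered sample point within `eps`. This is not the SDBSCAN algorithm from the paper, whose details the abstract does not give; the function names, the labeling rule, and the brute-force neighbor search (where a real system would use a spatial index) are all assumptions made for illustration.

```python
import math
import random


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, includes i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def sampled_dbscan(points, eps, min_pts, sample_size, seed=0):
    """Hypothetical 'sampling outside DBSCAN': cluster a sample, then label all points."""
    rng = random.Random(seed)
    sample = [points[i] for i in rng.sample(range(len(points)), sample_size)]
    sample_labels = dbscan(sample, eps, min_pts)
    labels = []
    for p in points:
        best_label, best_dist = -1, eps
        for q, label in zip(sample, sample_labels):
            d = math.dist(p, q)
            if label != -1 and d <= best_dist:
                best_label, best_dist = label, d
        labels.append(best_label)
    return labels
```

Note that `min_pts` is applied to the sample, so it would normally be scaled down with the sampling rate; the sketch leaves that choice to the caller.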
Data acquisition and modeling are the two most important, difficult and costly aspects of a Cybercity project. 2D-GIS is mature and can manage a large amount of spatial data, so 3D-GIS should make the best use of the data and technology of 2D-GIS. Constructing a useful synthetic environment requires integrating multiple types of information, such as DEMs, texture images and 3D representations of objects such as buildings. In this paper, a method for 3D city landscape data modeling and visualization based on integrated databases is presented. Since the volume of raster data is very large, special strategies (for example, a pyramid gridded method) must be adopted in order to manage raster data efficiently. Three different methods of data acquisition, a suitable data structure and a simple modeling method are presented as well. Finally, a pilot project of the Shanghai Cybercity is illustrated.
To solve the problems of sharing and reusing information in information systems, a rules-based approach to constructing ontologies from object-relational databases is proposed. A 3-tuple ontology construction model is proposed first. Then, four types of ontology construction rules, covering class, property, property characteristics, and property restrictions, are formalized according to the model. Experimental results described in the Web Ontology Language show that the proposed approach is feasible for the semantic objects project of the semantic computing laboratory at UC Irvine. Our approach reduces construction time by about twenty percent compared with ontology construction from relational databases.
Most knowledgeable people agree that networking and routing technologies have been around for about 25 years. Routing is simultaneously the most complicated function of a network and the most important. Similarly, more than 70% of computer application fields are MIS applications. The challenge in building and using an MIS in a network is therefore to develop the means to find, access, and transmit large databases or multi-database systems. Because general databases are not time-continuous and cannot be streamed, we cannot obtain reliable and secure quality of service by dropping some unimportant datagrams during database transmission. In this article, we discuss which kind of routing protocol is best suited to transmitting large databases or multi-database systems over networks.
The necessity and feasibility of introducing attribute weights into a digital fingerprinting system are discussed. A weighted algorithm for fingerprinting relational databases for traitor tracing is proposed. Higher weights are assigned to more significant attributes, so important attributes are fingerprinted more frequently than others. Finally, the robustness of the proposed algorithm, such as its performance against collusion attacks, is analyzed. Experimental results demonstrate the superiority of the algorithm.
A weighted algorithm for watermarking relational databases for copyright protection is presented. The probability of watermarking an attribute is assigned according to its weight, which is decided by the owner of the database. A one-way hash function and a secret key known only to the owner of the data are used to select the tuples and bits to mark. By assigning high weights to significant attributes, the scheme ensures that important attributes have a higher chance of being marked than less important ones. Experimental results show that the proposed scheme is robust against various forms of attack and has perfect immunity to subset attacks.
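The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's scheme: HMAC-SHA256 stands in for the one-way keyed hash, and the names `gamma` (roughly one tuple in `gamma` is marked), `lsb_bits` (how many least significant bits may be altered), `pick_attribute`, and `mark_tuple` are hypothetical. Attribute values and weights are assumed to be integers.

```python
import hashlib
import hmac


def keyed_hash(secret_key: bytes, *parts) -> int:
    """One-way keyed hash of the given parts, returned as a large integer."""
    message = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hmac.new(secret_key, message, hashlib.sha256).digest(), "big")


def pick_attribute(h: int, weights: dict) -> str:
    """Map hash value h onto an attribute in proportion to its positive integer weight."""
    slot = h % sum(weights.values())
    for name, weight in sorted(weights.items()):
        if slot < weight:
            return name
        slot -= weight
    raise AssertionError("unreachable when all weights are positive integers")


def mark_tuple(row: dict, pk: str, secret_key: bytes, weights: dict,
               gamma: int = 2, lsb_bits: int = 2) -> dict:
    """Return a copy of row, possibly with one low-order bit of one attribute set or cleared."""
    marked = dict(row)
    if keyed_hash(secret_key, "select", row[pk]) % gamma != 0:
        return marked  # this tuple is not selected for marking
    attr = pick_attribute(keyed_hash(secret_key, "attr", row[pk]), weights)
    bit_index = keyed_hash(secret_key, "bit", row[pk]) % lsb_bits
    bit_value = keyed_hash(secret_key, "value", row[pk]) % 2
    if bit_value:
        marked[attr] |= 1 << bit_index
    else:
        marked[attr] &= ~(1 << bit_index)
    return marked
```

Because every choice is derived from the secret key and the primary key, the owner can recompute which tuple, attribute, and bit should carry a mark when checking a suspect copy, while an attribute with weight 3 is chosen three times as often as one with weight 1.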
As typical peer-to-peer distributed networks, blockchain systems require each node to hold a complete copy of the transaction database, so that new transactions can be verified independently. In a blockchain system (e.g., the Bitcoin system), a node does not rely on any central organization, and every node keeps an entire copy of the transaction database. However, this feature means that the blockchain transaction database grows rapidly. Therefore, as the system continues to operate, node memory must also be expanded to keep the system running. Especially in the big data era, increasing network traffic will lead to a faster transaction growth rate. This paper analyzes blockchain transaction databases and proposes a storage optimization scheme. The proposed scheme divides the blockchain transaction database into a cold zone and a hot zone using an expiration recognition method based on the Least Recently Used (LRU) algorithm. It achieves storage optimization by moving unspent transaction outputs out of the in-memory transaction database. We present a theoretical analysis of the optimization method to validate its effectiveness. Extensive experiments show that our proposed method outperforms the current mechanism for blockchain transaction databases.
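The hot/cold split can be pictured with a small LRU-based store. The sketch below is an assumption-laden illustration rather than the paper's scheme: `HotColdStore` is a hypothetical class, a plain dictionary stands in for on-disk cold storage, and "expiration" is reduced to least-recently-used eviction once the hot zone exceeds a fixed capacity.

```python
from collections import OrderedDict


class HotColdStore:
    """Keeps recently used entries in memory (hot) and demotes the rest (cold)."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # in-memory zone, least recently used entry first
        self.cold = {}            # stand-in for on-disk storage

    def put(self, key, value):
        """Insert or update an entry; it becomes the most recently used one."""
        self.cold.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._evict()

    def get(self, key):
        """Look up an entry, promoting it from the cold zone if necessary."""
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold.pop(key)  # raises KeyError if the key is unknown
        self.hot[key] = value
        self._evict()
        return value

    def _evict(self):
        """Move least recently used entries to the cold zone until the hot zone fits."""
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value
```

In this picture the keys would be unspent transaction outputs; an output that has not been touched for a long time drifts to the front of the ordering and is the first to leave memory.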
This is a period of information explosion. In spatial information science in particular, information can be acquired in many ways, such as artificial satellites, aircraft, lasers, and digital photogrammetry. Spatial data sources are usually distributed and heterogeneous. A federated database is the best solution for sharing and interoperating spatial databases. In this paper, the concepts of federated databases and interoperability are introduced. Three heterogeneous kinds of spatial data (vector, image and DEM) are used to create an integrated database. A data model for federated spatial databases is given.
The technique of Knowledge Discovery in Databases (KDD) is introduced to learn valuable knowledge hidden in network alarm databases. To obtain such knowledge, we propose an efficient method based on sliding windows (named Slidwin) to discover different episode rules from time-sequential alarm data. The experimental results show that, given different threshold parameters, a large number of different rules can be discovered quickly.
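To make the sliding-window idea concrete, the sketch below counts the fraction of fixed-width windows in which a serial episode (alarm types occurring in a given order) appears. This is a generic window-frequency measure, not the Slidwin method itself, whose details the abstract does not give; the function names, integer timestamps, and the window convention `[start, start + width)` are assumptions.

```python
def occurs_in_order(sequence, episode):
    """True if the items of episode appear in sequence in order (not necessarily adjacent)."""
    it = iter(sequence)
    return all(any(item == wanted for item in it) for wanted in episode)


def window_frequency(events, episode, width):
    """Fraction of sliding windows of the given width that contain the episode.

    events is a list of (time, alarm_type) pairs sorted by integer time.
    Windows start one step apart, from the first window that touches the
    first event to the last window that touches the last event.
    """
    if not events:
        return 0.0
    first_start = events[0][0] - width + 1
    last_start = events[-1][0]
    hits = total = 0
    for start in range(first_start, last_start + 1):
        window = [alarm for t, alarm in events if start <= t < start + width]
        total += 1
        hits += occurs_in_order(window, episode)
    return hits / total
```

An episode whose frequency exceeds a chosen threshold would be treated as frequent, and rules such as "A is followed by B within the window" can then be derived by comparing the frequency of an episode with that of its prefix.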
In this paper, the constrained K closest pairs query is introduced, which retrieves the K closest pairs satisfying a given spatial constraint from two datasets. For datasets indexed by R-trees in spatial databases, three algorithms are presented for answering this kind of query. Among them, the two-phase Range+Join and Join+Range algorithms differ in the order in which the range query and the closest pairs query are executed, while the constrained heap-based algorithm uses extended distance functions to prune the search space and minimize the pruning distance. Experimental results show that the constrained heap-based algorithm has better applicability and performance than the two-phase algorithms.
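For reference, the query semantics can be stated as a brute-force baseline in the spirit of Range+Join: filter both datasets by the constraint, then take the K smallest pair distances. The sketch assumes the spatial constraint is an axis-aligned rectangle that both points of a pair must lie in, which the abstract does not specify; it uses no R-tree and none of the paper's pruning, and the function names are hypothetical.

```python
import heapq
import math
from itertools import product


def in_rect(point, rect):
    """True if point lies inside the closed rectangle ((xmin, ymin), (xmax, ymax))."""
    (xmin, ymin), (xmax, ymax) = rect
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax


def constrained_k_closest_pairs(ps, qs, rect, k):
    """Return the k closest (distance, p, q) triples with p and q both inside rect."""
    # Range phase: keep only points that satisfy the spatial constraint.
    ps_in = [p for p in ps if in_rect(p, rect)]
    qs_in = [q for q in qs if in_rect(q, rect)]
    # Join phase: enumerate all remaining pairs and keep the k smallest distances.
    pairs = ((math.dist(p, q), p, q) for p, q in product(ps_in, qs_in))
    return heapq.nsmallest(k, pairs)
```

A baseline like this is quadratic in the number of surviving points, which is exactly the cost that index-based pruning is meant to avoid, but it is handy for checking the output of a faster implementation.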
Almost all cellular processes in a living system are controlled by proteins: they regulate gene expression, catalyze chemical reactions, transport small molecules across membranes, and transmit signals across membranes. Even a viral infection is often initiated through virus-host protein interactions. Protein-protein interactions (PPIs) are the physical contacts between two or more proteins, and they represent complex biological functions. PPIs are now used to construct PPI networks for studying complex pathways and revealing the functions of unknown proteins. Scientists have used PPIs to find the molecular basis of certain diseases as well as potential drug targets. In this review, we discuss how PPI networks are essential for understanding the molecular basis of virus-host relationships, along with several databases dedicated to virus-host interaction studies. We present a short but comprehensive review of PPIs, including the experimental and computational methods for finding PPIs, the databases dedicated to virus-host PPIs, and various associated applications in the protein interaction networks of some lethal viruses with their hosts.
GeoStar is the registered trademark of GIS software made by WTUSM in China. With GeoStar, multi-scale images, DEMs, graphics and attributes can be integrated in very large seamless databases, and multi-dimensional dynamic visualization and information extraction are also available. This paper describes the fundamental characteristics of such huge integrated databases, for instance the data models, database structures and spatial index strategies. Finally, typical applications of GeoStar in a few pilot projects, such as the Shanghai CyberCity and the Guangdong provincial spatial data infrastructure (SDI), are illustrated, and several concluding remarks are given.
Background: Suicide among physicians is a serious public health issue and an extremely complex, multifactorial behavior. Aim: The aim of this study was to use the theme "Suicide among Physicians" to exemplify the analysis of methodological similarities between the scientific content available in the MEDLINE and BVS databases as scientific research tools. Methods: This is a systematic review with meta-analysis. The following combination of keywords was used for the data search in these databases: "suicide" AND "physicians" AND "public heath". Results: Three hundred and thirteen publications were identified, but only 16 studies were selected. A strong association was found between the MEDLINE and BVS databases and the odds ratio regarding the theme "Suicide among physicians". Conclusions: Considering the similarities found in the use of the two analyzed databases, it was possible to identify that suicide among physicians is associated with the exercise of an important professional role in society and in the workplace. With regard to scientific information, the obtained p-value (<0.05) appears to be statistically significant for the association between the suggested theme and the methodological similarities of the scientific information available in the analyzed databases. Thus, these open-access research tools are considered scientifically reliable.
AIM: To detect ophthalmic adverse drug reactions (ADRs) that occurred in Portugal from 2000 to 2009 through the use of administrative hospital databases. We also intended to compare the results of this methodology with spontaneous reporting. METHODS: We conducted a retrospective nationwide study using hospital administrative databases, which included all inpatients and outpatients in all public hospitals in Portugal from 2000 to 2009. We used International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) coding data that allowed the detection of ADRs. We used the WHO definition of ADR. We searched all ICD-9-CM terms in Ophthalmology for codes that included "drug-induced", "iatrogenic", "toxic" and any others that could signal an ADR, such as "362.55 - toxic maculopathy" or "365.03 - steroid responders", as well as "E" codes (codes from E930 to E949.9, which exclude intoxications and errors). RESULTS: From 11944725 hospitalizations or ambulatory episodes within that period, we identified 1524 probable ophthalmic ADRs (corresponding to a frequency of 1.28 per 10000 episodes) and an additional 100 possible ophthalmic ADRs. Applying this methodology took only 4 person-hours. A total of 113 spontaneous reports of ophthalmic ADRs were made from 2000 to 2009 in Portugal (a frequency of 0.095 per 10000 episodes). To our knowledge, this was the first estimate of the frequency of ophthalmic ADRs through the use of databases, and the first nationwide estimate of ophthalmic ADRs in Portugal. We identified 1524 probable ADRs and 100 possible ADRs. CONCLUSION: This database methodology adapted for Ophthalmology may represent a new approach for the detection of ophthalmic ADRs, since these codes exist in the ICD-9-CM classification. Its performance was clearly superior to spontaneous reporting.
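The two reported frequencies follow directly from the counts in the abstract, as the short check below shows. The helper name `per_10000` is a hypothetical convenience, and the ratio at the end is derived here from the abstract's counts rather than stated in the source.

```python
def per_10000(events: int, episodes: int) -> float:
    """Frequency of events per 10000 episodes."""
    return events / episodes * 10_000


EPISODES = 11_944_725      # hospitalizations or ambulatory episodes, 2000 to 2009
PROBABLE_ADRS = 1524       # probable ophthalmic ADRs found via ICD-9-CM codes
SPONTANEOUS_REPORTS = 113  # spontaneous reports over the same period

database_rate = per_10000(PROBABLE_ADRS, EPISODES)           # about 1.28
spontaneous_rate = per_10000(SPONTANEOUS_REPORTS, EPISODES)  # about 0.095
detection_ratio = PROBABLE_ADRS / SPONTANEOUS_REPORTS        # about 13.5

print(f"{database_rate:.2f} vs {spontaneous_rate:.3f} per 10000 episodes")
print(f"database method found about {detection_ratio:.1f} times as many ADRs")
```

Since both rates share the same denominator, the ratio of roughly 13.5 is simply the ratio of the two counts.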
The typical characteristic of the topology of Bayesian networks (BNs) is the interdependence among different nodes (variables), which makes it impossible to optimize one variable independently of the others; as a result, learning BN structures with general genetic algorithms is liable to converge to a local extremum. To resolve this problem efficiently, a self-organizing genetic algorithm (SGA) based method for constructing BNs from databases is presented. The method uses a self-organizing mechanism to develop a genetic algorithm that extends the crossover operator from one to two, provides mutual competition between them, and even adjusts the number of parents in recombination (crossover/recomposition) schemes. With the K2 algorithm, the method also optimizes the genetic operators and makes adequate use of domain knowledge. As a result, the method is able to find a global optimum of the BN topology, avoiding premature convergence to a local extremum. Experimental results are presented, and the convergence of the SGA is discussed.
Funding (AI Act and NoSQL DSAR study): Supported by the National Natural Science Foundation of China (No. 62302242) and the China Postdoctoral Science Foundation (No. 2023M731802).
Funding (2D materials databases study): Project supported by the National Natural Science Foundation of China (Grant Nos. 61888102, 52272172, and 52102193), the Major Program of the National Natural Science Foundation of China (Grant No. 92163206), the National Key Research and Development Program of China (Grant Nos. 2021YFA1201501 and 2022YFA1204100), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB30000000), and the Fundamental Research Funds for the Central Universities.
Funding: National Natural Science Foundation of China (No. 82060503).
Abstract: Objective: To investigate the variation, expression and clinical significance of the E2F3 gene in melanoma. Methods: First, the cBioportal, Oncomine and GEO databases were used to analyze the variation and expression level of the E2F3 gene in melanoma. The OSskcm and TISIDB databases were used to analyze the relationship between E2F3 and melanoma prognosis and tumor immune infiltrating cells. Then, the LinkedOmics database was used to identify the differential genes related to E2F3 expression in melanoma and analyze their biological functions. Finally, small molecule compounds for the treatment of melanoma were screened through the CMap database. Results: The mutation rate of the E2F3 gene in melanoma is about 4%, and there are 21 mutation sites. Compared with normal skin tissues, the expression of the E2F3 gene in melanoma was significantly increased (P<0.01). The mutation and increased expression of the E2F3 gene were associated with shortened overall survival (OS) of melanoma patients (P<0.05). The CNA level of E2F3 was negatively correlated with the expression levels of lymphocytes such as pDC, Neutrophil, Act DC and Th17, and negatively correlated with the expression levels of chemokines such as CXCL5, CCL13 and CCR1. The methylation level of E2F3 was positively correlated with the expression levels of Th1, Neutrophil, Act DC and other lymphocytes, and positively correlated with the expression levels of CXCL16, CXCL12, CCR1 and other chemokines. The expression level of E2F3 was negatively correlated with the expression levels of lymphocytes such as Th17, Tcm CD4 and Th1, and negatively correlated with the expression levels of chemokines such as CXCL16, CCL22 and CCL2. The expression of 96 genes such as UHRF1BP1 in melanoma was significantly correlated with the expression of E2F3 (|cor|0.5, P<0.05). These genes were mainly related to RNA transport, eukaryotic ribosome biogenesis, cell cycle and other pathways. Among them, WDR12, WDR43, RBM28, UTP18, DKC1, PAK1IP1, DDX31, TEX10, TRUB1 and TRMT61B were the top 10 hub genes. YC-1, simvastatin, cytochalasin-d, Deforolimus and cytochalasin-b may be five small molecule compounds for the treatment of melanoma. Conclusion: The mutation and increased expression level of the E2F3 gene are related to the poor prognosis of melanoma and participate in the occurrence and development of melanoma by affecting the expression of different tumor immune infiltrating cell subtypes; E2F3 may be a potential diagnostic marker and therapeutic target for melanoma.
Funding: Supported by the Open Researches Fund Program of LIESMARS (WKL(00)0302).
Abstract: Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
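The abstract gives no algorithmic detail, so the following is only a minimal sketch of the "sampling outside DBSCAN" idea under stated assumptions: draw a random sample, run a plain DBSCAN on the sample, then attach each unsampled point to the cluster of its nearest clustered sample point within `eps`. The function names, the brute-force neighbor search, and the assignment rule are all illustrative choices, not the paper's SDBSCAN algorithms.

```python
import math
import random


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (idx itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain brute-force DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def sampled_dbscan(points, eps, min_pts, sample_fraction, seed=0):
    """Cluster a random sample, then label the rest by nearest clustered sample.

    Assumption: min_pts is chosen by the caller for the (lower) sample density.
    """
    rng = random.Random(seed)
    k = max(1, int(len(points) * sample_fraction))
    sample_idx = rng.sample(range(len(points)), k)
    sample = [points[i] for i in sample_idx]
    sample_labels = dbscan(sample, eps, min_pts)

    labels = [-1] * len(points)
    for i, lab in zip(sample_idx, sample_labels):
        labels[i] = lab
    sampled = set(sample_idx)
    for i, p in enumerate(points):
        if i in sampled:
            continue
        best_label, best_dist = -1, eps
        for s, lab in zip(sample, sample_labels):
            if lab == -1:
                continue
            d = math.dist(p, s)
            if d <= best_dist:
                best_label, best_dist = lab, d
        labels[i] = best_label
    return labels
```

Only the sample pays the quadratic neighbor-search cost here; the remaining points need a single pass over the sample, which is where the saving on large databases would come from in this sketch.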
Abstract: Data acquisition and modeling are the two most important, difficult and costly aspects of a Cybercity project. 2D-GIS is mature and can manage a large amount of spatial data, so 3D-GIS should make the best use of the data and technology of 2D-GIS. Construction of a useful synthetic environment requires integration of multiple types of information such as DEMs, texture images and 3D representations of objects such as buildings. In this paper, a method for 3D city landscape data modeling and visualization based on integrated databases is presented. Since the volume of raster data is very large, special strategies (for example, the pyramid gridded method) must be adopted in order to manage raster data efficiently. Three different methods of data acquisition, the proper data structure and a simple modeling method are presented as well. Finally, a pilot project of Shanghai Cybercity is illustrated.
Funding: Supported by the National Natural Science Foundation of China (60471055) and the National "863" High Technology Research and Development Program of China (2007AA01Z443).
Abstract: To solve the problems of sharing and reusing information in information systems, a rules-based approach to constructing ontologies from object-relational databases is proposed. A 3-tuple ontology constructing model is proposed first. Then, four types of ontology constructing rules, covering class, property, property characteristics, and property restrictions, are formalized according to the model. Experimental results described in Web Ontology Language show that the proposed approach is feasible for application in the semantic objects project of the semantic computing laboratory at UC Irvine. Our approach reduces construction time by about twenty percent compared with ontology construction from relational databases.
Funding: Supported by the National Natural Science Foundation of China (69873027).
Abstract: Most knowledgeable people agree that networking and routing technologies have been around for about 25 years. Routing is simultaneously the most complicated function of a network and the most important. Likewise, more than 70% of computer application fields are MIS applications. The challenge in building and using an MIS in a network is therefore developing the means to find, access, and communicate with large databases or multi-database systems. Because general databases are not time-continuous, they cannot be streamed, so reliable and secure quality of service cannot be obtained by dropping some unimportant datagrams during database transmission. In this article, we discuss which kind of routing protocol is best suited to transmitting large databases or multi-database systems over networks.
Abstract: The necessity and feasibility of introducing attribute weights into a digital fingerprinting system are presented. A weighted algorithm for fingerprinting relational databases for traitor tracing is proposed. Higher weights are assigned to more significant attributes, so important attributes are fingerprinted more frequently than others. Finally, the robustness of the proposed algorithm, such as its performance against collusion attacks, is analyzed. Experimental results prove the superiority of the algorithm.
Funding: Supported by the Aeronautics Science Foundation of China (02F52033), the High-Technology Research Project of Jiangsu Province (BG2004005) and the Youth Research Foundation of Qufu Normal University (XJ02057).
Abstract: A weighted algorithm for watermarking relational databases for copyright protection is presented. The probability of watermarking an attribute is assigned according to its weight, which is decided by the owner of the database. A one-way hash function and a secret key known only to the owner of the data are used to select tuples and bits to mark. By assigning high weights to significant attributes, the scheme ensures that important attributes have a greater chance of being marked than less important ones. Experimental results show that the proposed scheme is robust against various forms of attacks, and has perfect immunity to subset attack.
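The keyed selection of tuples, attributes and bits can be illustrated with a small sketch. The abstract does not specify the hash construction or any parameters, so everything below is an assumption for illustration: HMAC-SHA256 as the keyed one-way function, a primary-key column named `id`, non-negative integer attribute values, a selection ratio `gamma`, and `num_lsb` candidate low-order bits.

```python
import hashlib
import hmac


def keyed_hash(secret_key: bytes, primary_key: str, purpose: str) -> int:
    """Keyed one-way hash of a primary key (HMAC-SHA256 is an assumed choice)."""
    message = f"{purpose}:{primary_key}".encode()
    return int.from_bytes(hmac.new(secret_key, message, hashlib.sha256).digest(), "big")


def pick_weighted(weights: dict, value: int) -> str:
    """Map a hash value onto attribute names in proportion to integer weights."""
    slot = value % sum(weights.values())
    for name in sorted(weights):
        if slot < weights[name]:
            return name
        slot -= weights[name]
    raise AssertionError("unreachable when all weights are positive integers")


def mark_position(secret_key, primary_key, weights, gamma, num_lsb):
    """Return (attribute, bit index, bit value) for a selected tuple, else None."""
    if keyed_hash(secret_key, primary_key, "select") % gamma != 0:
        return None  # roughly one tuple in gamma is marked
    attr = pick_weighted(weights, keyed_hash(secret_key, primary_key, "attr"))
    bit_index = keyed_hash(secret_key, primary_key, "bit") % num_lsb
    bit_value = keyed_hash(secret_key, primary_key, "value") % 2
    return attr, bit_index, bit_value


def embed(rows, secret_key, weights, gamma=2, num_lsb=2):
    """Return marked copies of the rows; the input rows are left untouched."""
    marked = []
    for row in rows:
        row = dict(row)
        position = mark_position(secret_key, str(row["id"]), weights, gamma, num_lsb)
        if position is not None:
            attr, bit_index, bit_value = position
            row[attr] = (row[attr] & ~(1 << bit_index)) | (bit_value << bit_index)
        marked.append(row)
    return marked


def detect(rows, secret_key, weights, gamma=2, num_lsb=2):
    """Count how many expected mark bits are present: (matches, total)."""
    matches = total = 0
    for row in rows:
        position = mark_position(secret_key, str(row["id"]), weights, gamma, num_lsb)
        if position is None:
            continue
        attr, bit_index, bit_value = position
        total += 1
        matches += ((row[attr] >> bit_index) & 1) == bit_value
    return matches, total
```

With weights such as `{"salary": 3, "age": 1}`, about three quarters of the marked tuples carry their mark in `salary`. On unmarked data roughly half of the expected bits match by chance, so a match ratio close to 1 is what would indicate the owner's mark in this sketch.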
Funding: Supported by the Researchers Supporting Project (No. RSP-2020/102), King Saud University, Riyadh, Saudi Arabia; the National Natural Science Foundation of China (Nos. 61802031, 61772454, 61811530332, 61811540410); the Natural Science Foundation of Hunan Province, China (No. 2019JGYB177); the Research Foundation of Education Bureau of Hunan Province, China (No. 18C0216); the “Practical Innovation and Entrepreneurial Ability Improvement Plan” for Professional Degree Graduate Students of Changsha University of Science and Technology (No. SJCX201971); and the Hunan Graduate Scientific Research Innovation Project, China (No. CX2019694). This work is also supported by the Programs of Transformation and Upgrading of Industries and Information Technologies of Jiangsu Province (No. JITC-1900AX2038/01).
Abstract: As typical peer-to-peer distributed networks, blockchain systems require each node to copy a complete transaction database, so as to ensure that new transactions can be verified independently. In a blockchain system (e.g., the bitcoin system), a node does not rely on any central organization, and every node keeps an entire copy of the transaction database. However, this feature means that the size of the blockchain transaction database grows rapidly. Therefore, as the system continues to operate, node memory also needs to be expanded to keep the system running. Especially in the big data era, increasing network traffic will lead to a faster transaction growth rate. This paper analyzes blockchain transaction databases and proposes a storage optimization scheme. The proposed scheme divides the blockchain transaction database into a cold zone and a hot zone using an expiration recognition method based on the Least Recently Used (LRU) algorithm. It achieves storage optimization by moving unspent transaction outputs outside the in-memory transaction databases. We present a theoretical analysis of the optimization method to validate its effectiveness. Extensive experiments show that our proposed method outperforms the current mechanism for blockchain transaction databases.
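The hot/cold split can be pictured with a short LRU sketch. This is not the paper's scheme: the class name `TieredUtxoStore`, the fixed `hot_capacity`, and the use of an ordinary dict as a stand-in for out-of-memory (for example on-disk) cold storage are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredUtxoStore:
    """Keeps recently used unspent outputs in a bounded hot zone (LRU order)."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # in-memory zone, least recently used first
        self.cold = {}            # stand-in for storage outside main memory

    def _evict(self):
        # Move least recently used entries to the cold zone until the hot zone fits.
        while len(self.hot) > self.hot_capacity:
            outpoint, utxo = self.hot.popitem(last=False)
            self.cold[outpoint] = utxo

    def put(self, outpoint, utxo):
        """Insert a new unspent output as the most recently used entry."""
        self.cold.pop(outpoint, None)
        self.hot[outpoint] = utxo
        self.hot.move_to_end(outpoint)
        self._evict()

    def get(self, outpoint):
        """Look up an output; a cold hit is promoted back into the hot zone."""
        if outpoint in self.hot:
            self.hot.move_to_end(outpoint)
            return self.hot[outpoint]
        if outpoint in self.cold:
            utxo = self.cold.pop(outpoint)
            self.hot[outpoint] = utxo
            self._evict()
            return utxo
        return None

    def spend(self, outpoint):
        """Remove a spent output from whichever zone holds it."""
        if outpoint in self.hot:
            del self.hot[outpoint]
            return True
        if outpoint in self.cold:
            del self.cold[outpoint]
            return True
        return False
```

For example, with `hot_capacity=2`, inserting outputs `a`, `b`, `c` in that order leaves `b` and `c` hot and moves `a` to the cold zone; a later lookup of `a` promotes it and demotes `b`.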
Funding: Supported by the National Nature Science Foundation under "Outstanding Young Researchers" (49525101).
Abstract: This is a period of information explosion. In spatial information science especially, information can be acquired in many ways, such as artificial satellites, aeroplanes, lasers, digital photogrammetry and so on. Spatial data sources are usually distributed and heterogeneous. A federated database is the best solution for the sharing and interoperation of spatial databases. In this paper, the concepts of federated database and interoperability are introduced. Three heterogeneous kinds of spatial data (vector, image and DEM) are used to create an integrated database. A data model for federated spatial databases is given.
Funding: Supported by the National 863 High-Tech Project (863-306-Z705-02) and the National Natural Science Foundation of China (69896240).
Abstract: The technique of Knowledge Discovery in Databases (KDD) for learning valuable knowledge hidden in network alarm databases is introduced. To obtain such knowledge, we propose an efficient method based on sliding windows (named Slidwin) to discover different episode rules from time-sequential alarm data. The experimental results show that, given different threshold parameters, a large number of different rules can be discovered quickly.
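To make "episode rules from sliding windows" concrete, here is a deliberately naive sketch, not the Slidwin method itself, whose details the abstract does not give. It assumes integer timestamps, windows of fixed `width` sliding one time unit at a time, and only two-alarm serial episodes (A followed by B). A rule "A then B" is reported when its window frequency and its confidence relative to windows containing A exceed the given thresholds.

```python
from collections import Counter


def window_frequencies(events, width):
    """Count windows containing each alarm type and each ordered pair A -> B.

    events: list of (integer time, alarm type). Windows are [start, start + width)
    for every start from first_time - width + 1 up to last_time.
    """
    events = sorted(events)
    first, last = events[0][0], events[-1][0]
    single, serial = Counter(), Counter()
    n_windows = 0
    for start in range(first - width + 1, last + 1):
        in_window = [(t, a) for t, a in events if start <= t < start + width]
        n_windows += 1
        single.update({a for _, a in in_window})
        # Each ordered pair is counted at most once per window.
        pairs = {
            (a, b)
            for i, (ta, a) in enumerate(in_window)
            for tb, b in in_window[i + 1:]
            if ta < tb and a != b
        }
        serial.update(pairs)
    return single, serial, n_windows


def episode_rules(events, width, min_freq, min_conf):
    """Return sorted (A, B, frequency, confidence) tuples passing both thresholds."""
    single, serial, n_windows = window_frequencies(events, width)
    rules = []
    for (a, b), count in serial.items():
        freq = count / n_windows
        conf = count / single[a]
        if freq >= min_freq and conf >= min_conf:
            rules.append((a, b, freq, conf))
    return sorted(rules)
```

Lowering `min_freq` or `min_conf` admits more rules, which matches the abstract's remark that different threshold parameters yield different rule sets; an efficient method would update the counts incrementally as the window slides rather than rescanning all events.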
Funding: Supported by the National Natural Science Foundation of China (60073045).
Abstract: In this paper, the constrained K closest pairs query is introduced, which retrieves the K closest pairs satisfying a given spatial constraint from two datasets. For datasets indexed by R-trees in spatial databases, three algorithms are presented for answering this kind of query. Among them, the two-phase Range+Join and Join+Range algorithms adopt a strategy that changes the execution order of the range and closest pairs queries, and the constrained heap-based algorithm utilizes extended distance functions to prune the search space and minimize the pruning distance. Experimental results show that the constrained heap-based algorithm has better applicability and performance than the two-phase algorithms.
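The query itself can be illustrated with a brute-force version of the Range+Join order: apply the range constraint first, then keep the K closest pairs in a bounded heap. This sketch makes several assumptions the abstract does not state: points are 2D tuples, the constraint is an axis-aligned rectangle that both points of a pair must fall in, and plain lists replace the R-tree indexes, so none of the paper's pruning is reproduced.

```python
import heapq
import math


def in_rect(point, rect):
    """True if point lies inside rect = (xmin, ymin, xmax, ymax), borders included."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax


def constrained_k_closest_pairs(ps, qs, rect, k):
    """Return up to k (distance, p, q) tuples, closest first, with p and q in rect."""
    # Range phase: discard points outside the constraint region.
    ps_in = [p for p in ps if in_rect(p, rect)]
    qs_in = [q for q in qs if in_rect(q, rect)]

    # Join phase: max-heap of size <= k, implemented with negated distances.
    heap = []
    for p in ps_in:
        for q in qs_in:
            d = math.dist(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p, q))
            elif d < -heap[0][0]:  # closer than the current k-th best pair
                heapq.heapreplace(heap, (-d, p, q))
    return sorted((-neg_d, p, q) for neg_d, p, q in heap)
```

The value `-heap[0][0]` plays the role of a pruning distance: once K pairs are known, any candidate farther than it can be skipped. An index-based algorithm would apply the same bound to whole subtrees instead of individual pairs.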
Funding: National Natural Science Foundation of China, No. 31971180 and No. 11474013.
Abstract: Almost all the cellular processes in a living system are controlled by proteins: they regulate gene expression, catalyze chemical reactions, transport small molecules across membranes, and transmit signals across membranes. Even a viral infection is often initiated through virus-host protein interactions. Protein-protein interactions (PPIs) are the physical contacts between two or more proteins, and they represent complex biological functions. PPIs are now used to construct PPI networks for studying complex pathways and revealing the functions of unknown proteins. Scientists have used PPIs to find the molecular basis of certain diseases as well as some potential drug targets. In this review, we discuss how PPI networks are essential for understanding the molecular basis of virus-host relationships, and we cover several databases dedicated to virus-host interaction studies. We present a short but comprehensive review of PPIs, including the experimental and computational methods of finding PPIs, the databases dedicated to virus-host PPIs, and various associated applications in protein interaction networks of some lethal viruses with their hosts.
Abstract: GeoStar is the registered trademark of GIS software made by WTUSM in China. With GeoStar, multi-scale images, DEMs, graphics and attributes can be integrated in very large seamless databases, and multi-dimensional dynamic visualization and information extraction are also available. This paper describes the fundamental characteristics of such huge integrated databases, for instance, the data models, database structures and spatial index strategies. Finally, typical applications of GeoStar in a few pilot projects, such as the Shanghai CyberCity and the Guangdong provincial spatial data infrastructure (SDI), are illustrated and several concluding remarks are given.
Abstract: Background: Suicide among physicians is a serious public health issue, with extremely complex and multifactorial behavior. Aim: The aim of this study was to use the theme “Suicide among Physicians” to exemplify the analysis of methodological similarities between the scientific content available in the MEDLINE and BVS databases as scientific research tools. Methods: This is a systematic review with meta-analysis. The following combination of keywords was used for data search in the referred databases: “suicide” AND “physicians” AND “public heath”. Results: Three hundred and thirteen publications were identified, but only 16 studies were chosen. A strong association was found between the MEDLINE and BVS databases and the Odds Ratio regarding the theme “Suicide among physicians”. Conclusions: Considering the similarities found in the use of the two analyzed databases, it was possible to identify that suicide among physicians is associated with the exercise of an important professional role in society and in the workplace. With regard to scientific information, the obtained p-value (<0.05) seems to be statistically significant for the association between the suggested theme and the methodological similarities of the scientific information available in the analyzed databases. Thus, these open-access research tools are considered scientifically reliable tools.
Funding: Support given by the research project HR-QoD - Quality of data (outliers, inconsistencies and errors) in hospital inpatient databases: methods and implications for data modeling, cleansing and analysis (project PTDC/SAUESA/75660/2006)
Abstract: AIM: To detect ophthalmic adverse drug reactions (ADRs) that occurred in Portugal from 2000 to 2009 through the use of administrative hospital databases. We also intended to compare the results of this methodology with spontaneous reporting. METHODS: We conducted a retrospective nationwide study using hospital administrative databases, which included all inpatients and outpatients in all public hospitals in Portugal from 2000 to 2009. We used International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) coding data that allowed the detection of ADRs. We used the WHO's definition of ADR. We searched all ICD-9-CM terms in Ophthalmology for codes that included "drug-induced", "iatrogenic", "toxic" and all others that could signal an ADR, such as "362.55- toxic maculopathy" or "365.03- steroid responders", and also "E" codes (codes from E930 to E949.9, which exclude intoxications and errors). RESULTS: From 11944725 hospitalizations or ambulatory episodes within that period of time, we identified 1524 probable ophthalmic ADRs (corresponding to a frequency of 1.28 per 10000 episodes) and an additional 100 possible ophthalmic ADRs. We used only 4 person-hours in the application of this methodology. A total of 113 spontaneous reports arose from ophthalmic ADRs from 2000 to 2009 in Portugal (frequency of 0.095 per 10000 episodes). To our knowledge, this was the first estimate of the frequency of ophthalmic ADRs through the use of databases, and the first nationwide estimate of ophthalmic ADRs in Portugal. We identified 1524 probable ADRs and 100 possible ADRs. CONCLUSION: This database methodology adapted for Ophthalmology may represent a new approach for the detection of ophthalmic ADRs, since these codes exist in the ICD-9-CM classification. Its performance was clearly superior to spontaneous reporting.
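The two reported frequencies follow directly from the counts in the abstract, as the short check below shows. The function name `per_10000` is just an illustrative helper; the three input numbers are taken from the abstract.

```python
def per_10000(cases: int, episodes: int) -> float:
    """Frequency of cases per 10000 episodes."""
    return cases / episodes * 10_000


EPISODES = 11_944_725        # hospitalizations or ambulatory episodes, 2000 to 2009
PROBABLE_ADRS = 1_524        # probable ophthalmic ADRs found via ICD-9-CM codes
SPONTANEOUS_REPORTS = 113    # spontaneous reports over the same period

database_rate = per_10000(PROBABLE_ADRS, EPISODES)
spontaneous_rate = per_10000(SPONTANEOUS_REPORTS, EPISODES)

print(f"database method:        {database_rate:.2f} per 10000 episodes")
print(f"spontaneous reporting:  {spontaneous_rate:.3f} per 10000 episodes")
print(f"ratio of detected ADRs: {PROBABLE_ADRS / SPONTANEOUS_REPORTS:.1f}")
```

The ratio in the last line, about 13.5 probable ADRs per spontaneous report, is one way to quantify the abstract's statement that the database method outperformed spontaneous reporting.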
Abstract: The typical characteristic of the topology of Bayesian networks (BNs) is the interdependence among different nodes (variables), which makes it impossible to optimize one variable independently of others, and the learning of BN structures by general genetic algorithms is liable to converge to a local extremum. To resolve this problem efficiently, a self-organizing genetic algorithm (SGA) based method for constructing BNs from databases is presented. This method makes use of a self-organizing mechanism to develop a genetic algorithm that extends the crossover operator from one to two, providing mutual competition between them, and even adjusting the numbers of parents in recombination (crossover/recomposition) schemes. With the K2 algorithm, this method also optimizes the genetic operators and makes adequate use of domain knowledge. As a result, this method is able to find a global optimum of the topology of BNs, avoiding premature convergence to a local extremum. The experimental results proved to be and the convergence of the SGA was discussed.