Recursive programming has the advantage of being easy to write and of requiring very few lines of code when done correctly. Structured Query Language (SQL) is a database language used to manipulate data. In Microsoft SQL Server 2000, recursive queries can be implemented to retrieve data presented in a hierarchical format, but this approach has its disadvantages. The common table expression (CTE) construct introduced in Microsoft SQL Server 2005 provides the significant advantage of being able to reference itself, which creates a recursive CTE. Hierarchical data structures, organizational charts and other reports on parent-child table relationships can readily benefit from recursive CTEs. The recursive query is illustrated and implemented on some simple hierarchical data. In addition, a business case study is presented together with a solution that uses a CTE-based recursive query. Stored procedures are also programmed to perform the recursion in SQL. Test results show that CTE-based recursive queries make it possible to write much more complex queries while retaining a much simpler syntax.
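As a rough illustration of the recursive CTE described in this abstract, the sketch below walks a small parent-child table. It is not the paper's code: the `employee` table, its columns, the sample rows and the function name `reporting_chain` are all hypothetical, and it runs on SQLite through Python's `sqlite3` module rather than on SQL Server. SQLite spells the construct `WITH RECURSIVE`, whereas SQL Server writes plain `WITH`.

```python
import sqlite3

# Hypothetical sample data: (id, name, manager_id); the root has no manager.
ROWS = [(1, "Ana", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)]


def reporting_chain(rows, root_id):
    """Return (id, name, depth) for root_id and everyone below it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE chain(id, name, depth) AS (
            -- anchor member: the starting row
            SELECT id, name, 0 FROM employee WHERE id = ?
            UNION ALL
            -- recursive member: the CTE references itself to fetch direct reports
            SELECT e.id, e.name, c.depth + 1
            FROM employee AS e JOIN chain AS c ON e.manager_id = c.id
        )
        SELECT id, name, depth FROM chain ORDER BY depth, id
    """
    result = conn.execute(query, (root_id,)).fetchall()
    conn.close()
    return result
```

Calling `reporting_chain(ROWS, 2)` returns only the subtree rooted at employee 2, which is the kind of organizational-chart report the abstract has in mind.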
To solve the problem of query processing correctness in semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. During query rewriting, all relevant tables are found and decomposed into minimal connectable units. These units are joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed. In the worst case, the query decomposition algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, which show that the query processing algorithms provide satisfactory performance when the query length is less than 8.
This article presents an approach to automatic rule discovery for data transformation tasks that leverages XGBoost, a machine learning algorithm known for its efficiency and performance. The proposed framework fuses diversified feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. The article first describes the methodology for extracting these distinct features from raw data and the pre-processing steps that prepare the data for the model. Subsequent sections explain feature optimization using Recursive Feature Elimination (RFE) with linear regression, which aims to retain the most contributive features and eliminate redundant or less significant ones. The core of the research is the training of the XGBoost model on the prepared and optimized feature sets. The article gives a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (the prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and particularly the XGBoost model, in the context of Business Rule Engine (BRE) data transformation, the article points to a shift towards more scalable, efficient, and less human-dependent data transformation systems. This research opens the way for further exploration of automated rule discovery systems and their applications in various sectors.
GML is becoming the de facto standard for electronic data exchange among Web applications and distributed geographic information systems. However, conventional query languages (e.g. SQL and its extended versions) are not suitable for directly querying and updating GML documents. Even approaches that work well with XML cannot guarantee good results when applied to GML documents. Although XQuery is a powerful standard query language for XML, it was not designed for querying spatial features, which constitute the most important components of GML documents. We propose GQL, a query language specification that supports spatial queries over GML documents by extending XQuery. The data model, algebra, and formal semantics of GQL, as well as its various spatial functions and operations, are presented in detail.
Visual Query Language on Spatial Information (SIVQL) is a visual query language based on an extension of Query by Example (QBE). It operates visually on graphics or media objects, such as point, line and area elements. In this paper, the relational calculation and query functions of SIVQL are studied and discussed using set theory and relational algebra, and the theoretical foundation of SIVQL is investigated mathematically. Finally, application examples are given for a specific information system.
To improve the accuracy of Structured Query Language (SQL) injection penetration testing through formalism-guided test case generation, an attack tree model of SQL injection based on attack purpose is proposed. Under the guidance of this model, formal descriptions of the SQL injection vulnerability features and of the SQL injection attack inputs are established. Moreover, these models are instantiated according to new coverage criteria, and executable test cases are generated. Experiments show that, compared with the randomly enumerated test cases used in other works, the test cases generated by our method detect SQL injection vulnerabilities more effectively. Therefore, false negatives are reduced and test accuracy is improved.
On small devices such as PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Because of the restricted size and resources of these smaller devices, current works mostly limit the queries that users can pose by having developers predetermine them. This limits the capability of these devices to support robust queries. Hence, this paper proposes a universal-relation-based database query language targeted at small devices. The language allows relational database queries to be formulated with minimal query terms. The formulation of the language and its structure are described, and usability test results are presented to support the effectiveness of the language.
The unified multimedia query language (UMQL) is a powerful general-purpose multimedia query language that is well suited to multimedia information retrieval. The paper proposes a grammar analysis model to implement effective grammatical processing for the language. It separates the grammar analysis of a UMQL query specification into two phases, syntactic analysis and semantic analysis, and uses extended Backus-Naur form (EBNF) and logical algebra, respectively, to specify the restrictive grammar rules of each phase. As a result, the model can present error guidance for a query specification that contains incorrect grammar. The model not only suits the processing of UMQL queries well, but also offers guidance for other projects concerning query processing for descriptive query languages.
Through the mapping from UMQL (unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to reduce the execution costs of UMQA query plans effectively, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses the equivalent UMQA translation formulae to optimize query plans and makes the optimized plans conform to the optimization strategies as far as possible. Finally, the logical implementation methods of UMQA plans, i.e., of UMQA operators, are discussed in order to obtain useful target data from a multimedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.
This paper proposes a new secured database management system architecture that uses an intrusion detection system (IDS), intended for organizations with no previous role mapping for users. A simple representation of Structured Query Language (SQL) queries is proposed so that the clustering algorithm can be applied easily. A new clustering algorithm that uses tabu search with adaptive memory is applied to database log files to create user profiles. Queries issued by each user are then checked against the related user profile using a classifier to determine whether each query is malicious. If a query is malicious, the IDS stops its execution or reports the threat to the responsible person. A simple classifier based on the Euclidean distance is used: the issued query is transformed to the proposed simple representation, and the Euclidean distance between the profile's cluster centers and the issued query is calculated. A synthetic data set, which models normal user access behavior in relation to the database, is used for the experimental evaluation. The false negative (FN) and false positive (FP) rates are used to compare the proposed algorithm with other methods. The experimental results indicate that the proposed method yields very small FN and FP rates.
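The distance-based check in this abstract can be sketched in a few lines, under assumptions the abstract does not spell out: each query is taken to be already encoded as a numeric feature vector, a user profile is taken to be a list of cluster centers, and a query is flagged when it is farther than a fixed `threshold` from every center. The names `euclidean`, `is_malicious` and `threshold` are hypothetical, not the paper's.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_malicious(query_vector, profile_centers, threshold):
    """Flag a query whose nearest profile center is farther than threshold.

    Assumption: the threshold rule is illustrative; the abstract only says
    that a Euclidean-distance classifier is used.
    """
    nearest = min(euclidean(query_vector, center) for center in profile_centers)
    return nearest > threshold


# Hypothetical profile with two cluster centers in a 3-feature space.
PROFILE = [(1.0, 0.0, 2.0), (0.0, 3.0, 1.0)]
```

With this profile and a threshold of 1.5, a query encoded as `(1.0, 0.0, 2.0)` matches a center exactly and is accepted, while `(5.0, 5.0, 5.0)` is far from both centers and is flagged.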
One way of achieving interoperability among heterogeneous, distributed DBMSs is through a multidatabase system. Recently, CORBA implementations have been increasingly used in developing multidatabase systems. Panorama is a multidatabase system implemented on top of a CORBA-compliant product, namely VisiBroker. It aims to achieve interoperability among Oracle, Sybase and other DBMSs through the registration of these DBMSs to Panorama and through PanoSQL, the single global query language designed for this system. In this paper, we first introduce CORBA as a basis for interoperability in multidatabase systems. Then, a general view of our multidatabase system, Panorama, is given. In section four, we introduce the global query language PanoSQL, designed to achieve interoperability among the different DBMSs in Panorama. Then, as an example, we present the registration of Oracle to Panorama. Finally, a conclusion and future work for this system are given.
Temporal ontologies make it possible to represent not only concepts, their properties, and their relationships, but also time-varying information, through explicit versioning of definitions or through the four-dimensional perdurantist view. They are widely used to formally represent temporal data semantics in applications belonging to different fields (e.g., Semantic Web, expert systems, knowledge bases, big data, and artificial intelligence). They facilitate temporal knowledge representation and discovery, with the support of temporal data querying and reasoning. However, there is no standard or consensual temporal ontology query language. In a previous work, we proposed an approach named τJOWL (temporal OWL 2 from temporal JSON, where OWL 2 stands for "OWL 2 Web Ontology Language" and JSON stands for "JavaScript Object Notation"). τJOWL makes it possible (1) to automatically build a temporal OWL 2 ontology of data, following the Closed World Assumption (CWA), from temporal JSON-based big data, and (2) to manage its incremental maintenance, accommodating the evolution of the data, in a temporal and multi-schema-version environment. In this paper, we propose a temporal ontology query language for τJOWL, named τSQWRL (temporal SQWRL), designed as a temporal extension of the ontology query language Semantic Query-enhanced Web Rule Language (SQWRL). The new language is inspired by the features of the consensual temporal query language TSQL2 (Temporal SQL2), well known in the temporal (relational) database community. The aim of the proposal is to enable and simplify the task of retrieving any desired ontology version or of specifying any (complex) temporal query on time-varying ontologies generated from time-varying big data. Some examples in the Internet of Healthcare Things (IoHT) domain are provided to motivate and illustrate our proposal.
Objective: Medical data mining and sharing is an important process in E-Health applications. However, because these data contain a large amount of personal private information about patients, there is a risk of privacy disclosure when sharing and mining them. Therefore, ensuring the security of medical big data in the process of publishing, sharing, and mining has become the focus of current research. The objective of our study is to design a framework based on a differential privacy protection mechanism to ensure the secure sharing of medical data. We developed a privacy protection query language (PQL) that integrates multiple data mining methods and provides a secure sharing function. Methods: This study was mainly performed at Xuzhou Medical University, China, and designs three sub-modules: a parsing module, a mining module, and a noising module. Each module encapsulates different computing methods, such as a composite parser and a noise theory. In the PQL framework, we apply differential privacy theory to the results of the computation between modules to guarantee the security of the various mining algorithms. These computing devices operate independently, but the mining results depend on their cooperation. In addition, PQL is encapsulated in MNSSp3, a data mining and security sharing platform, and the data come from public data sets, such as UCBI. The public data set (NCBI database) was used as the experimental data, and the data collection time was January 2020. Results: We designed and developed a query language that provides functions for medical data mining, sharing, and privacy preservation. We theoretically proved the performance of the PQL framework. The experimental results show that the PQL framework can ensure the security of each mining result and that the availability of the output results is above 97%. Conclusion: Our framework enables medical data providers to securely share health data or treatment data, and it provides a usable query language, based on a differential privacy mechanism, that enables researchers to mine information securely using data mining algorithms.
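To make the idea of a noising step concrete, the sketch below adds Laplace noise to a counting query. The abstract does not say which differential privacy mechanism PQL uses, so the Laplace mechanism is swapped in here purely as a standard textbook choice; the function names, the record layout and the `epsilon` parameter are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    """Return a differentially private count of records matching predicate.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical patient records: (age, has_condition).
RECORDS = [(34, True), (51, False), (47, True), (29, True), (63, False)]
```

A smaller `epsilon` gives stronger privacy and noisier answers, while a larger `epsilon` keeps the answer closer to the true count of 3 in this example.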
A conceptual-level database language for the entity-relationship (ER) model implicitly contains integrity constraints basic to ER concepts and special retrieval semantics for the inheritance of attributes and relationships. Prolog, which belongs to the logical and physical level, cannot be used as a foundation to define the database language directly. It is shown how Prolog can be enhanced to understand the concepts of entities, relationships, attributes and is-a relationships. The enhanced Prolog is then used as a foundation to define the semantics of a database query language for the ER model. The three basic functions of model specification, updates and retrievals are defined.
Probabilistic programming is a powerful means for formally specifying machine learning models. The inference engine of a probabilistic programming environment can be used for serving complex queries on these models. Most of the current research in probabilistic programming is dedicated to the design and implementation of highly efficient inference engines. Much less research aims at making the power of these inference engines accessible to non-expert users. Probabilistic programming means writing code. Yet many potential users from promising application areas such as the social sciences lack programming skills. This prompted recent efforts in synthesizing probabilistic programs directly from data. However, working with synthesized programs still requires the user to read, understand, and write some code, for instance, when invoking the inference engine for answering queries. Here, we present an interactive visual approach to synthesizing and querying probabilistic programs that does not require the user to read or write code.
This article proposes a graph-theoretic methodology for query approximation in Geographic Information Systems, enabling the relaxation of three kinds of query constraints: topological, semantic and structural. An approximate query is associated with a value corresponding to its degree of similarity with the original query. This value is computed for topological constraints on the basis of the topological distance between configurations, for semantic constraints using the information content approach, and for structural constraints by revisiting the maximum weighted matching problem in bipartite graphs. Finally, the high correlation of our proposal with human judgment is demonstrated by an experiment.
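The maximum weighted matching problem mentioned for structural constraints can be illustrated with a brute-force sketch. It is not the article's algorithm: it simply tries every assignment of left-hand elements to distinct right-hand elements and keeps the heaviest one, which is only practical for tiny inputs. The weight matrix layout, the non-negative weights and the function name are assumptions made for the example.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (best_score, pairs) for a small bipartite similarity matrix.

    weights[i][j] is the similarity between left element i and right
    element j. Assumptions: weights are non-negative and there are at
    least as many right elements as left elements, so matching every
    left element is optimal.
    """
    n_left, n_right = len(weights), len(weights[0])
    best_score, best_pairs = float("-inf"), []
    # Each permutation assigns every left element a distinct right element.
    for cols in permutations(range(n_right), n_left):
        score = sum(weights[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(cols))
    return best_score, best_pairs
```

For larger graphs a polynomial-time method such as the Hungarian algorithm would be the usual choice; the exhaustive search above is kept only because it makes the objective easy to see.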
The integration of heterogeneous multidatabases on a network is one of the key issues to be solved in the development of CIMS (computer integrated manufacturing systems). As a solution to this issue, a multidatabase integration environment, CIMBASE, has been developed. CIMBASE adopts an object-oriented data model and provides users with a series of software tools: a query language, a pre-compiler, a graphical database schema editor, a graphical query interface and a form-based query generation tool. This paper discusses in detail the major aspects of CIMBASE: its object-oriented data model, its query language interpreter, and the design and implementation of its pre-compiler. The design and algorithms presented in this paper provide a solid foundation for research on multidatabase integration.
In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module that supports relational analysis on Spark with Structured Query Language (SQL) and provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from the inefficiency of Spark that results from the Java virtual machine and from unnecessary data serialization and deserialization. Adopting native languages such as C++ could help to avoid such bottlenecks. Benefiting from a bare-metal runtime environment and template usage, systems with C++ interfaces usually achieve superior performance. However, the complexity of native languages also increases the required programming and debugging effort. In this work, we present LotusSQL, an engine that provides SQL support for the dataset abstraction on a native backend, Lotus. We employ a convenient SQL processing framework to deal with frontend jobs. Advanced query optimization technologies are added to improve the quality of execution plans. On top of the storage design and user interface of the compute engine, LotusSQL implements a set of highly efficient structured dataset operations and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to 9x in certain queries and outperforms SparkSQL in a standard query benchmark by more than 2x on average.
The paper discusses the need for a high-level query language that allows analysts, geographers and, in general, non-programmers to easily cross-analyze multi-source VGI created by means of apps, crowd-sourced data from social networks, and authoritative geo-referenced data, usually represented as JSON data sets (nowadays the de facto standard for data exported by social networks). Since an easy-to-use high-level language for querying and manipulating collections of possibly geo-tagged JSON objects is still unavailable, we propose a truly declarative language, named J-CO-QL, that is based on a well-defined execution model. A plug-in for a GIS makes it possible to visualize geo-tagged data sets stored in a NoSQL database such as MongoDB; the same plug-in can also be used to write and execute J-CO-QL queries on those databases. The paper introduces the language by exemplifying its operators within a real case study, the aim of which is to understand the mobility of people in the neighborhood of the city of Bergamo. Data about transportation networks and VGI from travelers are cross-analyzed by means of J-CO-QL, which can manipulate, transform, combine and join possibly geo-tagged JSON objects in order to produce new possibly geo-tagged JSON objects that satisfy users' needs.
Logic flaws within web applications allow malicious operations to be triggered against the back-end database. Existing approaches to identifying logic flaws in database accesses are strongly tied to structured query language (SQL) statement construction and cannot be applied to the new generation of web applications that use Not only SQL (NoSQL) databases as the storage tier. In this paper, we present Lom, a black-box approach for discovering many categories of logic flaws within MongoDB-based web applications. Our approach introduces a MongoDB operation model to support new features of MongoDB and models the application logic as a Mealy finite state machine. During the testing phase, test inputs that emulate state violation attacks are constructed to identify logic flaws at each application state. We apply Lom to several MongoDB-based web applications and demonstrate its effectiveness.
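A Mealy finite state machine produces an output for each (state, input) pair, which is what makes it a convenient model of application logic. The sketch below is a generic illustration and not Lom's actual model: the states, inputs, outputs and the `REJECTED` marker describe a hypothetical login flow. Sending a request that is not allowed in the current state mimics the kind of state violation input the abstract refers to.

```python
class MealyMachine:
    """Minimal Mealy machine: output depends on current state and input."""

    def __init__(self, start, transitions):
        # transitions maps (state, input) -> (next_state, output)
        self.state = start
        self.transitions = transitions

    def step(self, symbol):
        """Consume one input; undefined transitions leave the state unchanged."""
        key = (self.state, symbol)
        if key not in self.transitions:
            return "REJECTED"
        self.state, output = self.transitions[key]
        return output


# Hypothetical application logic for a small web shop.
TRANSITIONS = {
    ("anonymous", "login"): ("logged_in", "session_created"),
    ("logged_in", "view_orders"): ("logged_in", "orders_listed"),
    ("logged_in", "logout"): ("anonymous", "session_closed"),
}
```

In a correct application, `view_orders` sent in the `anonymous` state is rejected; a tester would flag a logic flaw if the real application answered such a request with order data instead.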
Abstract: Recursive programming has the advantage of being easy to write and of requiring very few lines of code when done correctly. Structured Query Language (SQL) is a database language used to manipulate data. In Microsoft SQL Server 2000, recursive queries can be implemented to retrieve data presented in a hierarchical format, but this approach has its disadvantages. The common table expression (CTE) construct introduced in Microsoft SQL Server 2005 provides the significant advantage of being able to reference itself, which creates a recursive CTE. Hierarchical data structures, organizational charts and other reports on parent-child table relationships can readily benefit from recursive CTEs. The recursive query is illustrated and implemented on some simple hierarchical data. In addition, a business case study is presented together with a solution that uses a CTE-based recursive query. Stored procedures are also programmed to perform the recursion in SQL. Test results show that CTE-based recursive queries make it possible to write much more complex queries while retaining a much simpler syntax.
Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11)
Abstract: To solve the problem of query processing correctness in semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. During query rewriting, all relevant tables are found and decomposed into minimal connectable units. These units are joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed. In the worst case, the query decomposition algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, which show that the query processing algorithms provide satisfactory performance when the query length is less than 8.
Abstract: This article presents an approach to automatic rule discovery for data transformation tasks that leverages XGBoost, a machine learning algorithm known for its efficiency and performance. The proposed framework fuses diversified feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. The article first describes the methodology for extracting these distinct features from raw data and the pre-processing steps that prepare the data for the model. Subsequent sections explain feature optimization using Recursive Feature Elimination (RFE) with linear regression, which aims to retain the most contributive features and eliminate redundant or less significant ones. The core of the research is the training of the XGBoost model on the prepared and optimized feature sets. The article gives a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (the prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and particularly the XGBoost model, in the context of Business Rule Engine (BRE) data transformation, the article points to a shift towards more scalable, efficient, and less human-dependent data transformation systems. This research opens the way for further exploration of automated rule discovery systems and their applications in various sectors.
Funding: Youth Chengguang Project of Science and Technology of Wuhan City of China (No. 20045006071-16)
Abstract: GML is becoming the de facto standard for electronic data exchange among Web applications and distributed geographic information systems. However, conventional query languages (e.g. SQL and its extended versions) are not suitable for directly querying and updating GML documents. Even approaches that work well with XML cannot guarantee good results when applied to GML documents. Although XQuery is a powerful standard query language for XML, it was not designed for querying spatial features, which constitute the most important components of GML documents. We propose GQL, a query language specification that supports spatial queries over GML documents by extending XQuery. The data model, algebra, and formal semantics of GQL, as well as its various spatial functions and operations, are presented in detail.
Abstract: Visual Query Language on Spatial Information (SIVQL) is a visual query language based on an extension of Query by Example (QBE). It operates visually on graphics or media objects, such as point, line and area elements. In this paper, the relational calculation and query functions of SIVQL are studied and discussed using set theory and relational algebra, and the theoretical foundation of SIVQL is investigated mathematically. Finally, application examples are given for a specific information system.
Funding: National Natural Science Foundation of China (No. 51274150); Tianjin Major Project of Application Foundation and Advanced Technology, China (No. 12JCZDJC27800)
Abstract: To improve the accuracy of Structured Query Language (SQL) injection penetration testing through formalism-guided test case generation, an attack tree model of SQL injection based on attack purpose is proposed. Under the guidance of this model, formal descriptions of the SQL injection vulnerability features and of the SQL injection attack inputs are established. Moreover, these models are instantiated according to new coverage criteria, and executable test cases are generated. Experiments show that, compared with the randomly enumerated test cases used in other works, the test cases generated by our method detect SQL injection vulnerabilities more effectively. Therefore, false negatives are reduced and test accuracy is improved.
Abstract: On small devices such as PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Because of the restricted size and resources of these smaller devices, current works mostly limit the queries that users can pose to those predetermined by the developers. This limits the capability of these devices to support robust queries. Hence, this paper proposes a universal-relation-based database query language targeted at small devices. The language allows relational database queries to be formulated with minimal query terms. The formulation of the language and its structure are described, and usability test results are presented to support the effectiveness of the language.
Funding: National High-Tech Research and Development Plan of China (Grant No. 2006AA01Z430)
Abstract: The unified multimedia query language (UMQL) is a powerful general-purpose multimedia query language that is well suited to multimedia information retrieval. The paper proposes a grammar analysis model to implement effective grammatical processing for the language. It separates the grammar analysis of a UMQL query specification into two phases, syntactic analysis and semantic analysis, and uses extended Backus-Naur form (EBNF) and logical algebra, respectively, to specify the restrictive grammar rules of each phase. As a result, the model can present error guidance for a query specification that contains grammatical errors. The model not only suits the processing of UMQL queries but also offers guidance for other projects concerned with query processing for descriptive query languages.
Funding: The National High Technology Research and Development Program of China (863 Program) (No. 2006AA01Z430)
Abstract: Through the mapping from UMQL (unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to reduce the execution costs of UMQA query plans effectively, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses equivalent UMQA translation formulae to optimize query plans and makes the optimized query plans conform to the optimization strategies as far as possible. Finally, the logic implementation methods of UMQA plans, i.e., the logic implementation methods of UMQA operators, are discussed to obtain useful target data from a multimedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.
Abstract: A new secured database management system architecture using intrusion detection systems (IDS) is proposed in this paper for organizations with no previous role mapping for users. A simple representation of Structured Query Language queries is proposed to permit easy use of the clustering algorithm. A new clustering algorithm that uses a tube search with adaptive memory is applied to database log files to create users' profiles. Then, the queries issued by each user are checked against the related user profile using a classifier to determine whether or not each query is malicious. The IDS stops query execution or reports the threat to the responsible person if the query is malicious. A simple classifier based on the Euclidean distance is used: the issued query is transformed to the proposed simple representation, and the Euclidean distance between the profile's centers and the issued query is calculated. A synthetic data set is used for our experimental evaluations. Normal user access behavior in relation to the database is modelled using the data set. The false negative (FN) and false positive (FP) rates are used to compare our proposed algorithm with other methods. The experimental results indicate that our proposed method results in very small FN and FP rates.
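The distance-based check described in this abstract can be illustrated with a minimal Python sketch. The feature layout, the cluster centers, and the threshold below are all hypothetical; the abstract does not specify the query representation or the decision rule, so this only shows the general idea of comparing a query vector with profile centers.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_malicious(query_vec, profile_centers, threshold):
    """Flag a query whose nearest profile center is farther than `threshold`."""
    nearest = min(euclidean(query_vec, c) for c in profile_centers)
    return nearest > threshold

# Hypothetical profile: two cluster centers learned from one user's log.
# Assumed features: [is_select, is_update, number_of_tables, number_of_columns]
centers = [[1.0, 0.0, 1.0, 3.0], [0.0, 1.0, 1.0, 2.0]]

print(is_malicious([1.0, 0.0, 1.0, 4.0], centers, threshold=2.0))   # near the first center
print(is_malicious([0.0, 1.0, 6.0, 20.0], centers, threshold=2.0))  # far from both centers
```

A fixed threshold is the simplest possible rule; a real system would presumably derive it from the spread of each cluster.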
Abstract: One way of achieving interoperability among heterogeneous, distributed DBMSs is through a multidatabase system. Recently, CORBA implementations have been increasingly used in developing multidatabase systems. Panorama is a multidatabase system that has been implemented on top of a CORBA-compliant implementation, namely VisiBroker. It aims to achieve interoperability among Oracle, Sybase, and other DBMSs through the registration of these DBMSs to Panorama and through the single global query language PanoSQL designed for this system. In this paper, we first introduce CORBA for interoperability in multidatabase systems. Then, a general view of our multidatabase system, Panorama, is given. In section four, we introduce the global query language PanoSQL, designed to achieve interoperability among the different DBMSs registered in Panorama. Then, as an example, we present the registration of Oracle to Panorama. Finally, a conclusion and future work for this system are given.
Abstract: Temporal ontologies make it possible to represent not only concepts, their properties, and their relationships, but also time-varying information, through explicit versioning of definitions or through the four-dimensional perdurantist view. They are widely used to formally represent temporal data semantics in applications belonging to different fields (e.g., Semantic Web, expert systems, knowledge bases, big data, and artificial intelligence). They facilitate temporal knowledge representation and discovery, with the support of temporal data querying and reasoning. However, there is no standard or consensual temporal ontology query language. In a previous work, we proposed an approach named τJOWL (temporal OWL 2 from temporal JSON, where OWL 2 stands for "OWL 2 Web Ontology Language" and JSON stands for "JavaScript Object Notation"). τJOWL allows (1) automatically building a temporal OWL 2 ontology of data, following the Closed World Assumption (CWA), from temporal JSON-based big data, and (2) managing its incremental maintenance to accommodate the evolution of the data, in a temporal and multi-schema-version environment. In this paper, we propose a temporal ontology query language for τJOWL, named τSQWRL (temporal SQWRL), designed as a temporal extension of the ontology query language Semantic Query-enhanced Web Rule Language (SQWRL). The new language has been inspired by the features of the consensual temporal query language TSQL2 (Temporal SQL2), well known in the temporal (relational) database community. The aim of the proposal is to enable and simplify the task of retrieving any desired ontology version or of specifying any (complex) temporal query on time-varying ontologies generated from time-varying big data. Some examples, in the Internet of Healthcare Things (IoHT) domain, are provided to motivate and illustrate our proposal.
Funding: This work was supported by the Special Investigation on Science and Technology Basic Resources of the MOST of China (No. 2019FY100103), the National Natural Science Foundation of China (No. 62003291), the Xuzhou Science and Technology Project (No. KC20112), and the Industry-University-Research Cooperation Project in Jiangsu Province (No. BY2018124).
Abstract: Objective: Medical data mining and sharing is an important process in E-Health applications. However, because these data contain a large amount of patients' private information, there is a risk of privacy disclosure during sharing and mining. Therefore, ensuring the security of medical big data during publishing, sharing, and mining has become a focus of current research. The objective of our study is to design a framework based on a differential privacy protection mechanism to ensure the secure sharing of medical data. We developed a privacy protection query language (PQL) that integrates multiple data mining methods and provides a secure sharing function. Methods: This study was mainly performed at Xuzhou Medical University, China, and designs three sub-modules: a parsing module, a mining module, and a noising module. Each module encapsulates different computing methods, such as a composite parser and a noise theory. In the PQL framework, we apply differential privacy theory to the results of the computation between modules to guarantee the security of the various mining algorithms. These computing devices operate independently, but the mining results depend on their cooperation. In addition, PQL is encapsulated in MNSSp3, a data mining and security sharing platform, and the data come from public data sets, such as UCBI. The public data set (NCBI database) was used as the experimental data, and the data collection time was January 2020. Results: We designed and developed a query language that provides functions for medical data mining, sharing, and privacy preservation. We theoretically proved the performance of the PQL framework. The experimental results show that the PQL framework can ensure the security of each mining result and that the availability of the output results is above 97%. Conclusion: Our framework enables medical data providers to securely share health data or treatment data and provides a usable query language, based on a differential privacy mechanism, that enables researchers to mine information securely using data mining algorithms.
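As an illustration of what a noising step can look like, the Python sketch below applies the Laplace mechanism, a standard differential privacy technique, to a count query. This is an assumption made for illustration: the abstract does not say which noise distribution PQL uses, and the function names and patient records here are made up.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng=None):
    """Return a count perturbed with Laplace noise.

    A count query has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Made-up records, for illustration only.
patients = [{"age": 34}, {"age": 61}, {"age": 70}, {"age": 45}]
print(noisy_count(patients, lambda r: r["age"] > 50, epsilon=0.5, rng=random.Random(0)))
```

A smaller `epsilon` gives stronger privacy but noisier answers, which is the usual trade-off between protection and the availability of results.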
Abstract: A conceptual-level database language for the entity relationship (ER) model implicitly contains integrities basic to ER concepts and special retrieval semantics for inheritances of attributes and relationships. Prolog, which belongs to the logical and physical level, cannot be used as a foundation to directly define the database language. It is shown how Prolog can be enhanced to understand the concepts of entities, relationships, attributes, and is-a relationships. The enhanced Prolog is then used as a foundation to define the semantics of a database query language for the ER model. The three basic functions of model specification, updates, and retrievals are defined.
Funding: This work was supported by the Carl Zeiss Foundation, Germany, within the projects "Interactive Inference" and "A virtual Werkstatt for digitization in the sciences", and by the Ministry for Economics, Sciences and Digital Society of Thuringia (TMWWDG), Germany, under the framework of the Landesprogramm ProDigital (DigLeben-5575/10-9).
Abstract: Probabilistic programming is a powerful means for formally specifying machine learning models. The inference engine of a probabilistic programming environment can be used for serving complex queries on these models. Most of the current research in probabilistic programming is dedicated to the design and implementation of highly efficient inference engines. Much less research aims at making the power of these inference engines accessible to non-expert users. Probabilistic programming means writing code. Yet many potential users from promising application areas such as the social sciences lack programming skills. This prompted recent efforts in synthesizing probabilistic programs directly from data. However, working with synthesized programs still requires the user to read, understand, and write some code, for instance, when invoking the inference engine for answering queries. Here, we present an interactive visual approach to synthesizing and querying probabilistic programs that does not require the user to read or write code.
Abstract: This article proposes a graph-theoretic methodology for query approximation in Geographic Information Systems, enabling the relaxation of three kinds of query constraints: topological, semantic, and structural. An approximate query is associated with a value corresponding to its degree of similarity with the original query. This value is computed for topological constraints on the basis of the topological distance between configurations, for semantic constraints using the information content approach, and for structural constraints by revisiting the maximum weighted matching problem in bipartite graphs. Finally, the high correlation of our proposal with human judgment is demonstrated by an experiment.
Abstract: The integration of heterogeneous multidatabases on a network is one of the key issues to be solved in the development of CIMS (computer integrated manufacturing system). As a solution to this issue, a multidatabase integration environment, CIMBASE, has been developed. CIMBASE adopts an object-oriented data model and provides users with a series of software tools: a query language, a pre-compiler, a graphical database schema editor, a graphical query interface, and a form-based query generation tool. This paper discusses in detail the major aspects of CIMBASE: its object-oriented data model, its query language interpreter, and the design and implementation of its pre-compiler. The design and algorithms presented in this paper provide a solid foundation for research on multidatabase integration.
Abstract: In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module offering support for relational analysis on Spark with Structured Query Language (SQL). SparkSQL provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from the inefficiency of Spark resulting from the Java virtual machine and from unnecessary data serialization and deserialization. Adopting native languages such as C++ could help to avoid such bottlenecks. Benefiting from a bare-metal runtime environment and template usage, systems with C++ interfaces usually achieve superior performance. However, the complexity of native languages also increases the required programming and debugging effort. In this work, we present LotusSQL, an engine that provides SQL support for dataset abstraction on a native backend, Lotus. We employ a convenient SQL processing framework to deal with frontend jobs. Advanced query optimization technologies are added to improve the quality of execution plans. On top of the storage design and user interface of the compute engine, LotusSQL implements a set of structured dataset operations with high efficiency and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to 9 times in certain queries and outperforms Spark SQL in a standard query benchmark by more than 2 times on average.
Abstract: The paper discusses the need for a high-level query language that allows analysts, geographers and, in general, non-programmers to easily cross-analyze multi-source VGI created by means of apps, crowd-sourced data from social networks, and authoritative geo-referenced data, usually represented as JSON data sets (nowadays the de facto standard for data exported by social networks). Since an easy-to-use high-level language for querying and manipulating collections of possibly geo-tagged JSON objects is still unavailable, we propose a truly declarative language, named J-CO-QL, that is based on a well-defined execution model. A plug-in for a GIS makes it possible to visualize geo-tagged data sets stored in a NoSQL database such as MongoDB; furthermore, the same plug-in can be used to write and execute J-CO-QL queries on those databases. The paper introduces the language by exemplifying its operators within a real case study, the aim of which is to understand the mobility of people in the neighborhood of the city of Bergamo. Cross-analysis of data about transportation networks and VGI from travelers is performed by means of the J-CO-QL language, which can manipulate, transform, combine, and join possibly geo-tagged JSON objects in order to produce new possibly geo-tagged JSON objects that satisfy users' needs.
Funding: Supported by the China Scholarship Council, Tianjin Science and Technology Committee (No. 12JCZDJC20800), Science and Technology Planning Project of Tianjin (No. 13ZCZDGX01098), NSF TRUST (The Team for Research in Ubiquitous Secure Technology) Science and Technology Center (No. CCF-0424422), National High Technology Research and Development Program of China (863 Program) (No. 2013BAH01B05), and National Natural Science Foundation of China (No. 61402264)
Abstract: Logic flaws within web applications allow malicious operations to be triggered against the back-end database. Existing approaches to identifying logic flaws in database accesses are strongly tied to structured query language (SQL) statement construction and cannot be applied to the new generation of web applications that use not only structured query language (NoSQL) databases as the storage tier. In this paper, we present Lom, a black-box approach for discovering many categories of logic flaws within MongoDB-based web applications. Our approach introduces a MongoDB operation model to support new features of MongoDB and models the application logic as a Mealy finite state machine. During the testing phase, test inputs that emulate state violation attacks are constructed to identify logic flaws at each application state. We apply Lom to several MongoDB-based web applications and demonstrate its effectiveness.
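To make the idea of modelling application logic as a Mealy machine concrete, here is a minimal Python sketch. The states, inputs, and outputs describe a hypothetical shop application and are not taken from Lom; the sketch only shows how inputs that are valid in other states can be enumerated as candidate state violation tests.

```python
# Transition table of a hypothetical shop application:
# (state, input) -> (next_state, output)
TRANSITIONS = {
    ("anonymous", "login"): ("logged_in", "session_created"),
    ("logged_in", "add_to_cart"): ("cart_filled", "cart_updated"),
    ("cart_filled", "pay"): ("paid", "order_stored"),
    ("paid", "logout"): ("anonymous", "session_closed"),
}

def step(state, action):
    """Return (next_state, output); inputs not specified for `state` are rejected in place."""
    return TRANSITIONS.get((state, action), (state, "rejected"))

def violation_inputs(state):
    """Inputs that are valid somewhere in the model but not in `state`."""
    all_inputs = {a for (_, a) in TRANSITIONS}
    allowed = {a for (s, a) in TRANSITIONS if s == state}
    return sorted(all_inputs - allowed)

print(step("logged_in", "add_to_cart"))
print(violation_inputs("logged_in"))
```

In a black-box test, each input returned by `violation_inputs` would be sent to the application in the given state; any response other than a rejection would suggest a logic flaw.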