An approximate approach to querying across heterogeneous ontology-based information systems, based on an association matrix, is proposed. First, the association matrix is defined to describe the relations between concepts in two ontologies. Then, a query rewriting method based on the association matrix is presented to solve the ontology heterogeneity problem. It rewrites queries in one ontology into approximate queries in another ontology based on the subsumption relations between concepts. The method also represents queries as vectors and combines these vectors with the association matrix, so that the results take the disjoint relations between concepts into account. It can therefore obtain better approximations than the methods currently in use, which do not consider disjoint relations. The method can be processed automatically by machines. It is simple to implement and is expected to run quite fast.
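The idea of combining a query vector with an association matrix can be illustrated with a toy sketch. The abstract does not give the matrix encoding, so everything here is an assumption made for illustration: the values `SUBSUMES`, `DISJOINT` and `UNKNOWN`, the function `rewrite_query`, and the example concepts are all hypothetical, and the query is treated as a conjunction of source concepts.

```python
# Hypothetical cell values: how a source concept relates to a target concept.
SUBSUMES, DISJOINT, UNKNOWN = 1, -1, 0

def rewrite_query(query_vector, association, target_concepts):
    """Approximate a conjunctive source query with target concepts.

    Returns (certain, possible): target concepts subsumed by every selected
    source concept, and those that are neither certain nor ruled out.
    """
    selected = [i for i, bit in enumerate(query_vector) if bit]
    certain, possible = [], []
    for j, name in enumerate(target_concepts):
        column = [association[i][j] for i in selected]
        if not column or DISJOINT in column:
            continue  # disjointness prunes the target concept entirely
        if all(value == SUBSUMES for value in column):
            certain.append(name)
        else:
            possible.append(name)
    return certain, possible

source_concepts = ["Student", "Employee"]
target_concepts = ["PhDStudent", "TeachingAssistant", "Retiree"]
association = [
    [SUBSUMES, SUBSUMES, DISJOINT],  # Student vs. each target concept
    [UNKNOWN, SUBSUMES, UNKNOWN],    # Employee vs. each target concept
]
# Query: Student AND Employee, written as a 0/1 vector over source concepts.
certain, possible = rewrite_query([1, 1], association, target_concepts)
```

In this toy setting a rewriting that ignored disjoint relations would still keep `Retiree` as a possible answer; using the disjoint entry removes it, which is the kind of improvement the abstract claims.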
Querying XML data is a computationally expensive process due to the complex nature of both the XML data and the XML queries. In this paper we propose an approach to expedite XML query processing by caching the results of frequent queries. We discover frequent query patterns from user-issued queries using an efficient bottom-up mining approach called VBUXMiner. VBUXMiner consists of two main steps. First, all queries are merged into a summary structure named "compressed global tree guide" (CGTG). Second, a bottom-up traversal scheme based on the CGTG is employed to generate frequent query patterns. We use the frequent query patterns in a cache mechanism to improve XML query performance. Experimental results show that our proposed mining approach outperforms previous mining algorithms for XML queries, such as XQPMinerTID and FastXMiner, and that caching the results of frequent query patterns can dramatically improve XML query performance.
Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on centralized systems. A solution to querying in a Peer-to-Peer (P2P) environment is proposed to achieve both low processing cost, in terms of the number of peers accessed and search messages sent, and balanced query loads among peers. The system is based on a balanced tree-structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages sent is also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.
Description logics (DLs) play an important role in representing and reasoning about domain knowledge. Conjunctive queries stem from the domain of relational databases and have recently attracted increasing attention in the Semantic Web. To obtain a tractable DL for query answering, DL-Lite was proposed. Because of the large amount of imprecision and uncertainty in the real world, it is essential to extend DLs to deal with this vague and imprecise information. We thus propose a new fuzzy DL f-DLR-Lite.n, which allows for the presence of n-ary relations and the occurrence of concept conjunction on the left-hand side of inclusion axioms. We also suggest an improved fuzzy query language, which supports thresholds and user-defined weights. We further show that the query answering algorithm over the extended DL is still FOL-reducible and has polynomial data complexity. DL f-DLR-Lite,n can make up for the disadvantages of classic DLs in knowledge representation and reasoning, and the enhanced query language expresses user intentions more precisely and reasonably.
For small devices such as PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Because of the restricted size and resources of these smaller devices, current works mostly limit the queries that users can pose by having them predetermined by the developers. This limits the capability of these devices to support robust queries. Hence, this paper proposes a universal-relation-based database querying language targeted at small devices. The language allows the formulation of relational database queries that use minimal query terms. The formulation of the language and its structure are described, and usability test results are presented to support the effectiveness of the language.
This study examined users' querying behaviors based on a sample of 30 Chinese college students from Peking University. The authors designed 5 search tasks, and each participant conducted two randomly selected search tasks during the experiment. The results show that when searching for pre-designed search tasks, users often have relatively clear goals and strategies before searching. When formulating their queries, users often select words from the tasks, use concrete concepts directly, or extract 'central words' or keywords. Seven query reformulation types were identified from users' behaviors: broadening, narrowing, issuing a new query, paralleling, changing search tools, reformulating syntax terms, and clicking on suggested queries. The results reveal that the search results and/or the contexts can also influence users' querying behaviors.
Online social networks (OSNs) offer people the opportunity to join communities where they share a common interest or objective. This kind of community is useful for studying human behavior, the diffusion of information, and the dynamics of groups. As the members of a community are always changing, an efficient solution is needed to query information in real time. This paper introduces the Follow Model to represent the basic relationship between users in OSNs, and combines it with the MapReduce solution to develop new algorithms with parallel paradigms for querying. Two models, for the reverse relation and the high-order relation of users, were implemented in the Hadoop system. Based on 75 GB of message data and 26 GB of relation network data from Twitter, a case study was carried out using two dynamic discussion communities: #musicmonday and #beatcancer. The querying performance demonstrates that the new solution, with its implementation in Hadoop, significantly improves the ability to find useful information from OSNs.
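A reverse-relation query in the map/reduce style can be sketched in plain Python. This is not the paper's Hadoop implementation: the function names `map_phase` and `reduce_phase`, the `(follower, followee)` edge format, and the sample users are assumptions chosen only to show the shape of the computation.

```python
from collections import defaultdict

def map_phase(follow_edges):
    """Map step: invert each (follower, followee) edge, keyed by followee."""
    for follower, followee in follow_edges:
        yield followee, follower

def reduce_phase(pairs):
    """Reduce step: collect all followers that share the same key."""
    grouped = defaultdict(set)
    for key, value in pairs:
        grouped[key].add(value)
    return {key: sorted(values) for key, values in grouped.items()}

# Hypothetical follow edges: ana follows bob, carl follows bob, bob follows ana.
edges = [("ana", "bob"), ("carl", "bob"), ("bob", "ana")]
followers = reduce_phase(map_phase(edges))
```

On a real cluster the grouping between the two phases is done by the framework's shuffle; here a dictionary stands in for it.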
Big data introduces challenges to query answering, from theory to practice. A number of questions arise. What queries are "tractable" on big data? How can we make big data "small" so that it is feasible to find exact query answers? When exact answers are beyond reach in practice, what approximation theory can help us strike a balance between the quality of approximate query answers and the costs of computing such answers? To get sensible query answers in big data, what else must we do in addition to coping with the size of the data? This position paper aims to provide an overview of recent advances in the study of querying big data. We propose approaches to tackling these challenging issues, and identify open problems for future research.
This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, when posed on any big dataset D, there exists a fraction D_Q of D such that Q(D) = Q(D_Q), and the cost of identifying D_Q is independent of the size of D. It has been shown that with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to RA_aggr, i.e., RA extended with aggregation, under the bag semantics. (1) We extend access schema to bag access schema, to help us identify D_Q for RA_aggr queries Q. (2) While it is undecidable to determine whether an RA_aggr query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded RA_aggr queries, i.e., a core subclass of boundedly evaluable RA_aggr queries without sacrificing their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of RA_aggr queries and to generate query plans for bounded RA_aggr queries. (5) As proof of concept, we extend PostgreSQL to support bounded evaluation. We experimentally verify that the extended system improves performance by orders of magnitude.
With its tamper-proof and traceable properties, blockchain technology has been widely used in the field of data sharing. How to preserve individual privacy while enabling efficient data queries is one of the primary issues in secure data sharing. In this paper, we study verifiable keyword frequency (KF) queries with local differential privacy in blockchain. Data objects contain both numerical and keyword attributes; the latter are sensitive and require privacy protection. However, prior studies in blockchain face a trilemma in privacy protection and are unable to handle KF queries. We propose an efficient framework that protects data owners' privacy on keyword attributes while enabling quick and verifiable query processing for KF queries. The framework computes an estimate of a keyword's frequency and is efficient in query time and verification object (VO) size. A utility-optimized local differential privacy technique is used for privacy protection. The data owner adds noise to the data locally, based on local differential privacy, so that an attacker cannot infer the owner of the keywords, while the difference in the probability distribution of the KF stays within the privacy budget. We propose the VB-cm tree as the authenticated data structure (ADS). The VB-cm tree combines the Verkle tree and the Count-Min sketch (CM-sketch) to lower the VO size and query time. The VB-cm tree uses vector commitment to verify the query results. The fixed-size CM-sketch, which summarizes the frequency of multiple keywords, is used to estimate the KF via hashing operations. We conduct an extensive evaluation of the proposed framework. The experimental results show that, compared to the Merkle B+ tree, the query time is reduced by 52.38%, and the VO size is reduced by more than one order of magnitude.
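The Count-Min sketch mentioned above can be shown in a few lines. This is a generic textbook sketch, not the paper's VB-cm tree: the class name, the `depth` and `width` defaults, and the use of SHA-256 with a row prefix as the hash family are assumptions, and the Verkle tree, vector commitment, and differential-privacy noise are left out.

```python
import hashlib

class CountMinSketch:
    """Fixed-size table of counters that overestimates keyword frequencies."""

    def __init__(self, depth=4, width=64):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, keyword):
        # One hash function per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{keyword}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, keyword, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, keyword)] += count

    def estimate(self, keyword):
        # Collisions only ever add to a counter, so the minimum over rows
        # is the tightest available upper bound on the true frequency.
        return min(self.table[row][self._index(row, keyword)]
                   for row in range(self.depth))

sketch = CountMinSketch()
for word in ["privacy", "privacy", "privacy", "ledger", "ledger"]:
    sketch.add(word)
privacy_estimate = sketch.estimate("privacy")
```

The estimate is never below the true count (3 for "privacy" here) and may exceed it when keywords collide in every row; a wider table makes that less likely at the cost of more space.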
In cloud environments, outsourced graph data is widely used by companies, enterprises, medical institutions, and so on. Data owners and users can save costs and improve efficiency by storing large amounts of graph data on cloud servers. Servers on cloud platforms are usually subject to subjective or objective attacks, which leave the outsourced graph data in an insecure state. Privacy protection has become an important obstacle to data sharing and usage, and how to query outsourced graph data safely and effectively has become a focus of research. The adjacency query is a basic and frequently used graph operation, and supporting multi-keyword fuzzy search at the same time would effectively extend its query range and capability. This work proposes to protect the private information in outsourced graph data by encryption, studies the problem of multi-keyword fuzzy adjacency queries, and puts forward a solution. In our scheme, we use a Bloom filter and an encryption mechanism to build a secure index and query tokens, and adjacency queries are executed on the cloud server through the index and the query tokens. The proposed scheme is validated by formal analysis, and its performance and effectiveness are illustrated by experimental analysis. The results of this work provide theoretical and technical support for the further popularization and application of encrypted graph data processing technology.
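The Bloom filter underlying such an index can be sketched as follows. This is a plain, unencrypted Bloom filter for illustration only: the class name, the `size` and `hashes` defaults, the SHA-256-based hash family, and the sample keywords are assumptions, and the scheme's encryption, query tokens, and fuzzy matching are not modeled.

```python
import hashlib

class BloomFilter:
    """Bit array with several hash functions; no false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive one bit position per hash function from a seeded digest.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = True

    def __contains__(self, item):
        return all(self.bits[position] for position in self._positions(item))

# Hypothetical keywords attached to one vertex of the outsourced graph.
index = BloomFilter()
for keyword in ["cardiology", "clinic", "beijing"]:
    index.add(keyword)
has_clinic = "clinic" in index
```

A membership test can return a false positive for a keyword that was never added, but never a false negative, which is why the structure suits a first-pass filter on the server.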
This article presents an approach to automatic rule discovery for data transformation tasks that leverages XGBoost, a machine learning algorithm known for its efficiency and performance. The proposed framework fuses diversified feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. First, the article describes the methodology for extracting these distinct features from raw data and the pre-processing steps taken to prepare the data for the model. Subsequent sections explain the mechanism of feature optimization using Recursive Feature Elimination (RFE) with linear regression, which aims to retain the most contributive features and eliminate redundant or less significant ones. The core of the research is the deployment of the XGBoost model for training, using the prepared and optimized feature sets. The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (the prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and in particular the XGBoost model, in the context of Business Rule Engine (BRE) data transformation, the article points to a shift towards more scalable, efficient, and less human-dependent data transformation systems. This research opens doors for further exploration of automated rule discovery systems and their applications in various sectors.
To address the low efficiency of approximate queries caused by the large size of real-world knowledge graphs, an embedding-based approximate query method is proposed. First, the nodes in the query graph are classified according to the degrees of approximation required for different types of nodes. This classification transforms the query problem into three constraints, from which approximate information is extracted. Second, candidates are generated by calculating the similarity between embeddings. Finally, a deep neural network model is designed, incorporating a loss function based on the high-dimensional ellipsoidal diffusion distance. This model identifies the distance between nodes using their embeddings and constructs a score function. k nodes are returned as the query results. The results show that the proposed method can return both exact results and approximate matching results. On the datasets DBLP (DataBase systems and Logic Programming) and FUA-S (Flight USA Airports-Sparse), this method exhibits superior performance in terms of precision and recall, returning results in 0.10 and 0.03 s, respectively. This indicates greater efficiency compared to PathSim and other comparative methods.
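The candidate-generation step, ranking nodes by embedding similarity and keeping k of them, can be sketched as below. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the two-dimensional toy embeddings; the neural scoring model with the ellipsoidal diffusion distance is not reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_embedding, node_embeddings, k):
    """Return the k node names whose embeddings are most similar to the query."""
    ranked = sorted(node_embeddings,
                    key=lambda name: cosine(query_embedding, node_embeddings[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical node embeddings for three knowledge-graph nodes.
node_embeddings = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.8, 0.6],
    "paper_c": [0.0, 1.0],
}
candidates = top_k([1.0, 0.1], node_embeddings, k=2)
```

For graphs of realistic size the full sort would normally be replaced by an approximate nearest-neighbor index; the linear scan is kept here only for clarity.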
This research aims to enhance Clinical Decision Support Systems (CDSS) within Wireless Body Area Networks (WBANs) by leveraging advanced machine learning techniques. Specifically, we target the challenges of accurate diagnosis in medical imaging and sequential data analysis using Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) layers and echo state cells. These models are tailored to improve diagnostic precision, particularly for conditions like rotator cuff tears in osteoporosis patients and gastrointestinal diseases. Traditional diagnostic methods and existing CDSS frameworks often fall short in managing complex, sequential medical data, struggling with long-term dependencies and data imbalances, resulting in suboptimal accuracy and delayed decisions. Our goal is to develop Artificial Intelligence (AI) models that address these shortcomings, offering robust, real-time diagnostic support. We propose a hybrid RNN model that integrates SimpleRNN, LSTM layers, and echo state cells to manage long-term dependencies effectively. Additionally, we introduce CG-Net, a novel Convolutional Neural Network (CNN) framework for gastrointestinal disease classification, which outperforms traditional CNN models. We further enhance model performance through data augmentation and transfer learning, improving generalization and robustness against data scarcity and imbalance. Comprehensive validation, including 5-fold cross-validation and metrics such as accuracy, precision, recall, F1-score, and Area Under the Curve (AUC), confirms the models' reliability. Moreover, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are employed to improve model interpretability. Our findings show that the proposed models significantly enhance diagnostic accuracy and efficiency, offering substantial advancements in WBANs and CDSS.
A data lake (DL) denotes a vast reservoir or repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are non-tabular and store data differently from relational tables. NoSQL databases come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability features found in relational databases. Consequently, employing machine learning approaches becomes necessary to categorize complex structured query language (SQL) queries. Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
Query processing in distributed database management systems (DBMSs) faces more challenges, such as more operators and more factors in cost models and metadata, than in a single-node DBMS, in which query optimization is already an NP-hard problem. Learned query optimizers (mainly in single-node DBMSs) have received attention because of their capability to capture data distributions and their flexible ways of avoiding hand-crafted rules in refinement and adaptation to new hardware. In this paper, we focus on extensions of learned query optimizers to distributed DBMSs. Specifically, we propose one possible but general architecture of the learned query optimizer in the distributed context and highlight the differences from learned optimizers in single-node DBMSs. In addition, we discuss the challenges and possible solutions.
With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages the LLM's deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed: the text is normalized and ambiguity is removed. This is followed by semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. Next comes query generation, which converts the parsed information into a structured query format and tailors it to the target database schema. Finally, in query execution and feedback, the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. By using advanced LLMs for model implementation and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.
Abstract: An approximate approach to querying across heterogeneous ontology-based information systems, based on an association matrix, is proposed. First, the association matrix is defined to describe the relations between concepts in two ontologies. Then, a query rewriting method based on the association matrix is presented to solve the ontology heterogeneity problem. It rewrites queries in one ontology into approximate queries in another ontology based on the subsumption relations between concepts. The method also represents queries as vectors and combines these vectors with the association matrix, so that the results take the disjoint relations between concepts into account. It can therefore obtain better approximations than the methods currently in use, which do not consider disjoint relations. The method can be processed automatically by machines. It is simple to implement and is expected to run quite fast.
Funding: the National Natural Science Foundation of China (No. 60603044); the National Key Technologies Supporting Program of China during the 11th Five-Year Plan Period (No. 2006BAH02A03); the Program for Changjiang Scholars and Innovative Research Team in University of China (No. IRT0652)
Abstract: Querying XML data is a computationally expensive process due to the complex nature of both the XML data and the XML queries. In this paper we propose an approach to expedite XML query processing by caching the results of frequent queries. We discover frequent query patterns from user-issued queries using an efficient bottom-up mining approach called VBUXMiner. VBUXMiner consists of two main steps. First, all queries are merged into a summary structure named "compressed global tree guide" (CGTG). Second, a bottom-up traversal scheme based on the CGTG is employed to generate frequent query patterns. We use the frequent query patterns in a cache mechanism to improve XML query performance. Experimental results show that our proposed mining approach outperforms previous mining algorithms for XML queries, such as XQPMinerTID and FastXMiner, and that caching the results of frequent query patterns can dramatically improve XML query performance.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BG2004034)
Abstract: Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on centralized systems. A solution to querying in a Peer-to-Peer (P2P) environment is proposed to achieve both low processing cost, in terms of the number of peers accessed and search messages sent, and balanced query loads among peers. The system is based on a balanced tree-structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages sent is also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.
Funding: the Program for New Century Excellent Talents in University (NCET-05-0288); the Specialized Research Fund for the Doctoral Program of Higher Education of China (20050145024)
Abstract: Description logics (DLs) play an important role in representing and reasoning about domain knowledge. Conjunctive queries stem from the domain of relational databases and have recently attracted increasing attention in the Semantic Web. To obtain a tractable DL for query answering, DL-Lite was proposed. Because of the large amount of imprecision and uncertainty in the real world, it is essential to extend DLs to deal with this vague and imprecise information. We thus propose a new fuzzy DL f-DLR-Lite.n, which allows for the presence of n-ary relations and the occurrence of concept conjunction on the left-hand side of inclusion axioms. We also suggest an improved fuzzy query language, which supports thresholds and user-defined weights. We further show that the query answering algorithm over the extended DL is still FOL-reducible and has polynomial data complexity. DL f-DLR-Lite,n can make up for the disadvantages of classic DLs in knowledge representation and reasoning, and the enhanced query language expresses user intentions more precisely and reasonably.
Abstract: For small devices such as PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Because of the restricted size and resources of these smaller devices, current works mostly limit the queries that users can pose by having them predetermined by the developers. This limits the capability of these devices to support robust queries. Hence, this paper proposes a universal-relation-based database querying language targeted at small devices. The language allows the formulation of relational database queries that use minimal query terms. The formulation of the language and its structure are described, and usability test results are presented to support the effectiveness of the language.
Funding: partially supported by the China Scholarship Council (Grant No. 2009601175)
Abstract: This study examined users' querying behaviors based on a sample of 30 Chinese college students from Peking University. The authors designed 5 search tasks, and each participant conducted two randomly selected search tasks during the experiment. The results show that when searching for pre-designed search tasks, users often have relatively clear goals and strategies before searching. When formulating their queries, users often select words from the tasks, use concrete concepts directly, or extract 'central words' or keywords. Seven query reformulation types were identified from users' behaviors: broadening, narrowing, issuing a new query, paralleling, changing search tools, reformulating syntax terms, and clicking on suggested queries. The results reveal that the search results and/or the contexts can also influence users' querying behaviors.
Funding: Project supported by the Brazilian National Council for Scientific and Technological Development (CNPq) (No. 304058/2010-6)
Abstract: Online social networks (OSNs) offer people the opportunity to join communities where they share a common interest or objective. This kind of community is useful for studying human behavior, the diffusion of information, and the dynamics of groups. As the members of a community are always changing, an efficient solution is needed to query information in real time. This paper introduces the Follow Model to represent the basic relationship between users in OSNs, and combines it with the MapReduce solution to develop new algorithms with parallel paradigms for querying. Two models, for the reverse relation and the high-order relation of users, were implemented in the Hadoop system. Based on 75 GB of message data and 26 GB of relation network data from Twitter, a case study was carried out using two dynamic discussion communities: #musicmonday and #beatcancer. The querying performance demonstrates that the new solution, with its implementation in Hadoop, significantly improves the ability to find useful information from OSNs.
Funding: supported in part by the National Basic Research 973 Program of China under Grant No. 2014CB340302. Fan is also supported in part by the National Natural Science Foundation of China under Grant No. 61133002, the Guangdong Innovative Research Team Program under Grant No. 2011D005, the Shenzhen Peacock Program under Grant No. 1105100030834361, the Engineering and Physical Sciences Research Council of UK under Grant No. EP/J015377/1, and the National Science Foundation of USA under Grant No. III-1302212.
Abstract: Big data introduces challenges to query answering, from theory to practice. A number of questions arise. What queries are "tractable" on big data? How can we make big data "small" so that it is feasible to find exact query answers? When exact answers are beyond reach in practice, what approximation theory can help us strike a balance between the quality of approximate query answers and the costs of computing such answers? To get sensible query answers in big data, what else must we do in addition to coping with the size of the data? This position paper aims to provide an overview of recent advances in the study of querying big data. We propose approaches to tackling these challenging issues, and identify open problems for future research.
Funding: Supported in part by Royal Society Wolfson Research Merit Award WRM/R1/180014, ERC 652976, EPSRC EP/M025268/1, Shenzhen Institute of Computing Sciences, and Beijing Advanced Innovation Center for Big Data and Brain Computing.
Abstract: This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, when it is posed on any big dataset D, there exists a fraction D_Q of D such that Q(D) = Q(D_Q), and the cost of identifying D_Q is independent of the size of D. It has been shown that with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to RA_aggr, i.e., RA extended with aggregation, under the bag semantics. (1) We extend access schema to bag access schema, to help us identify D_Q for RA_aggr queries Q. (2) While it is undecidable to determine whether an RA_aggr query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded RA_aggr queries, i.e., a core subclass of boundedly evaluable RA_aggr queries without sacrificing their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of RA_aggr queries and to generate query plans for bounded RA_aggr queries. (5) As proof of concept, we extend PostgreSQL to support bounded evaluation. We experimentally verify that the extended system improves performance by orders of magnitude.
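The definition Q(D) = Q(D_Q) can be made concrete with a toy aggregate query. The sketch below is an illustration under assumptions, not the paper's system: the relations `friend` and `person`, the access constraints (an index returning at most 3 friends per person, and one city per person), and every function name are hypothetical. It counts a person's friends who live in NYC, once by scanning all of D and once by fetching only D_Q through the indexes.

```python
from collections import defaultdict

# Hypothetical dataset D: friend(pid, fid) with exactly 3 friends per person,
# and person(pid -> city). Both relations are made up for this sketch.
friend = [(p, (p * 7 + i) % 1000) for p in range(1000) for i in range(3)]
person = {p: ("NYC" if p % 4 == 0 else "LA") for p in range(1000)}

# Assumed access constraint: given a pid, an index returns at most N = 3 fids.
# A list keeps duplicates, so the index respects bag semantics.
friend_index = defaultdict(list)
for pid, fid in friend:
    friend_index[pid].append(fid)

def count_nyc_friends_full_scan(p0):
    """Evaluate the count over all of D; cost grows with |D|."""
    touched, count = 0, 0
    for pid, fid in friend:
        touched += 1
        if pid == p0 and person[fid] == "NYC":
            count += 1
    return count, touched

def count_nyc_friends_bounded(p0):
    """Evaluate the same count over D_Q only; cost is bounded by 2N."""
    fids = friend_index[p0]          # at most N friend tuples
    touched, count = len(fids), 0
    for fid in fids:
        touched += 1                 # one keyed lookup in person per friend
        if person[fid] == "NYC":
            count += 1
    return count, touched

full_count, full_touched = count_nyc_friends_full_scan(42)
bounded_count, bounded_touched = count_nyc_friends_bounded(42)
```

Both plans return the same count, but the bounded plan touches at most six tuples regardless of how large `friend` grows, which is the property the access schema is meant to guarantee.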
Abstract: With its tamper-proof and traceable properties, blockchain technology has been widely used in the field of data sharing. How to preserve individual privacy while enabling efficient data queries is one of the primary issues in secure data sharing. In this paper, we study verifiable keyword frequency (KF) queries with local differential privacy in blockchain. Data objects have both numerical and keyword attributes; the latter are sensitive and require privacy protection. However, prior studies in blockchain face a trilemma in privacy protection and are unable to handle KF queries. We propose an efficient framework that protects data owners' privacy on keyword attributes while enabling quick and verifiable query processing for KF queries. The framework computes an estimate of a keyword's frequency and is efficient in query time and verification object (VO) size. A utility-optimized local differential privacy technique is used for privacy protection. The data owner adds noise locally to the data based on local differential privacy so that an attacker cannot infer the owner of the keywords, while the difference in the probability distribution of the KF is kept within the privacy budget. We propose the VB-cm tree as the authenticated data structure (ADS). The VB-cm tree combines the Verkle tree and the Count-Min sketch (CM-sketch) to lower the VO size and query time. It uses vector commitments to verify the query results. The fixed-size CM-sketch, which summarizes the frequency of multiple keywords, is used to estimate the KF via hashing operations. We conduct an extensive evaluation of the proposed framework. The experimental results show that, compared to the Merkle B+ tree, the query time is reduced by 52.38% and the VO size is reduced by more than one order of magnitude.
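The way a fixed-size Count-Min sketch estimates keyword frequency through hashing can be shown in a few lines. The sketch below covers only the plain CM-sketch: the Verkle tree, the vector commitments, and the differential-privacy noise of the VB-cm tree are omitted, and the class name, table dimensions, and SHA-256-based row hashes are assumptions made for illustration.

```python
import hashlib

class CountMinSketch:
    """A fixed-size table of counters; estimates never undercount."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, keyword):
        # One hash function per row, derived by salting SHA-256 with the row id.
        digest = hashlib.sha256(f"{row}:{keyword}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, keyword, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, keyword)] += count

    def estimate(self, keyword):
        # Collisions can only inflate a counter, so the minimum is the best estimate.
        return min(self.table[row][self._index(row, keyword)]
                   for row in range(self.depth))

sketch = CountMinSketch()
for word in ["blockchain", "privacy", "blockchain", "query", "blockchain"]:
    sketch.add(word)
estimated = sketch.estimate("blockchain")
```

The estimate is always at least the true frequency and exceeds it only when every row suffers a collision, which is why a wider or deeper table tightens the error at the cost of a larger structure.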
Funding: This research was supported in part by the Natural Science Foundation of China (Nos. 62262033, 61962029, 61762055, 62062045 and 62362042), the Jiangxi Provincial Natural Science Foundation of China (Nos. 20224BAB202012, 20202ACBL202005 and 20202BAB212006), the Science and Technology Research Project of Jiangxi Education Department (Nos. GJJ211815, GJJ2201914 and GJJ201832), the Hubei Natural Science Foundation Innovation and Development Joint Fund Project (No. 2022CFD101), the Xiangyang High-Tech Key Science and Technology Plan Project (No. 2022ABH006848), the Hubei Superior and Distinctive Discipline Group of "New Energy Vehicle and Smart Transportation", the Project of Zhejiang Institute of Mechanical & Electrical Engineering, and the Jiangxi Provincial Social Science Foundation of China (No. 23GL52D).
Abstract: In a cloud environment, outsourced graph data is widely used by companies, enterprises, medical institutions, and so on. Data owners and users can save costs and improve efficiency by storing large amounts of graph data on cloud servers. Servers on cloud platforms are often subject to attacks, whether subjective or objective, which leave the outsourced graph data in an insecure state. Privacy protection has become an important obstacle to data sharing and usage, and how to query outsourced graph data safely and effectively has become a focus of research. The adjacency query is a basic and frequently used graph operation, and supporting multi-keyword fuzzy search at the same time would effectively extend its query range and capability. This work proposes to protect the private information in outsourced graph data by encryption, studies the problem of multi-keyword fuzzy adjacency queries, and puts forward a solution. In our scheme, we use a Bloom filter and an encryption mechanism to build a secure index and query tokens, and adjacency queries are carried out on the cloud server through the index and the query tokens. The proposed scheme is validated by formal analysis, and its performance and effectiveness are illustrated by experimental analysis. The results of this work provide theoretical and technical support for the further adoption of encrypted graph data processing technology.
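The role a Bloom filter can play in such an index is sketched below for a multi-keyword adjacency query. This is a simplified illustration and not the paper's scheme: encryption, query tokens, and fuzzy matching are all left out, and the graph, the keyword sets, the filter parameters, and every name are hypothetical.

```python
import hashlib

class BloomFilter:
    """A bit array with several hash functions; no false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive each hash function by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

# Hypothetical graph: adjacency lists plus a keyword set per vertex.
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
keywords = {"a": {"clinic"}, "b": {"cardiology", "hospital"}, "c": {"pharmacy"}}

# Index: one Bloom filter per vertex, built from its keywords.
index = {}
for vertex, words in keywords.items():
    bf = BloomFilter()
    for w in words:
        bf.add(w)
    index[vertex] = bf

def adjacency_query(vertex, query_words):
    """Return neighbours of `vertex` whose filter matches every query keyword."""
    return [n for n in adjacency[vertex]
            if all(index[n].might_contain(w) for w in query_words)]

matches = adjacency_query("a", ["cardiology", "hospital"])
```

Because a Bloom filter admits false positives but never false negatives, the result always contains every true match and may occasionally contain an extra neighbour.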
Abstract: This article presents an approach to automatic rule discovery for data transformation tasks using XGBoost, a machine learning algorithm known for its efficiency and performance. The proposed framework fuses diversified feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. First, the article describes the methodology for extracting these distinct features from raw data and the pre-processing steps taken to prepare the data for the model. Subsequent sections explain the mechanism of feature optimization using Recursive Feature Elimination (RFE) with linear regression, which aims to retain the most contributive features and eliminate redundant or less significant ones. The core of the research is the training of the XGBoost model on the prepared and optimized feature sets. The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (the prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and the XGBoost model in particular, in the context of Business Rule Engine (BRE) data transformation, the article points to a shift towards more scalable, efficient, and less human-dependent data transformation systems. This research opens the way for further exploration of automated rule discovery systems and their applications in various sectors.
Funding: The State Grid Technology Project (No. 5108202340042A-1-1-ZN).
Abstract: To solve the low efficiency of approximate queries caused by the large sizes of real-world knowledge graphs, an embedding-based approximate query method is proposed. First, the nodes in the query graph are classified according to the degrees of approximation required for different types of nodes. This classification transforms the query problem into three constraints, from which approximate information is extracted. Second, candidates are generated by calculating the similarity between embeddings. Finally, a deep neural network model is designed, incorporating a loss function based on the high-dimensional ellipsoidal diffusion distance. This model identifies the distance between nodes using their embeddings and constructs a score function, and k nodes are returned as the query results. The results show that the proposed method can return both exact results and approximate matching results. On the datasets DBLP (DataBase systems and Logic Programming) and FUA-S (Flight USA Airports-Sparse), this method exhibits superior performance in terms of precision and recall, returning results in 0.10 and 0.03 s, respectively. This indicates greater efficiency compared to PathSim and other comparative methods.
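The candidate-generation step, in which candidates are chosen by similarity between embeddings, can be sketched as a top-k search. The abstract does not name the similarity measure, so cosine similarity is used here as a stand-in; it is not the ellipsoidal-diffusion-distance score of the paper's neural model, and the node names and two-dimensional vectors are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_candidates(query_vec, embeddings, k):
    """Return the k node ids whose embeddings are most similar to query_vec."""
    scored = [(cosine(query_vec, vec), node) for node, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in scored[:k]]

# Hypothetical node embeddings and query embedding.
embeddings = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.9, 0.1],
    "paper_c": [0.0, 1.0],
}
candidates = top_k_candidates([1.0, 0.0], embeddings, k=2)
```

Here `paper_a` matches the query exactly and `paper_b` is a near match, so a single ranked list can carry both exact and approximate results.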
Funding: Supported by the "Human Resources Program in Energy Technology" of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), with financial resources granted by the Ministry of Trade, Industry, and Energy, Korea (No. 20204010600090).
Abstract: This research aims to enhance Clinical Decision Support Systems (CDSS) within Wireless Body Area Networks (WBANs) by leveraging advanced machine learning techniques. Specifically, we target the challenges of accurate diagnosis in medical imaging and sequential data analysis using Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) layers and echo state cells. These models are tailored to improve diagnostic precision, particularly for conditions like rotator cuff tears in osteoporosis patients and gastrointestinal diseases. Traditional diagnostic methods and existing CDSS frameworks often fall short in managing complex, sequential medical data, struggling with long-term dependencies and data imbalances, resulting in suboptimal accuracy and delayed decisions. Our goal is to develop Artificial Intelligence (AI) models that address these shortcomings, offering robust, real-time diagnostic support. We propose a hybrid RNN model that integrates SimpleRNN, LSTM layers, and echo state cells to manage long-term dependencies effectively. Additionally, we introduce CG-Net, a novel Convolutional Neural Network (CNN) framework for gastrointestinal disease classification, which outperforms traditional CNN models. We further enhance model performance through data augmentation and transfer learning, improving generalization and robustness against data scarcity and imbalance. Comprehensive validation, including 5-fold cross-validation and metrics such as accuracy, precision, recall, F1-score, and Area Under the Curve (AUC), confirms the models' reliability. Moreover, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are employed to improve model interpretability. Our findings show that the proposed models significantly enhance diagnostic accuracy and efficiency, offering substantial advancements in WBANs and CDSS.
Funding: Supported by the Student Scheme provided by Universiti Kebangsaan Malaysia under Code TAP-20558.
Abstract: A data lake (DL) is a vast repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are non-tabular and store data differently from relational tables. They come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its own data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability (ACID) features found in relational databases. Consequently, machine learning approaches become necessary to categorize complex structured query language (SQL) queries. Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
Funding: Partially supported by NSFC under Grant Nos. 61832001 and 62272008, and by the ZTE Industry-University-Institute Fund Project.
Abstract: Query processing in distributed database management systems (DBMSs) faces more challenges, such as more operators and more factors in cost models and metadata, than in a single-node DBMS, where query optimization is already an NP-hard problem. Learned query optimizers (mainly for single-node DBMSs) have received attention due to their ability to capture data distributions and their flexibility in avoiding hand-crafted rules during refinement and adaptation to new hardware. In this paper, we focus on extensions of learned query optimizers to distributed DBMSs. Specifically, we propose one possible but general architecture of the learned query optimizer in the distributed context and highlight its differences from learned optimizers in the single-node setting. In addition, we discuss the challenges and possible solutions.
Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages the LLM's deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed: the text is normalized and ambiguity is removed. Next comes semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. Query generation then converts the parsed information into a structured query format and tailors it to the target database schema. Finally, in query execution and feedback, the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. Using advanced LLMs for model implementation and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.