This paper proposes linear and nonlinear filters for a non-Gaussian dynamic system with an unknown nominal covariance of the output noise.The challenge of designing a suitable filter in the presence of an unknown cova...This paper proposes linear and nonlinear filters for a non-Gaussian dynamic system with an unknown nominal covariance of the output noise.The challenge of designing a suitable filter in the presence of an unknown covariance matrix is addressed by focusing on the output data set of the system.Considering that data generated from a Gaussian distribution exhibit ellipsoidal scattering,we first propose the weighted sum of norms(SON)clustering method that prioritizes nearby points,reduces distant point influence,and lowers computational cost.Then,by introducing the weighted maximum likelihood,we propose a semi-definite program(SDP)to detect outliers and reduce their impacts on each cluster.Detecting these weights paves the way to obtain an appropriate covariance of the output noise.Next,two filtering approaches are presented:a cluster-based robust linear filter using the maximum a posterior(MAP)estimation and a clusterbased robust nonlinear filter assuming that output noise distribution stems from some Gaussian noise resources according to the ellipsoidal clusters.At last,simulation results demonstrate the effectiveness of our proposed filtering approaches.展开更多
In the globalized market environment, increasingly significant economic and environmental factors withincomplex industrial plants impose importance on the optimization of global production indices; such opti-mization ...In the globalized market environment, increasingly significant economic and environmental factors withincomplex industrial plants impose importance on the optimization of global production indices; such opti-mization includes improvements in production efficiency, product quality, and yield, along with reductionsof energy and resource usage. This paper briefly overviews recent progress in data-driven hybrid intelli-gence optimization methods and technologies in improving the performance of global production indicesin mineral processing. First, we provide the problem description. Next, we summarize recent progress indata-based optimization for mineral processing plants. This optimization consists of four layers: optimiza-tion of the target values for monthly global production indices, optimization of the target values for dailyglobal production indices, optimization of the target values for operational indices, and automation systemsfor unit processes. We briefly overview recent progress in each of the different layers. Finally, we point outopportunities for future works in data-based optimization for mineral processing plants.展开更多
In this paper,a data-based scheme is proposed to solve the optimal tracking problem of autonomous nonlinear switching systems.The system state is forced to track the reference signal by minimizing the performance func...In this paper,a data-based scheme is proposed to solve the optimal tracking problem of autonomous nonlinear switching systems.The system state is forced to track the reference signal by minimizing the performance function.First,the problem is transformed to solve the corresponding Bellman optimality equation in terms of the Q-function(also named as action value function).Then,an iterative algorithm based on adaptive dynamic programming(ADP)is developed to find the optimal solution which is totally based on sampled data.The linear-in-parameter(LIP)neural network is taken as the value function approximator.Considering the presence of approximation error at each iteration step,the generated approximated value function sequence is proved to be boundedness around the exact optimal solution under some verifiable assumptions.Moreover,the effect that the learning process will be terminated after a finite number of iterations is investigated in this paper.A sufficient condition for asymptotically stability of the tracking error is derived.Finally,the effectiveness of the algorithm is demonstrated with three simulation examples.展开更多
In this paper, a data-based fault tolerant control(FTC) scheme is investigated for unknown continuous-time(CT)affine nonlinear systems with actuator faults. First, a neural network(NN) identifier based on particle swa...In this paper, a data-based fault tolerant control(FTC) scheme is investigated for unknown continuous-time(CT)affine nonlinear systems with actuator faults. First, a neural network(NN) identifier based on particle swarm optimization(PSO) is constructed to model the unknown system dynamics. By utilizing the estimated system states, the particle swarm optimized critic neural network(PSOCNN) is employed to solve the Hamilton-Jacobi-Bellman equation(HJBE) more efficiently.Then, a data-based FTC scheme, which consists of the NN identifier and the fault compensator, is proposed to achieve actuator fault tolerance. The stability of the closed-loop system under actuator faults is guaranteed by the Lyapunov stability theorem. Finally, simulations are provided to demonstrate the effectiveness of the developed method.展开更多
For small devices like the PDAs and mobile phones, formulation of relational database queries is not as simple as using conventional devices such as the personal computers and laptops. Due to the restricted size and r...For small devices like the PDAs and mobile phones, formulation of relational database queries is not as simple as using conventional devices such as the personal computers and laptops. Due to the restricted size and resources of these smaller devices, current works mostly limit the queries that can be posed by users by having them predetermined by the developers. This limits the capability of these devices in supporting robust queries. Hence, this paper proposes a universal relation based database querying language which is targeted for small devices. The language allows formulation of relational database queries that uses minimal query terms. The formulation of the language and its structure will be described and usability test results will be presented to support the effectiveness of the language.展开更多
Purpose:Existing researches of predicting queries with news intents have tried to extract the classification features from external knowledge bases,this paper tries to present how to apply features extracted from quer...Purpose:Existing researches of predicting queries with news intents have tried to extract the classification features from external knowledge bases,this paper tries to present how to apply features extracted from query logs for automatic identification of news queries without using any external resources.Design/methodology/approach:First,we manually labeled 1,220 news queries from Sogou.com.Based on the analysis of these queries,we then identified three features of news queries in terms of query content,time of query occurrence and user click behavior.Afterwards,we used 12 effective features proposed in literature as baseline and conducted experiments based on the support vector machine(SVM)classifier.Finally,we compared the impacts of the features used in this paper on the identification of news queries.Findings:Compared with baseline features,the F-score has been improved from 0.6414 to0.8368 after the use of three newly-identified features,among which the burst point(bst)was the most effective while predicting news queries.In addition,query expression(qes)was more useful than query terms,and among the click behavior-based features,news URL was the most effective one.Research limitations:Analyses based on features extracted from query logs might lead to produce limited results.Instead of short queries,the segmentation tool used in this study has been more widely applied for long texts.Practical implications:The research will be helpful for general-purpose search engines to address search intents for news events.Originality/value:Our approach provides a new and different perspective in recognizing queries with news intent without such large news corpora as blogs or Twitter.展开更多
This paper focuses on developing a system that allows presentation authors to effectively retrieve presentation slides for reuse from a large volume of existing presentation materials. We assume episodic memories of t...This paper focuses on developing a system that allows presentation authors to effectively retrieve presentation slides for reuse from a large volume of existing presentation materials. We assume episodic memories of the authors can be used as contextual keywords in query expressions to efficiently dig out the expected slides for reuse rather than using only the part-of-slide-descriptions-based keyword queries. As a system, a new slide repository is proposed, composed of slide material collections, slide content data and pieces of information from authors' episodic memories related to each slide and presentation together with a slide retrieval application enabling authors to use the episodic memories as part of queries. The result of our experiment shows that the episodic memory-used queries can give more discoverability than the keyword-based queries. Additionally, an improvement model is discussed on the slide retrieval for further slide-finding efficiency by expanding the episodic memories model in the repository taking in the links with the author-and-slide-related data and events having been post on the private and social media sites.展开更多
With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issue...With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issues with secure data sharing.In this paper,we study verifiable keyword frequency(KF)queries with local differential privacy in blockchain.Both the numerical and the keyword attributes are present in data objects;the latter are sensitive and require privacy protection.However,prior studies in blockchain have the problem of trilemma in privacy protection and are unable to handle KF queries.We propose an efficient framework that protects data owners’privacy on keyword attributes while enabling quick and verifiable query processing for KF queries.The framework computes an estimate of a keyword’s frequency and is efficient in query time and verification object(VO)size.A utility-optimized local differential privacy technique is used for privacy protection.The data owner adds noise locally into data based on local differential privacy so that the attacker cannot infer the owner of the keywords while keeping the difference in the probability distribution of the KF within the privacy budget.We propose the VB-cm tree as the authenticated data structure(ADS).The VB-cm tree combines the Verkle tree and the Count-Min sketch(CM-sketch)to lower the VO size and query time.The VB-cm tree uses the vector commitment to verify the query results.The fixed-size CM-sketch,which summarizes the frequency of multiple keywords,is used to estimate the KF via hashing operations.We conduct an extensive evaluation of the proposed framework.The experimental results show that compared to theMerkle B+tree,the query time is reduced by 52.38%,and the VO size is reduced by more than one order of magnitude.展开更多
To cope with the challenges of CoViD-19,europe has adopted relevant measures of a data-based approach to governance,on which scholars have huge differences,and the related researches are conducive to further discussio...To cope with the challenges of CoViD-19,europe has adopted relevant measures of a data-based approach to governance,on which scholars have huge differences,and the related researches are conducive to further discussion on the differences.By sorting out the challenges posed by the pandemic to public security and data protection in europe,we can summarize the“european Solution”of the data-based approach to governance,including legislation,instruments,supervision,international cooperation,and continuity.The“Solution”has curbed the spread of the pandemic to a certain extent.However,due to the influence of the traditional values of the EU,the“Solution”is too idealistic in the balance between public security and data protection,which intensifies the dilemma and causes many problems,such as ambiguous legislation,inadequate effectiveness and security of instruments,an arduous endeavor in inter national cooperation,and imperfect regulations on digital green certificates.Therefore,in a major public health crisis,there is still a long way to go in exploring a balance between public security and data protection.展开更多
To solve the low efficiency of approximate queries caused by the large sizes of the knowledge graphs in the real world,an embedding-based approximate query method is proposed.First,the nodes in the query graph are cla...To solve the low efficiency of approximate queries caused by the large sizes of the knowledge graphs in the real world,an embedding-based approximate query method is proposed.First,the nodes in the query graph are classified according to the degrees of approximation required for different types of nodes.This classification transforms the query problem into three constraints,from which approximate information is extracted.Second,candidates are generated by calculating the similarity between embeddings.Finally,a deep neural network model is designed,incorporating a loss function based on the high-dimensional ellipsoidal diffusion distance.This model identifies the distance between nodes using their embeddings and constructs a score function.k nodes are returned as the query results.The results show that the proposed method can return both exact results and approximate matching results.On datasets DBLP(DataBase systems and Logic Programming)and FUA-S(Flight USA Airports-Sparse),this method exhibits superior performance in terms of precision and recall,returning results in 0.10 and 0.03 s,respectively.This indicates greater efficiency compared to PathSim and other comparative methods.展开更多
The query processing in distributed database management systems(DBMS)faces more challenges,such as more operators,and more factors in cost models and meta-data,than that in a single-node DMBS,in which query optimizati...The query processing in distributed database management systems(DBMS)faces more challenges,such as more operators,and more factors in cost models and meta-data,than that in a single-node DMBS,in which query optimization is already an NP-hard problem.Learned query optimizers(mainly in the single-node DBMS)receive attention due to its capability to capture data distributions and flexible ways to avoid hard-craft rules in refinement and adaptation to new hardware.In this paper,we focus on extensions of learned query optimizers to distributed DBMSs.Specifically,we propose one possible but general architecture of the learned query optimizer in the distributed context and highlight differences from the learned optimizer in the single-node ones.In addition,we discuss the challenges and possible solutions.展开更多
A data lake(DL),abbreviated as DL,denotes a vast reservoir or repository of data.It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various form...A data lake(DL),abbreviated as DL,denotes a vast reservoir or repository of data.It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured,structured,and unstructured information.These systems use a flat architecture and run different types of data analytics.NoSQL databases are nontabular and store data in a different manner than the relational table.NoSQL databases come in various forms,including key-value pairs,documents,wide columns,and graphs,each based on its data model.They offer simpler scalability and generally outperform traditional relational databases.While NoSQL databases can store diverse data types,they lack full support for atomicity,consistency,isolation,and durability features found in relational databases.Consequently,employing machine learning approaches becomes necessary to categorize complex structured query language(SQL)queries.Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification.Overall,this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases.Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.展开更多
With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enha...With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages LLM’s deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed, the text is normalized, and the ambiguity is removed. This is followed by semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. This is followed by query generation, which converts the parsed information into a structured query format and tailors it to the target database schema. Finally, there is query execution and feedback, where the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. By using advanced LLMs for model implementation and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.展开更多
This research aims to enhance Clinical Decision Support Systems(CDSS)within Wireless Body Area Networks(WBANs)by leveraging advanced machine learning techniques.Specifically,we target the challenges of accurate diagno...This research aims to enhance Clinical Decision Support Systems(CDSS)within Wireless Body Area Networks(WBANs)by leveraging advanced machine learning techniques.Specifically,we target the challenges of accurate diagnosis in medical imaging and sequential data analysis using Recurrent Neural Networks(RNNs)with Long Short-Term Memory(LSTM)layers and echo state cells.These models are tailored to improve diagnostic precision,particularly for conditions like rotator cuff tears in osteoporosis patients and gastrointestinal diseases.Traditional diagnostic methods and existing CDSS frameworks often fall short in managing complex,sequential medical data,struggling with long-term dependencies and data imbalances,resulting in suboptimal accuracy and delayed decisions.Our goal is to develop Artificial Intelligence(AI)models that address these shortcomings,offering robust,real-time diagnostic support.We propose a hybrid RNN model that integrates SimpleRNN,LSTM layers,and echo state cells to manage long-term dependencies effectively.Additionally,we introduce CG-Net,a novel Convolutional Neural Network(CNN)framework for gastrointestinal disease classification,which outperforms traditional CNN models.We further enhance model performance through data augmentation and transfer learning,improving generalization and robustness against data scarcity and imbalance.Comprehensive validation,including 5-fold cross-validation and metrics such as accuracy,precision,recall,F1-score,and Area Under the Curve(AUC),confirms the models’reliability.Moreover,SHapley Additive exPlanations(SHAP)and Local Interpretable Model-agnostic Explanations(LIME)are employed to improve model interpretability.Our findings show that the proposed models significantly enhance diagnostic accuracy and efficiency,offering substantial advancements in WBANs and CDSS.展开更多
To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,al...To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,all relative tables are found and decomposed into minimal connectable units.Minimal connectable units are joined according to semantic queries to produce the semantically correct query plans.Algorithms for query rewriting and transforming are presented.Computational complexity of the algorithms is discussed.Under the worst case,the query decomposing algorithm can be finished in O(n2) time and the query rewriting algorithm requires O(nm) time.And the performance of the algorithms is verified by experiments,and experimental results show that when the length of query is less than 8,the query processing algorithms can provide satisfactory performance.展开更多
Aiming at the problem that only some types of SPARQL ( simple protocal and resource description framework query language) queries can be answered by using the current resource description framework link traversal ba...Aiming at the problem that only some types of SPARQL ( simple protocal and resource description framework query language) queries can be answered by using the current resource description framework link traversal based query execution (RDF-LTE) approach, this paper discusses how the execution order of the triple pattern affects the query results and cost based on concrete SPARQL queries, and analyzes two properties of the web of linked data, missing backward links and missing contingency solution. Then three heuristic principles for logic query plan optimization, namely, the filtered basic graph pattern (FBGP) principle, the triple pattern chain principle and the seed URIs principle, are proposed. The three principles contribute to decrease the intermediate solutions and increase the types of queries that can be answered. The effectiveness and feasibility of the proposed approach is evaluated. The experimental results show that more query results can be returned with less cost, thus enabling users to develop the full potential of the web of linked data.展开更多
Query expansion with thesaurus is one of the useful techniques in modern information retrieval (IR). In this paper, a method of query expansion for Chinese IR by using a decaying co-occurrence model is proposed and re...Query expansion with thesaurus is one of the useful techniques in modern information retrieval (IR). In this paper, a method of query expansion for Chinese IR by using a decaying co-occurrence model is proposed and realized. The model is an extension of the traditional co-occurrence model by adding a decaying factor that decreases the mutual information when the distance between the terms increases. Experimental results on TREC-9 collections show this query expansion method results in significant improvements over the IR without query expansion.展开更多
文摘This paper proposes linear and nonlinear filters for a non-Gaussian dynamic system with an unknown nominal covariance of the output noise.The challenge of designing a suitable filter in the presence of an unknown covariance matrix is addressed by focusing on the output data set of the system.Considering that data generated from a Gaussian distribution exhibit ellipsoidal scattering,we first propose the weighted sum of norms(SON)clustering method that prioritizes nearby points,reduces distant point influence,and lowers computational cost.Then,by introducing the weighted maximum likelihood,we propose a semi-definite program(SDP)to detect outliers and reduce their impacts on each cluster.Detecting these weights paves the way to obtain an appropriate covariance of the output noise.Next,two filtering approaches are presented:a cluster-based robust linear filter using the maximum a posterior(MAP)estimation and a clusterbased robust nonlinear filter assuming that output noise distribution stems from some Gaussian noise resources according to the ellipsoidal clusters.At last,simulation results demonstrate the effectiveness of our proposed filtering approaches.
文摘In the globalized market environment, increasingly significant economic and environmental factors withincomplex industrial plants impose importance on the optimization of global production indices; such opti-mization includes improvements in production efficiency, product quality, and yield, along with reductionsof energy and resource usage. This paper briefly overviews recent progress in data-driven hybrid intelli-gence optimization methods and technologies in improving the performance of global production indicesin mineral processing. First, we provide the problem description. Next, we summarize recent progress indata-based optimization for mineral processing plants. This optimization consists of four layers: optimiza-tion of the target values for monthly global production indices, optimization of the target values for dailyglobal production indices, optimization of the target values for operational indices, and automation systemsfor unit processes. We briefly overview recent progress in each of the different layers. Finally, we point outopportunities for future works in data-based optimization for mineral processing plants.
基金supported by the National Natural Science Foundation of China(61921004,U1713209,61803085,and 62041301)。
文摘In this paper,a data-based scheme is proposed to solve the optimal tracking problem of autonomous nonlinear switching systems.The system state is forced to track the reference signal by minimizing the performance function.First,the problem is transformed to solve the corresponding Bellman optimality equation in terms of the Q-function(also named as action value function).Then,an iterative algorithm based on adaptive dynamic programming(ADP)is developed to find the optimal solution which is totally based on sampled data.The linear-in-parameter(LIP)neural network is taken as the value function approximator.Considering the presence of approximation error at each iteration step,the generated approximated value function sequence is proved to be boundedness around the exact optimal solution under some verifiable assumptions.Moreover,the effect that the learning process will be terminated after a finite number of iterations is investigated in this paper.A sufficient condition for asymptotically stability of the tracking error is derived.Finally,the effectiveness of the algorithm is demonstrated with three simulation examples.
基金supported in part by the National Natural ScienceFoundation of China(61533017,61973330,61773075,61603387)the Early Career Development Award of SKLMCCS(20180201)the State Key Laboratory of Synthetical Automation for Process Industries(2019-KF-23-03)。
文摘In this paper, a data-based fault tolerant control(FTC) scheme is investigated for unknown continuous-time(CT)affine nonlinear systems with actuator faults. First, a neural network(NN) identifier based on particle swarm optimization(PSO) is constructed to model the unknown system dynamics. By utilizing the estimated system states, the particle swarm optimized critic neural network(PSOCNN) is employed to solve the Hamilton-Jacobi-Bellman equation(HJBE) more efficiently.Then, a data-based FTC scheme, which consists of the NN identifier and the fault compensator, is proposed to achieve actuator fault tolerance. The stability of the closed-loop system under actuator faults is guaranteed by the Lyapunov stability theorem. Finally, simulations are provided to demonstrate the effectiveness of the developed method.
文摘For small devices like the PDAs and mobile phones, formulation of relational database queries is not as simple as using conventional devices such as the personal computers and laptops. Due to the restricted size and resources of these smaller devices, current works mostly limit the queries that can be posed by users by having them predetermined by the developers. This limits the capability of these devices in supporting robust queries. Hence, this paper proposes a universal relation based database querying language which is targeted for small devices. The language allows formulation of relational database queries that uses minimal query terms. The formulation of the language and its structure will be described and usability test results will be presented to support the effectiveness of the language.
基金supported by the Social Science Planning Foundation of Chongqing(Grant No.:2011QNCB28)
文摘Purpose:Existing researches of predicting queries with news intents have tried to extract the classification features from external knowledge bases,this paper tries to present how to apply features extracted from query logs for automatic identification of news queries without using any external resources.Design/methodology/approach:First,we manually labeled 1,220 news queries from Sogou.com.Based on the analysis of these queries,we then identified three features of news queries in terms of query content,time of query occurrence and user click behavior.Afterwards,we used 12 effective features proposed in literature as baseline and conducted experiments based on the support vector machine(SVM)classifier.Finally,we compared the impacts of the features used in this paper on the identification of news queries.Findings:Compared with baseline features,the F-score has been improved from 0.6414 to0.8368 after the use of three newly-identified features,among which the burst point(bst)was the most effective while predicting news queries.In addition,query expression(qes)was more useful than query terms,and among the click behavior-based features,news URL was the most effective one.Research limitations:Analyses based on features extracted from query logs might lead to produce limited results.Instead of short queries,the segmentation tool used in this study has been more widely applied for long texts.Practical implications:The research will be helpful for general-purpose search engines to address search intents for news events.Originality/value:Our approach provides a new and different perspective in recognizing queries with news intent without such large news corpora as blogs or Twitter.
文摘This paper focuses on developing a system that allows presentation authors to effectively retrieve presentation slides for reuse from a large volume of existing presentation materials. We assume episodic memories of the authors can be used as contextual keywords in query expressions to efficiently dig out the expected slides for reuse rather than using only the part-of-slide-descriptions-based keyword queries. As a system, a new slide repository is proposed, composed of slide material collections, slide content data and pieces of information from authors' episodic memories related to each slide and presentation together with a slide retrieval application enabling authors to use the episodic memories as part of queries. The result of our experiment shows that the episodic memory-used queries can give more discoverability than the keyword-based queries. Additionally, an improvement model is discussed on the slide retrieval for further slide-finding efficiency by expanding the episodic memories model in the repository taking in the links with the author-and-slide-related data and events having been post on the private and social media sites.
文摘With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issues with secure data sharing.In this paper,we study verifiable keyword frequency(KF)queries with local differential privacy in blockchain.Both the numerical and the keyword attributes are present in data objects;the latter are sensitive and require privacy protection.However,prior studies in blockchain have the problem of trilemma in privacy protection and are unable to handle KF queries.We propose an efficient framework that protects data owners’privacy on keyword attributes while enabling quick and verifiable query processing for KF queries.The framework computes an estimate of a keyword’s frequency and is efficient in query time and verification object(VO)size.A utility-optimized local differential privacy technique is used for privacy protection.The data owner adds noise locally into data based on local differential privacy so that the attacker cannot infer the owner of the keywords while keeping the difference in the probability distribution of the KF within the privacy budget.We propose the VB-cm tree as the authenticated data structure(ADS).The VB-cm tree combines the Verkle tree and the Count-Min sketch(CM-sketch)to lower the VO size and query time.The VB-cm tree uses the vector commitment to verify the query results.The fixed-size CM-sketch,which summarizes the frequency of multiple keywords,is used to estimate the KF via hashing operations.We conduct an extensive evaluation of the proposed framework.The experimental results show that compared to theMerkle B+tree,the query time is reduced by 52.38%,and the VO size is reduced by more than one order of magnitude.
基金the phased achievement of the major research project of the National Social Science Fund of China(Project Approval No.21VGQ010)supported by the 2021 Central University Basic Scientific Research Project of Lanzhou University(Project Approval No.21lzujbkyjd002).
文摘To cope with the challenges of CoViD-19,europe has adopted relevant measures of a data-based approach to governance,on which scholars have huge differences,and the related researches are conducive to further discussion on the differences.By sorting out the challenges posed by the pandemic to public security and data protection in europe,we can summarize the“european Solution”of the data-based approach to governance,including legislation,instruments,supervision,international cooperation,and continuity.The“Solution”has curbed the spread of the pandemic to a certain extent.However,due to the influence of the traditional values of the EU,the“Solution”is too idealistic in the balance between public security and data protection,which intensifies the dilemma and causes many problems,such as ambiguous legislation,inadequate effectiveness and security of instruments,an arduous endeavor in inter national cooperation,and imperfect regulations on digital green certificates.Therefore,in a major public health crisis,there is still a long way to go in exploring a balance between public security and data protection.
基金The State Grid Technology Project(No.5108202340042A-1-1-ZN).
文摘To solve the low efficiency of approximate queries caused by the large sizes of the knowledge graphs in the real world,an embedding-based approximate query method is proposed.First,the nodes in the query graph are classified according to the degrees of approximation required for different types of nodes.This classification transforms the query problem into three constraints,from which approximate information is extracted.Second,candidates are generated by calculating the similarity between embeddings.Finally,a deep neural network model is designed,incorporating a loss function based on the high-dimensional ellipsoidal diffusion distance.This model identifies the distance between nodes using their embeddings and constructs a score function.k nodes are returned as the query results.The results show that the proposed method can return both exact results and approximate matching results.On datasets DBLP(DataBase systems and Logic Programming)and FUA-S(Flight USA Airports-Sparse),this method exhibits superior performance in terms of precision and recall,returning results in 0.10 and 0.03 s,respectively.This indicates greater efficiency compared to PathSim and other comparative methods.
基金partially supported by NSFC under Grant Nos.61832001 and 62272008ZTE Industry-University-Institute Fund Project。
文摘The query processing in distributed database management systems(DBMS)faces more challenges,such as more operators,and more factors in cost models and meta-data,than that in a single-node DMBS,in which query optimization is already an NP-hard problem.Learned query optimizers(mainly in the single-node DBMS)receive attention due to its capability to capture data distributions and flexible ways to avoid hard-craft rules in refinement and adaptation to new hardware.In this paper,we focus on extensions of learned query optimizers to distributed DBMSs.Specifically,we propose one possible but general architecture of the learned query optimizer in the distributed context and highlight differences from the learned optimizer in the single-node ones.In addition,we discuss the challenges and possible solutions.
基金supported by the Student Scheme provided by Universiti Kebangsaan Malaysia with the Code TAP-20558.
文摘A data lake(DL),abbreviated as DL,denotes a vast reservoir or repository of data.It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured,structured,and unstructured information.These systems use a flat architecture and run different types of data analytics.NoSQL databases are nontabular and store data in a different manner than the relational table.NoSQL databases come in various forms,including key-value pairs,documents,wide columns,and graphs,each based on its data model.They offer simpler scalability and generally outperform traditional relational databases.While NoSQL databases can store diverse data types,they lack full support for atomicity,consistency,isolation,and durability features found in relational databases.Consequently,employing machine learning approaches becomes necessary to categorize complex structured query language(SQL)queries.Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification.Overall,this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases.Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
文摘With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages LLM’s deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed, the text is normalized, and the ambiguity is removed. This is followed by semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. This is followed by query generation, which converts the parsed information into a structured query format and tailors it to the target database schema. Finally, there is query execution and feedback, where the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. By using advanced LLMs for model implementation and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.
基金supported by the“Human Resources Program in Energy Technology”of the Korea Institute of Energy Technology Evaluation and Planning(KETEP)and granted financial resources from the Ministry of Trade,Industry,and Energy,Korea(No.20204010600090).
文摘This research aims to enhance Clinical Decision Support Systems(CDSS)within Wireless Body Area Networks(WBANs)by leveraging advanced machine learning techniques.Specifically,we target the challenges of accurate diagnosis in medical imaging and sequential data analysis using Recurrent Neural Networks(RNNs)with Long Short-Term Memory(LSTM)layers and echo state cells.These models are tailored to improve diagnostic precision,particularly for conditions like rotator cuff tears in osteoporosis patients and gastrointestinal diseases.Traditional diagnostic methods and existing CDSS frameworks often fall short in managing complex,sequential medical data,struggling with long-term dependencies and data imbalances,resulting in suboptimal accuracy and delayed decisions.Our goal is to develop Artificial Intelligence(AI)models that address these shortcomings,offering robust,real-time diagnostic support.We propose a hybrid RNN model that integrates SimpleRNN,LSTM layers,and echo state cells to manage long-term dependencies effectively.Additionally,we introduce CG-Net,a novel Convolutional Neural Network(CNN)framework for gastrointestinal disease classification,which outperforms traditional CNN models.We further enhance model performance through data augmentation and transfer learning,improving generalization and robustness against data scarcity and imbalance.Comprehensive validation,including 5-fold cross-validation and metrics such as accuracy,precision,recall,F1-score,and Area Under the Curve(AUC),confirms the models’reliability.Moreover,SHapley Additive exPlanations(SHAP)and Local Interpretable Model-agnostic Explanations(LIME)are employed to improve model interpretability.Our findings show that the proposed models significantly enhance diagnostic accuracy and efficiency,offering substantial advancements in WBANs and CDSS.
基金Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102)Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11)
文摘To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,all relative tables are found and decomposed into minimal connectable units.Minimal connectable units are joined according to semantic queries to produce the semantically correct query plans.Algorithms for query rewriting and transforming are presented.Computational complexity of the algorithms is discussed.Under the worst case,the query decomposing algorithm can be finished in O(n2) time and the query rewriting algorithm requires O(nm) time.And the performance of the algorithms is verified by experiments,and experimental results show that when the length of query is less than 8,the query processing algorithms can provide satisfactory performance.
基金The National Natural Science Foundation of China(No.61070170)the Natural Science Foundation of Higher Education Institutions of Jiangsu Province(No.11KJB520017)Suzhou Application Foundation Research Project(No.SYG201238)
文摘Aiming at the problem that only some types of SPARQL ( simple protocal and resource description framework query language) queries can be answered by using the current resource description framework link traversal based query execution (RDF-LTE) approach, this paper discusses how the execution order of the triple pattern affects the query results and cost based on concrete SPARQL queries, and analyzes two properties of the web of linked data, missing backward links and missing contingency solution. Then three heuristic principles for logic query plan optimization, namely, the filtered basic graph pattern (FBGP) principle, the triple pattern chain principle and the seed URIs principle, are proposed. The three principles contribute to decrease the intermediate solutions and increase the types of queries that can be answered. The effectiveness and feasibility of the proposed approach is evaluated. The experimental results show that more query results can be returned with less cost, thus enabling users to develop the full potential of the web of linked data.
文摘Query expansion with thesaurus is one of the useful techniques in modern information retrieval (IR). In this paper, a method of query expansion for Chinese IR by using a decaying co-occurrence model is proposed and realized. The model is an extension of the traditional co-occurrence model by adding a decaying factor that decreases the mutual information when the distance between the terms increases. Experimental results on TREC-9 collections show this query expansion method results in significant improvements over the IR without query expansion.