Journal Articles
11 articles found
A Systematic Review of Automated Classification for Simple and Complex Query SQL on NoSQL Database
1
Authors: Nurhadi, Rabiah Abdul Kadir, Ely Salwana Mat Surin, Mahidur R. Sarker 《Computer Systems Science & Engineering》 2024, No. 6, pp. 1405-1435 (31 pages)
A data lake (DL) denotes a vast reservoir or repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are non-tabular and store data differently from relational tables. NoSQL databases come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its own data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability (ACID) features found in relational databases. Consequently, machine learning approaches become necessary to categorize complex structured query language (SQL) queries. Results indicate that the most frequently used automatic classification technique for processing SQL queries on NoSQL databases is machine learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid the development of effective and efficient NoSQL database applications.
Keywords: NoSQL database; data lake; machine learning; ACID; complex query; smart city
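As a toy illustration of the simple-versus-complex distinction the review surveys, a rule-based baseline can label a query from surface features; the marker list below is an assumption for illustration, not a classifier taken from any of the reviewed papers:

```python
def classify_sql(query: str) -> str:
    """Label a SQL query as 'simple' or 'complex' from surface features.

    A heuristic stand-in for the learned classifiers the review surveys:
    queries with joins, subqueries, set operations, or grouping are
    treated as complex; plain single-table filters as simple.
    """
    q = query.upper()
    complex_markers = [" JOIN ", "GROUP BY", "HAVING", "UNION", "EXISTS"]
    score = sum(marker in q for marker in complex_markers)
    score += q.count("(SELECT")  # nested subqueries
    return "complex" if score > 0 else "simple"
```

A learned classifier would replace the hand-written markers with features extracted from the parsed query tree, but the input/output contract is the same.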
(r, QI)-Transform: Reversible Data Anonymity Based on Numeric Type of Data in Outsourced Database
2
Authors: Iuon-Chang Lin, Yang-Te Lee, Chen-Yang Cheng 《Journal of Electronic Science and Technology》 CAS CSCD 2017, No. 3, pp. 222-230 (9 pages)
An outsourced database is a database service provided by cloud computing companies. Using an outsourced database can reduce hardware and software costs and provide more efficient and reliable data processing capacity. However, outsourced databases still face challenges. If the service provider is not sufficiently trustworthy, there is the possibility of data leakage. The data may contain users' private information, so data leakage may compromise data privacy. For this reason, protecting the privacy of data in the outsourced database becomes very important. In the past, scholars proposed k-anonymity to protect data privacy in databases; it makes data anonymous to avoid privacy leaks. But k-anonymity has some problems: it is irreversible, and it is vulnerable to homogeneity attacks and background knowledge attacks. Later, scholars proposed approaches to counter homogeneity and background knowledge attacks, but those approaches still cannot recover the original data. In this paper, we propose a data anonymization method that is reversible and also prevents those two attacks. Our study is based on the proposed r-transform, which can be applied to numeric attributes in the outsourced database. In the experiments, we discuss the time required to anonymize and recover data, and we investigate the defense against homogeneity and background knowledge attacks. Finally, we summarize the proposed method and future research.
Keywords: cloud database; data anonymity; database privacy; outsourced database; reversible
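The abstract does not spell out the (r, QI)-transform itself, but the reversibility idea can be sketched with any invertible keyed map on numeric attributes; the keyed affine map below is a hypothetical stand-in, not the paper's construction:

```python
from fractions import Fraction

def anonymize(values, key_a, key_b):
    """Reversibly mask numeric values with a keyed affine map v -> a*v + b.

    An illustrative stand-in for the paper's (r, QI)-transform: any
    invertible keyed map on numeric attributes lets the data owner
    publish masked values yet recover the originals exactly.
    Exact rationals avoid floating-point drift on the round trip.
    """
    return [Fraction(v) * key_a + key_b for v in values]

def recover(masked, key_a, key_b):
    """Invert the affine map to restore the original values exactly."""
    return [(m - key_b) / key_a for m in masked]
```

A real scheme must also break value-distribution clues (the homogeneity and background knowledge attacks the paper targets), which a single global affine map alone does not.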
Establishment and Assessment of Plasma Disruption and Warning Databases from EAST
3
Authors: 王勃, Robert GRANETZ, 肖炳甲, 李建刚, 杨飞, 李君君, 陈大龙 《Plasma Science and Technology》 SCIE EI CAS CSCD 2016, No. 12, pp. 1162-1168 (7 pages)
The disruption database and disruption warning database of the EAST tokamak have been established by a disruption research group. The disruption database, based on Structured Query Language (SQL), comprises 41 disruption parameters, which include current quench characteristics, EFIT equilibrium characteristics, kinetic parameters, halo currents, and vertical motion. Presently, most disruption databases are based on plasma experiments of non-superconducting tokamak devices. The purposes of the EAST database are to find disruption characteristics and disruption statistics for the fully superconducting tokamak EAST, to elucidate the physics underlying tokamak disruptions, to explore the influence of disruptions on superconducting magnets, and to extrapolate toward future burning plasma devices. In order to quantitatively assess the usefulness of various plasma parameters for predicting disruptions, an SQL database similar to that of Alcator C-Mod has been created for EAST by compiling values for a number of proposed disruption-relevant parameters sampled from all plasma discharges in the 2015 campaign. Detailed statistical results and analysis of the two databases on the EAST tokamak are presented.
Keywords: disruption database; disruption warning database; data analysis
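A minimal sketch of what an SQL disruption database of this kind looks like; the column names below are assumed examples for illustration, not the 41 parameters actually stored for EAST:

```python
import sqlite3

# Illustrative per-shot disruption table; columns are hypothetical
# stand-ins for quench, halo current and vertical motion parameters.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE disruptions (
        shot INTEGER PRIMARY KEY,
        ip_pre_disrupt_MA REAL,       -- plasma current before the quench
        current_quench_time_ms REAL,  -- current quench duration
        halo_current_kA REAL,
        vertical_displacement_m REAL
    )
""")
conn.execute("INSERT INTO disruptions VALUES (1001, 0.4, 3.2, 15.0, 0.05)")

# The statistical studies reduce to queries over such parameters,
# e.g. selecting shots with fast current quenches:
rows = conn.execute(
    "SELECT shot FROM disruptions WHERE current_quench_time_ms < 5"
).fetchall()
```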
The Use of Data Mining Techniques in Rockburst Risk Assessment (Cited by 10)
4
Authors: Luis Ribeiro e Sousa, Tiago Miranda, Rita Leal e Sousa, Joaquim Tinoco 《Engineering》 SCIE EI 2017, No. 4, pp. 552-558 (7 pages)
Rockburst is an important phenomenon that has affected many deep underground mines around the world. An understanding of this phenomenon is relevant to the management of such events, which can save both costs and lives. Laboratory experiments are one way to obtain a deeper and better understanding of the mechanisms of rockburst. In a previous study by these authors, a database of rockburst laboratory tests was created; in addition, with the use of data mining (DM) techniques, models to predict rockburst maximum stress and rockburst risk indexes were developed. In this paper, we focus on the analysis of a database of in situ cases of rockburst in order to build influence diagrams, list the factors that interact in the occurrence of rockburst, and understand the relationships between these factors. The in situ rockburst database was further analyzed using different DM techniques ranging from artificial neural networks (ANNs) to naive Bayesian classifiers. The aim was to predict the type of rockburst (that is, the rockburst level) based on geologic and construction characteristics of the mine or tunnel. Conclusions are drawn at the end of the paper.
Keywords: rockburst; data mining; Bayesian networks; in situ database
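One of the DM techniques named, the naive Bayesian classifier, can be sketched in a few lines over categorical attributes; the feature names and rockburst levels below are hypothetical stand-ins for the geologic and construction attributes of the study:

```python
from collections import Counter, defaultdict

def train(samples):
    """Fit a categorical naive Bayes model from (features_dict, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)  # (feature name, label) -> value counts
    for feats, label in samples:
        for name, value in feats.items():
            feat_counts[(name, label)][value] += 1
    return label_counts, feat_counts

def predict(model, feats):
    """Return the label maximizing P(label) * prod P(value | label)."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best, best_p = None, -1.0
    for label, lc in label_counts.items():
        p = lc / total
        for name, value in feats.items():
            seen = feat_counts[(name, label)]
            p *= (seen[value] + 1) / (lc + len(seen) + 1)  # Laplace smoothing
        if p > best_p:
            best, best_p = label, p
    return best
```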
Biological Databases for Human Research (Cited by 2)
5
Authors: Dong Zou, Lina Ma, Jun Yu, Zhang Zhang 《Genomics, Proteomics & Bioinformatics》 SCIE CAS CSCD 2015, No. 1, pp. 55-63 (9 pages)
The completion of the Human Genome Project laid a foundation for systematically studying the human genome, from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, an increasing number of biological databases have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges lie ahead in big data storage, processing, exchange, and curation.
Keywords: human database; big data; database category; curation
Matching CCD images to a stellar catalog using locality-sensitive hashing (Cited by 1)
6
Authors: Bo Liu, Jia-Zong Yu, Qing-Yu Peng 《Research in Astronomy and Astrophysics》 SCIE CAS CSCD 2018, No. 2, pp. 105-114 (10 pages)
Using a subset of observed stars in a CCD image to find their corresponding matched stars in a stellar catalog is an important issue in astronomical research. Subgraph isomorphism-based algorithms are the most widely used methods in star catalog matching. When more subgraph features are provided, the CCD images are recognized better. However, when the navigation feature database is large, the method requires more time to match the observed model. To solve this problem, this study investigates and improves subgraph isomorphism matching algorithms. We present an algorithm based on a locality-sensitive hashing technique, which allocates quadrilateral models in the navigation feature database into different hash buckets and reduces the search range to the bucket in which the observed quadrilateral model is located. Experimental results indicate the effectiveness of our method.
Keywords: astronomical databases: miscellaneous; methods: data analysis; techniques: image processing
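The bucketing idea behind the locality-sensitive hashing step can be sketched as follows; the quantization-based hash and the two-component feature vectors are illustrative assumptions, not the paper's exact construction:

```python
def lsh_bucket(features, cell=0.1):
    """Map a quadrilateral model's invariant features to a coarse hash bucket.

    Nearby feature vectors quantize to the same cell, so candidate
    catalog matches are sought only within one bucket instead of
    scanning the whole navigation feature database.
    """
    return tuple(int(f // cell) for f in features)

def build_index(catalog_models):
    """Group catalog quadrilateral models (id -> feature vector) by bucket."""
    index = {}
    for model_id, feats in catalog_models.items():
        index.setdefault(lsh_bucket(feats), []).append(model_id)
    return index
```

Real LSH schemes use several independently hashed tables to reduce misses when an observed model falls near a cell boundary.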
Estimating stellar effective temperatures and detected angular parameters using stochastic particle swarm optimization
7
Authors: Chuan-Xin Zhang, Yuan Yuan, Hao-Wei Zhang, Yong Shuai, He-Ping Tan 《Research in Astronomy and Astrophysics》 SCIE CAS CSCD 2016, No. 9, pp. 65-74 (10 pages)
Considering features of stellar spectral radiation and sky surveys, we established a computational model for stellar effective temperatures, detected angular parameters, and gray rates. Using known stellar flux data in some bands, we estimated stellar effective temperatures and detected angular parameters using stochastic particle swarm optimization (SPSO). We first verified the reliability of SPSO and then determined reasonable parameters that produced highly accurate estimates under certain gray deviation levels. Finally, we calculated 177 860 stellar effective temperatures and detected angular parameters using data from the Midcourse Space Experiment (MSX) catalog. These derived stellar effective temperatures proved accurate when compared with known values from the literature. This research makes full use of catalog data and presents an original technique for studying stellar characteristics. It proposes a novel method for calculating stellar effective temperatures and detecting angular parameters, and provides theoretical and practical data for finding information about radiation in any band.
Keywords: physical data and processes: radiative transfer; methods: data analysis; astronomical databases: miscellaneous; stars: atmospheres
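A basic particle swarm optimizer (the plain variant, not the stochastic SPSO of the paper) illustrates how an effective temperature could be fitted by minimizing a flux-mismatch objective over a bounded interval:

```python
import random

def pso_minimize(f, lo, hi, n_particles=20, iters=100, seed=0):
    """Minimize f on [lo, hi] with a basic one-dimensional particle swarm.

    Each particle keeps a personal best and is pulled toward the swarm's
    global best; positions are clamped to the search interval.
    """
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_val = [f(x) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (pbest[i] - pos[i])
                      + 1.5 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i], v
    return gbest, gbest_val
```

In the paper's setting, f would measure the gap between catalog flux data and the model spectrum for a trial effective temperature; the quadratic surrogate used below is a placeholder for that objective.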
A Comparison of BBN, ADTree and MLP in Separating Quasars from Large Survey Catalogues
8
Authors: Yan-Xia Zhang, Yong-Heng Zhao 《Chinese Journal of Astronomy and Astrophysics》 CSCD 2007, No. 2, pp. 289-296 (8 pages)
We compare the performance of Bayesian Belief Networks (BBN), Multilayer Perceptron (MLP) networks, and Alternating Decision Trees (ADTree) on separating quasars from stars with a database drawn from the 2MASS and FIRST survey catalogs. Given a training sample of sources of known object types, the classifiers are trained to separate quasars from stars. Based on the statistical properties of the sample, the features important for classification are selected. We compare the classification results with and without feature selection. Experiments show that the results with feature selection are better than those without. From the high accuracy achieved, it is concluded that these automated methods are robust and effective for classifying point sources. They may all be applied to large survey projects (e.g., selecting input catalogs) and to other astronomical issues, such as the parameter measurement of stars and the redshift estimation of galaxies and quasars.
Keywords: astronomical databases: miscellaneous; catalogs; methods: data analysis; methods: statistical
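Feature selection from the statistical properties of the sample can be illustrated with a simple Fisher-style separation score; the abstract does not give the actual criterion used, so the score below is an assumed example:

```python
from statistics import mean, stdev

def fisher_scores(class_a, class_b):
    """Score each feature by how well it separates two classes.

    class_a, class_b: lists of equal-length feature vectors (e.g. colors
    or magnitudes for known quasars vs. stars). A Fisher-style ratio,
    |mean difference| / pooled spread, ranks features: high-scoring
    features are kept for training the classifier.
    """
    scores = []
    for col_a, col_b in zip(zip(*class_a), zip(*class_b)):
        spread = stdev(col_a) + stdev(col_b) or 1e-9  # guard zero spread
        scores.append(abs(mean(col_a) - mean(col_b)) / spread)
    return scores
```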
Development of a data collection and management system in West Africa: challenges and sustainability (Cited by 1)
9
Authors: Jeffrey G. Shaffer, Seydou O. Doumbia, Daouda Ndiaye, Ayouba Diarra, Jules F. Gomis, Davis Nwakanma, Ismaela Abubakar, Abdullahi Ahmad, Muna Affara, Mary Lukowski, Clarissa Valim, James C. Welty, Frances J. Mather, Joseph Keating, Donald J. Krogstad 《Infectious Diseases of Poverty》 SCIE 2018, No. 1, pp. 1300-1313 (14 pages)
Background: Developing and sustaining a data collection and management system (DCMS) is difficult in malaria-endemic countries because of limitations in internet bandwidth, computer resources, and numbers of trained personnel. The premise of this paper is that development of a DCMS in West Africa was a critically important outcome of the West African International Centers of Excellence for Malaria Research. The purposes of this paper are to make that information available to other investigators and to encourage the linkage of DCMSs to international research and Ministry of Health data systems and repositories.
Methods: We designed and implemented a DCMS to link study sites in Mali, Senegal, and The Gambia. This system was based on case report forms for epidemiologic, entomologic, clinical, and laboratory aspects of plasmodial infection and malarial disease for a longitudinal cohort study, and included on-site training for Principal Investigators and Data Managers. Based on this experience, we propose guidelines for the design and sustainability of DCMSs in environments with limited resources and personnel.
Results: From 2012 to 2017, we performed biannual thick smear surveys for plasmodial infection, mosquito collections for anopheline biting rates and sporozoite rates, and year-round passive case detection for malarial disease in four longitudinal cohorts with 7708 individuals and 918 households in Senegal, The Gambia, and Mali. Major challenges included the development of uniform definitions and reporting, assessment of data entry error rates, unstable and limited internet access, and software and technology maintenance. Strengths included entomologic collections linked to longitudinal cohort studies, on-site data centres, and a cloud-based data repository.
Conclusions: At a time when research on diseases of poverty in low- and middle-income countries is a global priority, the resources available to ensure accurate data collection and the electronic availability of those data remain severely limited. Based on our experience, we suggest the development of a regional DCMS. This approach is more economical than separate data centres and has the potential to improve data quality by encouraging shared case definitions, data validation strategies, and analytic approaches, including the molecular analysis of treatment successes and failures.
Keywords: data collection; data (database) management system; diseases of poverty; malaria; Plasmodium falciparum
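One concrete payoff of shared case definitions is uniform validation at data entry time; the field names and ranges below are hypothetical, sketching the kind of rule a shared DCMS would apply identically at every site:

```python
def validate_case_report(record):
    """Check a case report form entry against shared field definitions.

    Returns a list of error strings (empty means the record passes).
    Field names and the allowed values are illustrative assumptions,
    not the study's actual case report form.
    """
    errors = []
    if record.get("site") not in {"Mali", "Senegal", "The Gambia"}:
        errors.append("unknown site")
    age = record.get("age_years")
    if not isinstance(age, (int, float)) or not 0 <= age <= 120:
        errors.append("age out of range")
    if record.get("smear_result") not in {"positive", "negative"}:
        errors.append("invalid smear result")
    return errors
```

Running the same rules at every site is what makes data entry error rates comparable across a multi-country cohort.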
Banian: A Cross-Platform Interactive Query System for Structured Big Data (Cited by 2)
10
Authors: Tao Xu, Dongsheng Wang, Guodong Liu 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2015, No. 1, pp. 62-71 (10 pages)
The rapid growth of structured data has presented new technological challenges in the research fields of big data and relational databases. In this paper, we present Banian, an efficient system for managing and analyzing PB-level structured data. Banian overcomes the storage structure limitation of relational databases and effectively integrates interactive query with large-scale storage management. It provides a uniform query interface for cross-platform datasets and thus shows favorable compatibility and scalability. Banian's system architecture mainly includes three layers: (1) a storage layer using HDFS for the distributed storage of massive data; (2) a scheduling and execution layer employing the splitting and scheduling technology of parallel databases; and (3) an application layer providing a cross-platform query interface and supporting standard SQL. We evaluate Banian using PB-level Internet data and the TPC-H benchmark. The results show that, compared with Hive, Banian improves query performance by up to 30 times and achieves better scalability and concurrency.
Keywords: big data; interactive query; relational database; HDFS; cross-platform
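The split-and-schedule idea of the scheduling and execution layer can be sketched as a scatter-gather over data partitions; this sequential toy omits the parallel execution and SQL planning a real system like Banian performs:

```python
def scatter_gather(partitions, predicate):
    """Run a filter on each data partition and merge the results.

    A toy sketch of the split-and-schedule pattern: the full dataset
    never moves; each partition answers locally and only matching
    rows are merged at the coordinator. Real systems dispatch the
    per-partition step to workers in parallel.
    """
    results = []
    for part in partitions:
        results.extend(row for row in part if predicate(row))
    return results
```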
A Partition Checkpoint Strategy Based on Data Segment Priority
11
Authors: LIANG Ping, LIU Yunsheng 《Wuhan University Journal of Natural Sciences》 CAS 2012, No. 2, pp. 109-113 (5 pages)
A partition checkpoint strategy based on data segment priority is presented to meet the timing constraints of the data and transactions in embedded real-time main memory database systems (ERTMMDBS), as well as to reduce the number of transactions missing their deadlines and the recovery time. The partition checkpoint strategy takes into account the characteristics of the data and the transactions associated with it; moreover, it partitions the database according to data segment priority and sets a corresponding checkpoint frequency for each partition for independent checkpoint operation. The simulation results show that the partition checkpoint strategy decreases the ratio of transactions missing their deadlines.
Keywords: embedded real-time main memory database systems; database recovery; partition checkpoint; data segment priority
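The core of the strategy, a per-partition checkpoint frequency derived from data segment priority, can be sketched as follows; the priority values, base interval, and inverse-proportional rule are assumed numbers for illustration, not the paper's exact formula:

```python
def checkpoint_schedule(partitions, base_interval_ms=1000):
    """Assign each database partition a checkpoint interval by priority.

    partitions: mapping of partition name -> integer priority (higher
    means hotter data). Higher-priority segments are checkpointed more
    often, so after a crash their recovered state is fresher and fewer
    real-time transactions miss deadlines during recovery.
    """
    return {
        name: base_interval_ms // priority
        for name, priority in partitions.items()
    }
```

Each partition can then run its checkpoint on its own timer, which is the independence the strategy relies on.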