Fault diagnosis is important for maintaining the safety and effectiveness of chemical processes. Considering the multivariate, nonlinear, and dynamic characteristics of chemical processes, many time-series-based data-driven fault diagnosis methods have been developed in recent years. However, existing methods suffer from the long-term dependency problem and are difficult to train because of their sequential training scheme. To overcome these problems, a novel fault diagnosis method based on time series and hierarchical multihead self-attention (HMSAN) is proposed for chemical processes. First, a sliding-window strategy is adopted to construct the normalized time-series dataset. Second, the HMSAN is developed to extract time-relevant features from the time-series process data. It improves the basic self-attention model in both width and depth. With the multihead structure, the HMSAN can attend to different aspects of the complicated chemical process and obtain global dynamic features. However, multiple heads in parallel also produce redundant information that does not by itself improve diagnosis performance. With the hierarchical structure, the redundant information is reduced and deep local time-related features are further extracted. In addition, a novel many-to-one training strategy is introduced for HMSAN to simplify the training procedure and capture long-term dependencies. Finally, the effectiveness of the proposed method is demonstrated on two chemical cases. The experimental results show that the proposed method performs well on time-series industrial data and outperforms state-of-the-art approaches.
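As a rough illustration of the sliding-window preprocessing step described above, the sketch below builds normalized, fixed-length windows from a multivariate process record. The window length, stride, and z-score normalization are assumptions for demonstration, not details taken from the paper.

```python
import numpy as np

def sliding_windows(data, window=64, stride=1):
    """Split a (time, variables) array into overlapping windows.

    data   : 2-D array of raw process measurements
    window : number of time steps per sample (assumed value)
    stride : step between consecutive windows (assumed value)
    """
    # z-score normalization per variable (assumed choice of scaling)
    mean, std = data.mean(axis=0), data.std(axis=0) + 1e-8
    data = (data - mean) / std

    samples = []
    for start in range(0, data.shape[0] - window + 1, stride):
        samples.append(data[start:start + window])
    return np.stack(samples)  # shape: (n_windows, window, n_variables)

# toy usage: 1000 time steps, 52 process variables
raw = np.random.rand(1000, 52)
X = sliding_windows(raw)
print(X.shape)
```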
Accurate mapping and timely monitoring of urban redevelopment are pivotal for urban studies and decision-makers to foster sustainable urban development. Traditional mapping methods heavily depend on field surveys and subjective questionnaires, yielding less objective, reliable, and timely data. Recent advancements in Geographic Information Systems (GIS) and remote-sensing technologies have improved the identification and mapping of urban redevelopment through quantitative analysis of satellite-based observations. Nonetheless, challenges persist, particularly concerning accuracy and significant temporal delays. This study introduces a novel approach to modeling urban redevelopment that leverages machine learning algorithms and remote-sensing data, facilitating the accurate and timely identification of urban redevelopment activities. The study's machine learning model analyzes time-series remote-sensing data to identify spatio-temporal and spectral patterns related to urban redevelopment. The model is thoroughly evaluated, and the results indicate that it can accurately capture the time-series patterns of urban redevelopment. The findings are useful for evaluating urban demographic and economic changes, informing policymaking and urban planning, and contributing to sustainable urban development. The model can also serve as a foundation for future research on early-stage urban redevelopment detection and on the causes and impacts of urban redevelopment.
The frequent missing values in radar-derived time-series tracks of aerial targets (RTT-AT) lead to significant challenges in subsequent data-driven tasks. However, the majority of imputation research focuses on random missing (RM) patterns, which differ significantly from the common missing patterns of RTT-AT, and methods designed for RM may degrade or fail when applied to RTT-AT imputation. Conventional autoregressive deep learning methods are prone to error accumulation and long-term dependency loss. In this paper, a non-autoregressive imputation model is proposed that addresses missing-value imputation for two common missing patterns in RTT-AT. The model consists of two probabilistic sparse diagonal masking self-attention (PSDMSA) units and a weight fusion unit. It learns missing values by combining the representations output by the two units, aiming to minimize the difference between the imputed values and their actual values. The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps, improving imputation quality. The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation. The experimental results indicate that, despite varying missing rates in the two missing patterns, the model consistently outperforms other methods in imputation performance and exhibits a low frequency of deviations in estimates for specific missing entries. Compared to the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series (BRITS), the proposed model reduces mean absolute error (MAE) by 31%-50%. Additionally, the model trains 4 to 8 times faster than both BRITS and a standard Transformer model on the same dataset. Finally, the ablation experiments demonstrate that the PSDMSA units, the weight fusion unit, the cascade network design, and the imputation loss each enhance imputation performance, confirming the efficacy of the design.
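The diagonal-masking idea can be illustrated with a minimal single-head self-attention in NumPy: masking the diagonal forces every time step to be reconstructed from the other steps rather than from itself. This is only a sketch of the masking mechanism; the probabilistic sparse selection, the two-unit design, and the weight fusion of PSDMSA are not reproduced here.

```python
import numpy as np

def diagonal_masked_attention(x):
    """Single-head self-attention whose diagonal is masked out.

    x : (T, d) array of per-time-step features.
    Each step attends only to the *other* steps, so a missing step
    must be reconstructed from its temporal context.
    """
    T, d = x.shape
    # toy linear projections (random weights, for illustration only)
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    scores = q @ k.T / np.sqrt(d)
    scores[np.eye(T, dtype=bool)] = -np.inf      # diagonal mask
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v                           # (T, d) context representation

out = diagonal_masked_attention(np.random.rand(20, 8))
print(out.shape)
```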
The EU's Artificial Intelligence Act (AI Act) imposes requirements for the privacy compliance of AI systems. AI systems must comply with privacy laws such as the GDPR when providing services. These laws grant users the right to issue a Data Subject Access Request (DSAR). Responding to such requests requires database administrators to accurately identify the information related to an individual. However, manual compliance poses significant challenges and is error-prone, since database administrators must write queries through time-consuming labor. The demand for large amounts of data by AI systems has driven the adoption of NoSQL databases, and because of their flexible schemas, identifying personal information in them is even more challenging. This paper develops an automated tool to identify personal information that can help organizations respond to DSARs. The tool combines several techniques, including schema extraction from NoSQL databases and relationship identification from query logs. We describe the algorithm used by the tool, detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs to help developers accurately identify personal data. We evaluate the tool on three datasets covering different database designs, achieving F1 scores of 0.77 to 1. Experimental results demonstrate that the tool successfully identifies information relevant to the data subject, reduces manual effort, and simplifies GDPR compliance, showing practical value in enhancing the privacy performance of NoSQL databases and AI systems.
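A simplified view of the relationship-graph idea is sketched below: collection schemas and field pairs observed together in query logs are turned into a graph, which can then be traversed from an identifier field to find related personal data. The collection names, field names, and the co-occurrence heuristic are illustrative assumptions, not the tool's actual algorithm.

```python
import networkx as nx

# assumed schema-extraction output: collection -> fields
schemas = {
    "users":   ["user_id", "name", "email"],
    "orders":  ["order_id", "user_id", "address"],
    "support": ["ticket_id", "email", "message"],
}
# assumed field pairs that appear together in logged queries (joins/lookups)
log_pairs = [("users.user_id", "orders.user_id"), ("users.email", "support.email")]

g = nx.Graph()
for coll, fields in schemas.items():
    for f in fields:
        g.add_node(f"{coll}.{f}", collection=coll)
        # fields of the same collection are implicitly related
        g.add_edge(f"{coll}.{f}", f"{coll}.{fields[0]}")
g.add_edges_from(log_pairs)  # implicit cross-collection relationships

# everything reachable from the subject's identifier is a DSAR candidate
related = nx.node_connected_component(g, "users.user_id")
print(sorted(related))
```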
Discovery of materials using a "bottom-up" or "top-down" approach is of great interest in materials science. Layered materials consisting of two-dimensional (2D) building blocks provide a good platform to explore new materials in this respect. In van der Waals (vdW) layered materials, these building blocks are charge neutral and can be isolated from their bulk phase (top-down), but they usually grow on a substrate. In ionic layered materials, the building blocks are charged and usually cannot exist independently, but they can serve as motifs to construct new materials (bottom-up). In this paper, we introduce our recently constructed databases for 2D material-substrate interfaces (2DMSI) and for 2D charged building blocks. For the 2DMSI database, we systematically built a workflow to predict appropriate substrates and the geometries of the 2D materials on those substrates. For the 2D charged building block database, 1208 entries were identified from a bulk material database; information on crystal structure, valence state, source, dimension, and so on is provided for each entry in JSON format. We also show the databases' application in designing and searching for new functional layered materials. The 2DMSI database, the building block database, and the designed layered materials are available in Science Data Bank at https://doi.org/10.57760/sciencedb.j00113.00188.
Electric kickboard vehicles have been popularized and promoted primarily because of their clean and efficient features, and their penetration rate is increasing. Electric kickboards are gradually growing in popularity in tourist- and education-centric localities, so deploying a customer rental service is essential. Due to its free-floating nature, the shared electric kickboard is a common and practical means of transportation, but relocation plans are required to increase the quality of service, and forecasting demand for kickboard use in a specific region is crucial. Predicting demand accurately with small data is troublesome, because extensive data is necessary for training machine learning algorithms effectively; data generation is a method for expanding the amount of data available for training. In this work, we propose a model that takes time-series customer demand data for electric kickboards as input, pre-processes it, and generates synthetic data according to the original data distribution using generative adversarial networks (GAN). Combining the synthetic data with the original data reduced the electric kickboard mobility demand prediction error. We propose Tabular-GAN-Modified-WGAN-GP for generating synthetic data that yields better prediction results. We modified the Wasserstein GAN with gradient penalty (WGAN-GP) to use the RMSprop optimizer and employed Spectral Normalization (SN) to improve training stability and achieve faster convergence. Finally, we applied a regression-based blending ensemble technique to further improve demand prediction performance. We used various evaluation criteria and visual representations to compare the proposed model's performance, and the synthetic data generated by the suggested GAN model is also evaluated. The TGAN-Modified-WGAN-GP model mitigates the overfitting and mode-collapse problems and converges faster than previous GAN models for synthetic data creation. The presented model's performance is compared to existing ensemble and baseline models. The experimental findings imply that combining synthetic and actual data can significantly reduce prediction error, yielding a mean absolute percentage error (MAPE) of 4.476, and increase prediction accuracy.
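For readers unfamiliar with the gradient-penalty term that WGAN-GP adds to the critic loss, a minimal PyTorch sketch is shown below. The use of RMSprop and spectral normalization mirrors the general recipe described above, but the network sizes, penalty coefficient, and learning rate are assumed values, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# small critic with spectral normalization on its layers (SN as in the text)
critic = nn.Sequential(
    nn.utils.spectral_norm(nn.Linear(8, 64)), nn.ReLU(),
    nn.utils.spectral_norm(nn.Linear(64, 1)))
opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)  # RMSprop optimizer

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1
    on random interpolations between real and fake samples."""
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# one toy critic update on random "tabular" batches (8 features)
real, fake = torch.randn(32, 8), torch.randn(32, 8)
loss = critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```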
BACKGROUND The literature has discussed the relationship between environmental factors and depressive disorders; however, the results are inconsistent across studies and regions, as are the interaction effects between environmental factors. We hypothesized that meteorological factors and ambient air pollution individually affect, and interact to affect, depressive disorder morbidity. AIM To investigate the effects of meteorological factors and air pollution on depressive disorders, including their lagged effects and interactions. METHODS The samples were obtained from a class 3 hospital in Harbin, China. Daily hospital admission data for depressive disorders from January 1, 2015 to December 31, 2022 were obtained, and meteorological and air pollution data were collected for the same period. Generalized additive models with quasi-Poisson regression were used for time-series modeling to measure the non-linear and delayed effects of environmental factors. We further incorporated each pair of environmental factors into a bivariate response surface model to examine interaction effects on hospital admissions for depressive disorders. RESULTS Data for 2922 days were included in the study, with no missing values. The total number of depressive admissions was 83905. Medium to high correlations existed between environmental factors. Air temperature (AT) and wind speed (WS) significantly affected the number of admissions for depression. An extremely low temperature (-29.0 °C) at lag 0 caused a 53% [relative risk (RR) = 1.53, 95% confidence interval (CI): 1.23-1.89] increase in daily hospital admissions relative to the median temperature. An extremely low WS (0.4 m/s) at lag 7 increased the number of admissions by 58% (RR = 1.58, 95% CI: 1.07-2.31). In contrast, atmospheric pressure and relative humidity had smaller effects. Among the six air pollutants considered in the time-series model, nitrogen dioxide (NO₂) was the only pollutant that showed significant effects under non-cumulative, cumulative, immediate, and lagged conditions. The cumulative effect of NO₂ at lag 7 was 0.47% (RR = 1.0047, 95% CI: 1.0024-1.0071). Interaction effects were found between AT and the five air pollutants, between atmospheric temperature and the four air pollutants, and between WS and sulfur dioxide. CONCLUSION Meteorological factors and the air pollutant NO₂ affect daily hospital admissions for depressive disorders, and interactions exist between meteorological factors and ambient air pollution.
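To make the quasi-Poisson time-series modeling step more concrete, a minimal sketch using statsmodels is given below; it regresses daily admission counts on temperature and a lagged pollutant term and converts a coefficient to a relative risk. The variable names, the 7-day lag, and the use of a plain GLM (rather than the full generalized additive model with smooth terms) are simplifying assumptions, and the toy data are random.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# toy daily series: admissions, air temperature, NO2 (assumed column names)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "admissions": rng.poisson(28, 2922),
    "temp": rng.normal(5, 15, 2922),
    "no2": rng.normal(40, 10, 2922),
})
df["no2_lag7"] = df["no2"].shift(7)      # 7-day lagged exposure
df = df.dropna()

X = sm.add_constant(df[["temp", "no2_lag7"]])
# Poisson GLM with Pearson-chi2 scale ~ quasi-Poisson (handles overdispersion)
model = sm.GLM(df["admissions"], X, family=sm.families.Poisson())
res = model.fit(scale="X2")

# relative risk per 10-unit increase in lagged NO2
rr = np.exp(res.params["no2_lag7"] * 10)
print(res.summary().tables[1])
print("RR per 10 ug/m3 NO2 (lag 7):", round(rr, 4))
```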
Multivariate time-series forecasting (MTSF) plays an important role in diverse real-world applications. To achieve better accuracy in MTSF, the time-series patterns within each variable and the interrelationship patterns between variables should be considered together. Recently, graph neural networks (GNNs) have gained much attention because they can learn both kinds of patterns using a graph. Accurate forecasting with a GNN requires a well-defined graph. However, existing GNNs have limitations in reflecting the spectral similarity and time delay between nodes, and they consider all nodes with the same weight when constructing the graph. In this paper, we propose a novel graph construction method that addresses these limitations. We first calculate a Fourier-transform-based spectral similarity and then update this similarity to reflect the time delay. We then weight each node according to its number of edge connections to obtain the final graph, and use it to train the GNN model. Through experiments on various datasets, we demonstrate that the proposed method enhances the performance of GNN-based MTSF models, and the proposed forecasting model achieves up to an 18.1% improvement in predictive performance over the state-of-the-art model.
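A rough sketch of the graph-construction idea: the spectral similarity between two series can be approximated by comparing their Fourier amplitude spectra, and the time delay by the lag that maximizes their cross-correlation. The similarity measure, the delay-based discount, and the threshold below are illustrative choices and not the paper's exact formulation.

```python
import numpy as np

def spectral_similarity(a, b):
    """Cosine similarity of FFT amplitude spectra (assumed similarity measure)."""
    fa, fb = np.abs(np.fft.rfft(a)), np.abs(np.fft.rfft(b))
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-12))

def time_delay(a, b):
    """Lag (in steps) that maximizes the cross-correlation between two series."""
    a, b = a - a.mean(), b - b.mean()
    corr = np.correlate(a, b, mode="full")
    return int(np.argmax(corr) - (len(b) - 1))

def build_graph(series, max_delay=20, threshold=0.5):
    """Adjacency matrix: connect variables with high spectral similarity,
    discounted by their estimated time delay (assumed discount rule)."""
    n = len(series)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim = spectral_similarity(series[i], series[j])
            delay = abs(time_delay(series[i], series[j]))
            sim *= max(0.0, 1.0 - delay / (max_delay + 1))   # delay-aware update
            if sim >= threshold:
                adj[i, j] = adj[j, i] = sim
    return adj

t = np.linspace(0, 20, 400)
series = [np.sin(t), np.sin(t - 0.5), np.cos(3 * t)]
print(build_graph(series))
```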
In the past two decades, because of the significant increase in the availability of differential interferometry from synthetic aperture radar and GPS data, spaceborne geodesy has been widely employed to determine the co-seismic displacement fields of earthquakes. On April 18, 2021, a moderate earthquake (Mw 5.8) occurred east of Bandar Ganaveh, southern Iran, followed by intensive seismic activity and aftershocks of various magnitudes. We use two-pass D-InSAR and Small Baseline Inversion techniques via the LiCSBAS suite to study the co-seismic displacement and monitor the four-month post-seismic deformation of the Bandar Ganaveh earthquake, as well as to constrain the fault geometry of the co-seismic faulting mechanism during the seismic sequence. Analyses show that the co- and post-seismic deformation are distributed at relatively shallow depths along NW-SE-striking and NE-dipping complex reverse/thrust fault branches of the Zagros Mountain Front Fault, complying with the main trend of the Zagros structures. The average cumulative displacements ranged from -137.5 to +113.3 mm/yr in the SW and NE blocks of the Mountain Front Fault, respectively. The maximum uplift obtained is approximately consistent with the overall orogen-normal shortening component of the Arabian-Eurasian convergence in the Zagros region. No surface ruptures were associated with the seismic source; therefore, we propose a shallow blind thrust/reverse fault (depth ~10 km) connected to the deeper basal decollement fault within a complex tectonic zone, emphasizing thin-skinned tectonics.
All-solid-state batteries (ASSBs) are a class of safer, higher-energy-density devices compared with conventional batteries, and solid-state electrolytes (SSEs) are their essential components. To date, investigations searching for highly ion-conducting solid-state electrolytes have attracted broad attention. However, obtaining SSEs with high ionic conductivity is challenging due to their complex structural information and the less-explored structure-performance relationship. To address these challenges, developing a database containing typical SSEs from available experimental reports offers a new avenue for understanding structure-performance relationships and deriving design guidelines for promising SSEs. Herein, a dynamic experimental database containing >600 materials was developed over a wide range of temperatures (132.40-1261.60 K), including mono- and divalent cations (e.g., Li⁺, Na⁺, K⁺, Ag⁺, Ca²⁺, Mg²⁺, and Zn²⁺) and various types of anions (e.g., halide, hydride, sulfide, and oxide). Data mining was conducted to explore the relationships among different variates (e.g., transport ion, composition, activation energy, and conductivity). Overall, we expect that this database can provide essential guidelines for the design and development of high-performance SSEs for ASSB applications. The database is dynamically updated and can be accessed via our open-source online system.
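As an illustration of the kind of data mining such a database enables, the sketch below filters entries by transport ion and fits the Arrhenius relation σT = A·exp(−Ea/kBT) to relate conductivity, temperature, and activation energy. The column names and toy records are assumptions for demonstration; the actual database schema may differ.

```python
import numpy as np
import pandas as pd

KB = 8.617e-5  # Boltzmann constant, eV/K

# toy records mimicking an experimental SSE database (assumed schema)
db = pd.DataFrame({
    "material":   ["Li7La3Zr2O12", "Li7La3Zr2O12", "Na3PS4", "Na3PS4"],
    "ion":        ["Li+", "Li+", "Na+", "Na+"],
    "T_K":        [300.0, 400.0, 300.0, 400.0],
    "sigma_S_cm": [1e-4, 2.5e-3, 2e-4, 4e-3],
})

for (mat, ion), grp in db.groupby(["material", "ion"]):
    # Arrhenius fit: ln(sigma*T) = ln(A) - Ea/(kB*T)
    x = 1.0 / grp["T_K"].to_numpy()
    y = np.log(grp["sigma_S_cm"].to_numpy() * grp["T_K"].to_numpy())
    slope, intercept = np.polyfit(x, y, 1)
    ea = -slope * KB
    print(f"{mat} ({ion}): estimated activation energy ~ {ea:.3f} eV")
```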
Electronic patient data offers many advantages, but also brings new difficulties. Deadlocks may delay procedures such as acquiring patient information. Distributed deadlock resolution solutions introduce uncertainty due to inaccurate transaction properties, and soft-computing-based solutions have been developed to address this challenge. However, handling ambiguous, vague, incomplete, and inconsistent transaction attribute information within a single framework has received minimal attention. The work presented in this paper employs type-2 neutrosophic logic, an extension of type-1 neutrosophic logic, to handle uncertainty in real-time deadlock-resolving systems. The proposed method is structured to reflect multiple types of knowledge and relations among transaction features, which include the validation factor degree, the slackness degree, and the degree of deadline-missed transaction, based on the degrees of membership of truthiness, indeterminacy, and falsity. Here, the footprint of uncertainty (FOU) for truth, indeterminacy, and falsity represents the level of uncertainty that exists in the value of a grade of membership. We employed a distributed real-time transaction processing simulator (DRTTPS) to conduct the simulations and ran experiments using the benchmark Pima Indians diabetes dataset (PIDD). The results show an increase in detection rate and a large drop in rollback rate when the new strategy is used. The performance of type-2 neutrosophic-based resolution is better than that of the type-1 neutrosophic-based approach on the execution-ratio scale, with an improvement rate of 10% to 20%, depending on the number of arrived transactions.
Objective: To investigate the variation, expression, and clinical significance of the E2F3 gene in melanoma. Methods: First, the cBioPortal, Oncomine, and GEO databases were used to analyze the variation and expression level of the E2F3 gene in melanoma. The OSskcm and TISIDB databases were used to analyze the relationship between E2F3 and melanoma prognosis and tumor immune infiltrating cells. Then, the LinkedOmics database was used to identify differentially expressed genes related to E2F3 expression in melanoma and to analyze their biological functions. Finally, small-molecule compounds for the treatment of melanoma were screened through the CMap database. Results: The mutation rate of the E2F3 gene in melanoma is about 4%, with 21 mutation sites. Compared with normal skin tissues, the expression of the E2F3 gene in melanoma was significantly increased (P<0.01). Mutation and increased expression of the E2F3 gene were associated with shortened overall survival (OS) of melanoma patients (P<0.05). The CNA level of E2F3 was negatively correlated with the expression levels of lymphocytes such as pDC, Neutrophil, Act DC, and Th17, and negatively correlated with the expression levels of chemokines such as CXCL5, CCL13, and CCR1. The methylation level of E2F3 was positively correlated with the expression levels of Th1, Neutrophil, Act DC, and other lymphocytes, and positively correlated with the expression levels of CXCL16, CXCL12, CCR1, and other chemokines. The expression level of E2F3 was negatively correlated with the expression levels of lymphocytes such as Th17, Tcm CD4, and Th1, and negatively correlated with the expression levels of chemokines such as CXCL16, CCL22, and CCL2. The expression of 96 genes, including UHRF1BP1, was significantly correlated with the expression of E2F3 in melanoma (|cor| > 0.5, P<0.05). These genes were mainly related to RNA transport, eukaryotic ribosome biogenesis, the cell cycle, and other pathways. Among them, WDR12, WDR43, RBM28, UTP18, DKC1, PAK1IP1, DDX31, TEX10, TRUB1, and TRMT61B were the top 10 hub genes. YC-1, simvastatin, cytochalasin-d, Deforolimus, and cytochalasin-b may be five small-molecule compounds for the treatment of melanoma. Conclusion: Mutation and increased expression of the E2F3 gene are related to poor prognosis of melanoma and participate in the occurrence and development of melanoma by affecting the expression of different tumor immune infiltrating cell subtypes; E2F3 may therefore be a potential diagnostic marker and therapeutic target for melanoma.
Analyzing polysorbate 20 (PS20) composition and the impact of each component on stability and safety is crucial due to formulation variations and individual tolerance. The similar structures and polarities of PS20 components make accurate separation, identification, and quantification challenging. In this work, a high-resolution quantitative method was developed using one-dimensional high-performance liquid chromatography (HPLC) with charged aerosol detection (CAD) to separate 18 key components containing multiple esters. The separated components were characterized by ultra-high-performance liquid chromatography-quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) using a gradient identical to that of the HPLC-CAD analysis. The polysorbate compound database and library were expanded more than 7-fold compared to the commercial database. The method was used to investigate differences among PS20 samples of various origins and grades intended for different dosage forms, in order to evaluate the composition-process relationship. UHPLC-Q-TOF-MS identified 1329 to 1511 compounds in 4 batches of PS20 from different sources. The method also revealed the impact of 4 degradation conditions on the component peaks, identifying stable components and those prone to change. Together, the HPLC-CAD and UHPLC-Q-TOF-MS results provide insights into fingerprint differences and help distinguish quasi products.
Database systems have consistently been prime targets for cyber-attacks and threats due to the critical nature of the data they store. Despite the increasing reliance on database management systems, this field continues to face numerous cyber-attacks. Database management systems serve as the foundation of any information system or application, and any cyber-attack can result in significant damage to the database system and the loss of sensitive data. Consequently, cyber risk classifications and assessments play a crucial role in risk management and establish an essential framework for identifying and responding to cyber threats. Risk assessment aids in understanding the impact of cyber threats and developing appropriate security controls to mitigate risks. The primary objective of this study is to conduct a comprehensive analysis of cyber risks in database management systems, including classifying threats, vulnerabilities, impacts, and countermeasures. This classification helps to identify suitable security controls to mitigate cyber risks for each type of threat. Additionally, this research explores technical countermeasures to protect database systems from cyber threats. The study employs the content analysis method to collect, analyze, and classify data in terms of types of threats, vulnerabilities, and countermeasures. The results indicate that SQL injection attacks and Denial of Service (DoS) attacks were the most prevalent technical threats in database systems, each accounting for 9% of incidents. Vulnerable audit trails, intrusion attempts, and ransomware attacks were classified as the second level of technical threats in database systems, comprising 7% and 5% of incidents, respectively. Furthermore, the findings reveal that insider threats were the most common non-technical threats in database systems, accounting for 5% of incidents. Moreover, the results indicate that weak authentication, unpatched databases, weak audit trails, and multiple usage of an account were the most common technical vulnerabilities in database systems, each accounting for 9% of vulnerabilities. Additionally, software bugs, insecure coding practices, weak security controls, insecure networks, password misuse, weak encryption practices, and weak data masking were classified as the second level of security vulnerabilities in database systems, each accounting for 4% of vulnerabilities. The findings from this work can assist organizations in understanding the types of cyber threats and developing robust strategies against cyber-attacks.
Advanced glycation end-products (AGEs) are a group of heterogeneous compounds formed in heat-processed foods and are proven to be detrimental to human health. Currently, there is no comprehensive database for AGEs in foods that covers the entire range of food categories, which limits accurate risk assessment of dietary AGEs in human diseases. In this study, we first established an isotope dilution UHPLC-QqQ-MS/MS-based method for the simultaneous quantification of 10 major AGEs in foods. The contents of these AGEs were measured in 334 foods covering all main groups consumed in Western and Chinese populations. Nε-Carboxymethyllysine, methylglyoxal-derived hydroimidazolone isomers, and glyoxal-derived hydroimidazolone-1 are the predominant AGEs found in most foodstuffs. Total amounts of AGEs were high in processed nuts, bakery products, and certain types of cereals and meats (>150 mg/kg), and low in dairy products, vegetables, fruits, and beverages (<40 mg/kg). Assessment of the estimated daily intake implied that the contribution of food groups to daily AGE intake varies considerably under different eating patterns, and that selecting high-AGE foods leads to up to a 2.7-fold higher intake of AGEs through daily meals. The presented AGE database allows accurate assessment of dietary exposure to these glycotoxins to explore their physiological impacts on human health.
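The estimated daily intake referred to above is essentially the sum, over the foods a person eats, of AGE content multiplied by daily consumption. The short sketch below shows that arithmetic on made-up food records; the contents and portion sizes are placeholders, not values from the database.

```python
# Estimated daily intake of AGEs: sum(content [mg/kg] * consumption [kg/day]).
# The food items, AGE contents, and portion sizes below are illustrative only.
foods = [
    # (food, AGE content mg/kg, daily consumption kg)
    ("roasted nuts", 180.0, 0.03),
    ("bread",        160.0, 0.15),
    ("milk",          12.0, 0.25),
    ("vegetables",     8.0, 0.30),
]

edi_mg = sum(content * amount for _, content, amount in foods)
print(f"Estimated daily AGE intake: {edi_mg:.1f} mg/day")
for name, content, amount in foods:
    share = 100 * content * amount / edi_mg
    print(f"  {name:<14} {share:5.1f}% of daily intake")
```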
This study examines the database search behaviors of individuals, focusing on gender differences and the impact of planning habits on information retrieval. Data were collected from a survey of 198 respondents, categorized by their discipline, schooling background, internet usage, and information retrieval preferences. Key findings indicate that females are more likely to plan their searches in advance and prefer structured methods of information retrieval, such as using library portals and leading university websites. Males, however, tend to use web search engines and self-archiving methods more frequently. This analysis provides valuable insights for educational institutions and libraries to optimize their resources and services based on user behavior patterns.
A data lake (DL) denotes a vast reservoir or repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are non-tabular and store data differently from relational tables. They come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its own data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability features found in relational databases. Consequently, machine learning approaches become necessary to categorize complex structured query language (SQL) queries. The results indicate that the most frequently used automatic classification technique for processing SQL queries on NoSQL databases is machine-learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
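As a small illustration of machine-learning-based classification of SQL queries, the sketch below trains a TF-IDF plus logistic-regression pipeline that labels query strings by category. The toy queries, the label set, and the choice of classifier are assumptions made for the example, not techniques attributed to any specific study surveyed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy labeled SQL queries (category labels are illustrative)
queries = [
    "SELECT name, email FROM users WHERE id = 42",
    "SELECT COUNT(*) FROM orders GROUP BY region",
    "INSERT INTO logs (ts, msg) VALUES (NOW(), 'ok')",
    "UPDATE users SET email = 'a@b.c' WHERE id = 42",
    "SELECT AVG(price) FROM items JOIN sales ON items.id = sales.item_id",
    "DELETE FROM sessions WHERE expired = 1",
]
labels = ["lookup", "aggregate", "write", "write", "aggregate", "write"]

clf = make_pipeline(
    TfidfVectorizer(token_pattern=r"[A-Za-z_*]+", lowercase=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(queries, labels)

print(clf.predict(["SELECT SUM(total) FROM orders GROUP BY month"]))
```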
The CALPHAD thermodynamic databases are very useful for analyzing the complex chemical reactions occurring in high-temperature materials processing. The FactSage thermodynamic database can be used to calculate complex phase diagrams and equilibrium phases involving refractories in industrial processes. In this study, the FactSage thermodynamic database relevant to ZrO₂-based refractories is reviewed, and the application of the database to understanding the corrosion of continuous-casting nozzle refractories in steelmaking is presented.
BACKGROUND Elective cholecystectomy (CCY) is recommended for patients with gallstone-related acute cholangitis (AC) following endoscopic decompression to prevent recurrent biliary events. However, the optimal timing and implications of CCY remain unclear. AIM To examine the impact of same-admission CCY compared to interval CCY on patients with gallstone-related AC using the National Readmission Database (NRD). METHODS We queried the NRD to identify all gallstone-related AC hospitalizations in adult patients with and without same-admission CCY between 2016 and 2020. Our primary outcome was the all-cause 30-day readmission rate, and secondary outcomes included in-hospital mortality, length of stay (LOS), and hospitalization cost. RESULTS Among the 124964 gallstone-related AC hospitalizations, only 14.67% underwent same-admission CCY. All-cause 30-day readmissions in the same-admission CCY group were almost half those of the non-CCY group (5.56% vs 11.50%). Patients in the same-admission CCY group had a longer mean LOS and higher hospitalization costs attributable to surgery. Although the most common reason for readmission was sepsis in both groups, the second most common reason was AC in the interval CCY group. CONCLUSION Our study suggests that patients with gallstone-related AC who do not undergo same-admission CCY have twice the risk of readmission compared to those who undergo CCY during the same admission. These readmissions can potentially be prevented by performing same-admission CCY in appropriate patients, which may reduce subsequent hospitalization costs secondary to readmissions.
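The primary outcome above, the all-cause 30-day readmission rate per group, reduces to a grouped proportion once each index hospitalization is flagged for a readmission within 30 days. The sketch below shows that calculation on made-up records; the column names and values are placeholders, not actual NRD fields.

```python
import pandas as pd

# toy index hospitalizations (assumed columns; not actual NRD variables)
df = pd.DataFrame({
    "same_admission_ccy": [True, True, False, False, False, True],
    "days_to_readmission": [None, 12, 5, 40, 20, None],  # None = no readmission
})

df["readmit_30d"] = df["days_to_readmission"].notna() & (df["days_to_readmission"] <= 30)
rates = df.groupby("same_admission_ccy")["readmit_30d"].mean() * 100
print(rates.round(2))  # percent readmitted within 30 days, by CCY group
```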
With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages the LLM's deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed: the text is normalized and ambiguity is removed. This is followed by semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. Next comes query generation, which converts the parsed information into a structured query format tailored to the target database schema. Finally, there is query execution and feedback, where the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. Implementing the model with advanced LLMs and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.
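The four-stage pipeline described above (pre-processing, semantic parsing, query generation, and execution with feedback) can be outlined in a few lines of code. In the sketch below, `call_llm` is a hypothetical placeholder for whatever LLM API is used, and the prompt format, schema string, and SQLite execution step are illustrative assumptions rather than the paper's implementation.

```python
import re
import sqlite3

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return "SELECT name FROM employees WHERE department = 'Sales';"

def preprocess(text: str) -> str:
    # normalize whitespace and case as a simple stand-in for query cleanup
    return re.sub(r"\s+", " ", text).strip().lower()

def generate_sql(question: str, schema: str) -> str:
    prompt = (f"Database schema:\n{schema}\n"
              f"Translate the question into a single SQL query:\n{question}")
    return call_llm(prompt)

def answer(question: str, conn: sqlite3.Connection, schema: str):
    sql = generate_sql(preprocess(question), schema)   # parse + generate
    try:
        return sql, conn.execute(sql).fetchall()       # execute
    except sqlite3.Error as exc:                       # feedback-loop hook
        return sql, f"query failed: {exc}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.execute("INSERT INTO employees VALUES ('Ada', 'Sales'), ('Bob', 'HR')")
schema = "employees(name TEXT, department TEXT)"
print(answer("Who works in sales?", conn, schema))
```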
Funding (fault diagnosis via hierarchical multihead self-attention): supported by the National Natural Science Foundation of China (62073140, 62073141) and the Shanghai Rising-Star Program (21QA1401800).
Funding (RTT-AT missing-value imputation): supported by the Graduate Funded Project (No. JY2022A017).
Funding (DSAR personal-information identification tool): supported by the National Natural Science Foundation of China (No. 62302242) and the China Postdoctoral Science Foundation (No. 2023M731802).
Funding (2D material-substrate interface and charged building block databases): Project supported by the National Natural Science Foundation of China (Grant Nos. 61888102, 52272172, and 52102193), the Major Program of the National Natural Science Foundation of China (Grant No. 92163206), the National Key Research and Development Program of China (Grant Nos. 2021YFA1201501 and 2022YFA1204100), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB30000000), and the Fundamental Research Funds for the Central Universities.
Funding (electric kickboard demand prediction): This work was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0016977, The Establishment Project of Industry-University Fusion District).
Ethics statement (depressive disorder admissions study): This study was reviewed and approved by the Ethics Committee of The First Psychiatric Hospital of Harbin.
Funding (GNN-based multivariate time-series forecasting): supported by the Energy Cloud R&D Program (grant number: 2019M3F2A1073184) through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT.
Funding and acknowledgements (solid-state electrolyte database): supported by the Ensemble Grant for Early Career Researchers 2022 and the 2023 Ensemble Continuation Grant of Tohoku University, the Hirose Foundation, the Iwatani Naoji Foundation, and the AIMR Fusion Research Grant; supported by JSPS KAKENHI Nos. JP23K13599, JP23K13703, JP22H01803, and JP18H05513; the Center for Computational Materials Science, Institute for Materials Research, Tohoku University for the use of MASAMUNE-IMR (Nos. 202212-SCKXX0204 and 202208-SCKXX-0212); the Institute for Solid State Physics (ISSP) at the University of Tokyo for the use of their supercomputers; and the China Scholarship Council (CSC) fund to pursue studies in Japan.
Funding: National Natural Science Foundation of China (No. 82060503).
Abstract: Objective: To investigate the variation, expression, and clinical significance of the E2F3 gene in melanoma. Methods: First, the cBioPortal, Oncomine, and GEO databases were used to analyze the variation and expression level of the E2F3 gene in melanoma. The OSskcm and TISIDB databases were used to analyze the relationship between E2F3 and melanoma prognosis and tumor immune-infiltrating cells. Then, the LinkedOmics database was used to identify the differential genes related to E2F3 expression in melanoma and to analyze their biological functions. Finally, small-molecule compounds for the treatment of melanoma were screened through the CMap database. Results: The mutation rate of the E2F3 gene in melanoma is about 4%, with 21 mutation sites. Compared with normal skin tissues, the expression of the E2F3 gene in melanoma was significantly increased (P<0.01). Mutation and increased expression of the E2F3 gene were associated with shortened overall survival (OS) of melanoma patients (P<0.05). The CNA level of E2F3 was negatively correlated with the expression levels of lymphocytes such as pDC, Neutrophil, Act DC, and Th17, and negatively correlated with the expression levels of chemokines such as CXCL5, CCL13, and CCR1. The methylation level of E2F3 was positively correlated with the expression levels of Th1, Neutrophil, Act DC, and other lymphocytes, and positively correlated with the expression levels of CXCL16, CXCL12, CCR1, and other chemokines. The expression level of E2F3 was negatively correlated with the expression levels of lymphocytes such as Th17, Tcm CD4, and Th1, and negatively correlated with the expression levels of chemokines such as CXCL16, CCL22, and CCL2. The expression of 96 genes, including UHRF1BP1, was significantly correlated with the expression of E2F3 in melanoma (|cor|>0.5, P<0.05). These genes were mainly related to RNA transport, eukaryotic ribosome biogenesis, the cell cycle, and other pathways. Among them, WDR12, WDR43, RBM28, UTP18, DKC1, PAK1IP1, DDX31, TEX10, TRUB1, and TRMT61B were the top 10 hub genes. YC-1, simvastatin, cytochalasin-d, deforolimus, and cytochalasin-b may be five small-molecule compounds for the treatment of melanoma. Conclusion: Mutation and increased expression of the E2F3 gene are related to poor prognosis in melanoma and participate in its occurrence and development by affecting the expression of different tumor immune-infiltrating cell subtypes; E2F3 may therefore be a potential diagnostic marker and therapeutic target for melanoma.
Funding: financial support from the Science Research Program Project for Drug Regulation, Jiangsu Drug Administration, China (Grant No.: 202207); the National Drug Standards Revision Project, China (Grant No.: 2023Y41); the National Natural Science Foundation of China (Grant No.: 22276080); and the Foreign Expert Project, China (Grant No.: G2022014096L).
Abstract: Analyzing polysorbate 20 (PS20) composition and the impact of each component on stability and safety is crucial due to formulation variations and individual tolerance. The similar structures and polarities of PS20 components make accurate separation, identification, and quantification challenging. In this work, a high-resolution quantitative method was developed using one-dimensional high-performance liquid chromatography (HPLC) with charged aerosol detection (CAD) to separate 18 key components containing multiple esters. The separated components were characterized by ultra-high-performance liquid chromatography-quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) using a gradient identical to that of the HPLC-CAD analysis. The polysorbate compound database and library were expanded more than 7-fold compared with the commercial database. The method was used to investigate differences among PS20 samples of various origins and grades for different dosage forms in order to evaluate the composition-process relationship. UHPLC-Q-TOF-MS identified 1329 to 1511 compounds in 4 batches of PS20 from different sources. The method revealed the impact of 4 degradation conditions on peak components, identifying stable components and their tendencies to change. The HPLC-CAD and UHPLC-Q-TOF-MS results provided insights into fingerprint differences, distinguishing quasi products.
Funding: supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant No. KFU242068).
Abstract: Database systems have consistently been prime targets for cyber-attacks and threats due to the critical nature of the data they store. Despite the increasing reliance on database management systems, this field continues to face numerous cyber-attacks. Database management systems serve as the foundation of any information system or application, and any cyber-attack can result in significant damage to the database system and loss of sensitive data. Consequently, cyber-risk classifications and assessments play a crucial role in risk management and establish an essential framework for identifying and responding to cyber threats. Risk assessment aids in understanding the impact of cyber threats and in developing appropriate security controls to mitigate risks. The primary objective of this study is to conduct a comprehensive analysis of cyber risks in database management systems, including classifying threats, vulnerabilities, impacts, and countermeasures. This classification helps to identify suitable security controls to mitigate cyber risks for each type of threat. Additionally, this research explores technical countermeasures to protect database systems from cyber threats. The study employs the content-analysis method to collect, analyze, and classify data in terms of types of threats, vulnerabilities, and countermeasures. The results indicate that SQL injection attacks and Denial of Service (DoS) attacks were the most prevalent technical threats in database systems, each accounting for 9% of incidents. Vulnerable audit trails, intrusion attempts, and ransomware attacks were classified as the second level of technical threats in database systems, comprising 7% and 5% of incidents, respectively. Furthermore, the findings reveal that insider threats were the most common non-technical threats in database systems, accounting for 5% of incidents. Moreover, weak authentication, unpatched databases, weak audit trails, and multiple usage of an account were the most common technical vulnerabilities in database systems, each accounting for 9% of vulnerabilities. Software bugs, insecure coding practices, weak security controls, insecure networks, password misuse, weak encryption practices, and weak data masking were classified as the second level of security vulnerabilities, each accounting for 4% of vulnerabilities. The findings from this work can assist organizations in understanding the types of cyber threats and developing robust strategies against cyber-attacks.
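As one example of the technical countermeasures discussed, the sketch below contrasts string-concatenated SQL with a parameterized query, a standard defence against the SQL injection attacks identified as the most prevalent threat. The table, data, and payload are placeholders, not material from the study.

```python
# Minimal sketch of one countermeasure: parameterized queries instead of
# string concatenation.  Table and column names are placeholders.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

user_input = "alice' OR '1'='1"  # a typical injection payload

# UNSAFE pattern (shown only as a comment): concatenating user input into SQL
# rows = conn.execute("SELECT * FROM users WHERE name = '" + user_input + "'")

# SAFE: the driver binds the value, so the payload is treated as data, not SQL
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] -- the injection attempt matches nothing
```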
Funding: financial support received from the Natural Science Foundation of China (32202202 and 31871735).
Abstract: Advanced glycation end-products (AGEs) are a group of heterogeneous compounds formed in heat-processed foods and are proven to be detrimental to human health. Currently, there is no comprehensive database of AGEs in foods that covers the entire range of food categories, which limits accurate risk assessment of dietary AGEs in human diseases. In this study, we first established an isotope-dilution UHPLC-QqQ-MS/MS-based method for the simultaneous quantification of 10 major AGEs in foods. The contents of these AGEs were determined in 334 foods covering all main groups consumed in Western and Chinese populations. Nε-Carboxymethyllysine, methylglyoxal-derived hydroimidazolone isomers, and glyoxal-derived hydroimidazolone-1 are the predominant AGEs found in most foodstuffs. Total amounts of AGEs were high in processed nuts, bakery products, and certain types of cereals and meats (>150 mg/kg), but low in dairy products, vegetables, fruits, and beverages (<40 mg/kg). Assessment of estimated daily intake showed that the contribution of food groups to daily AGE intake varies considerably under different eating patterns, and that selecting high-AGE foods leads to up to a 2.7-fold higher intake of AGEs through daily meals. The presented AGE database allows accurate assessment of dietary exposure to these glycotoxins to explore their physiological impacts on human health.
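To illustrate how an estimated-daily-intake assessment of this kind can be computed, the sketch below sums concentration times consumption over food groups. The concentrations and portion sizes are invented placeholders, not values from the presented database.

```python
# Hedged sketch of an estimated-daily-intake (EDI) style calculation:
# EDI = sum(AGE concentration in a food group * daily consumption of that group).
foods = {
    # food group: (total AGEs in mg/kg, daily consumption in kg/day) -- illustrative
    "bakery products": (180.0, 0.10),
    "processed nuts": (200.0, 0.03),
    "dairy products": (25.0, 0.25),
    "vegetables": (15.0, 0.30),
}

edi_mg_per_day = sum(conc * amount for conc, amount in foods.values())
print(f"estimated daily AGE intake: {edi_mg_per_day:.1f} mg/day")

# share of each food group in the daily intake
for food, (conc, amount) in foods.items():
    share = 100 * conc * amount / edi_mg_per_day
    print(f"{food}: {share:.0f}%")
```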
Abstract: This study examines the database search behaviors of individuals, focusing on gender differences and the impact of planning habits on information retrieval. Data were collected from a survey of 198 respondents, categorized by discipline, schooling background, internet usage, and information-retrieval preferences. Key findings indicate that females are more likely to plan their searches in advance and prefer structured methods of information retrieval, such as using library portals and leading university websites. Males, by contrast, tend to use web search engines and self-archiving methods more frequently. This analysis provides valuable insights for educational institutions and libraries seeking to optimize their resources and services based on user behavior patterns.
Funding: supported by the Student Scheme provided by Universiti Kebangsaan Malaysia under Code TAP-20558.
Abstract: A data lake (DL) denotes a vast repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are non-tabular and store data differently from relational tables. NoSQL databases come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its own data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability features found in relational databases. Consequently, machine learning approaches become necessary to categorize complex structured query language (SQL) queries. The results indicate that the most frequently used automatic classification technique for processing SQL queries on NoSQL databases is machine-learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid the development of effective and efficient NoSQL database applications.
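As a concrete, deliberately simplified example of machine-learning-based classification of SQL query text, the sketch below trains a TF-IDF plus logistic-regression pipeline on a few hand-labelled queries. The labels and the tiny training set are invented for illustration and do not reproduce any system surveyed in the study.

```python
# Sketch: categorising raw SQL query strings with a TF-IDF + logistic-regression
# pipeline.  Labels and training examples are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = [
    "SELECT name FROM users WHERE id = 1",
    "SELECT COUNT(*) FROM orders GROUP BY customer_id",
    "INSERT INTO logs (msg) VALUES ('ok')",
    "UPDATE users SET name = 'x' WHERE id = 2",
]
labels = ["point_lookup", "aggregation", "write", "write"]

# vectorize SQL tokens, then fit a multi-class classifier
clf = make_pipeline(TfidfVectorizer(token_pattern=r"[A-Za-z_*]+"), LogisticRegression())
clf.fit(queries, labels)

print(clf.predict(["SELECT AVG(price) FROM orders GROUP BY region"]))
```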
Funding: Tata Steel Netherlands, Posco, Hyundai Steel, Nucor Steel, Rio Tinto, Nippon Steel Corp., JFE Steel, Voestalpine, RHI-Magnesita, Doosan Enerbility, Seah Besteel, Umicore, Vesuvius, and Schott AG are gratefully acknowledged.
Abstract: CALPHAD thermodynamic databases are very useful for analyzing the complex chemical reactions that occur in high-temperature materials processing. The FactSage thermodynamic database can be used to calculate complex phase diagrams and equilibrium phases involving refractories in industrial processes. In this study, the FactSage thermodynamic database relevant to ZrO_(2)-based refractories is reviewed, and the application of the database to understanding the corrosion of continuous-casting nozzle refractories in steelmaking is presented.
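For readers unfamiliar with the CALPHAD approach, the expressions below give the standard Redlich-Kister form of a solution phase's molar Gibbs energy that such databases parameterize, together with the equilibrium condition solved by the software. This is a generic textbook formulation, not the specific FactSage assessment of the ZrO_(2)-based systems discussed above.

```latex
G_m^{\varphi} = \sum_i x_i\,{}^{\circ}G_i^{\varphi}
  + RT\sum_i x_i\ln x_i
  + \sum_{i<j} x_i x_j \sum_{k\ge 0} {}^{k}L_{ij}^{\varphi}\,(x_i-x_j)^k,
\qquad
\min_{\{n^{\varphi}\}}\; G = \sum_{\varphi} n^{\varphi} G_m^{\varphi}
\;\; \text{subject to overall mass balance.}
```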
Abstract: BACKGROUND: Elective cholecystectomy (CCY) is recommended for patients with gallstone-related acute cholangitis (AC) following endoscopic decompression to prevent recurrent biliary events. However, the optimal timing and implications of CCY remain unclear. AIM: To examine the impact of same-admission CCY compared with interval CCY in patients with gallstone-related AC using the National Readmission Database (NRD). METHODS: We queried the NRD to identify all gallstone-related AC hospitalizations in adult patients with and without same-admission CCY between 2016 and 2020. Our primary outcome was the all-cause 30-day readmission rate; secondary outcomes included in-hospital mortality, length of stay (LOS), and hospitalization cost. RESULTS: Among the 124964 gallstone-related AC hospitalizations, only 14.67% underwent same-admission CCY. The all-cause 30-day readmission rate in the same-admission CCY group was almost half that of the non-CCY group (5.56% vs 11.50%). Patients in the same-admission CCY group had a longer mean LOS and higher hospitalization costs attributable to surgery. Although the most common reason for readmission was sepsis in both groups, the second most common reason was AC in the interval CCY group. CONCLUSION: Our study suggests that patients with gallstone-related AC who do not undergo same-admission CCY have twice the risk of readmission compared with those who undergo CCY during the same admission. These readmissions can potentially be prevented by performing same-admission CCY in appropriate patients, which may reduce subsequent hospitalization costs secondary to readmissions.
Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems by enabling more intuitive and semantic query mechanisms. Our model leverages the LLM's deep-learning architecture to interpret natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that converts user input into structured queries understood by the database management system. First, the user query is pre-processed: the text is normalized and ambiguity is removed. This is followed by semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. Next, query generation converts the parsed information into a structured query format tailored to the target database schema. Finally, query execution and feedback run the resulting query on the database and return the results to the user; the system also provides feedback mechanisms to improve and optimize future query interpretations. Using advanced LLMs for model implementation and fine-tuning on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.
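A minimal sketch of the pipeline described above (pre-processing, LLM semantic parsing, SQL generation, execution, and feedback) is given below. The `call_llm` function is a stand-in stub for whatever model endpoint is used, and the schema, prompt, and canned response are assumptions for illustration rather than the authors' implementation.

```python
# Sketch of a natural-language-to-SQL pipeline: preprocess -> LLM parsing ->
# SQL generation -> execution -> feedback.  `call_llm` is a placeholder stub.
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)"

def preprocess(text: str) -> str:
    # normalization step: trim whitespace and lower-case the user request
    return " ".join(text.strip().lower().split())

def call_llm(prompt: str) -> str:
    # placeholder for the actual LLM call; returns a canned query here
    return "SELECT customer, SUM(total) FROM orders GROUP BY customer"

def generate_sql(question: str) -> str:
    # query generation: prompt the model with the schema and the question
    prompt = f"Schema: {SCHEMA}\nQuestion: {question}\nSQL:"
    return call_llm(prompt)

def execute(sql: str):
    # query execution against a toy in-memory database
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO orders VALUES (1, 'alice', 10.0)")
    return conn.execute(sql).fetchall()

question = preprocess("  Show total spend per customer ")
sql = generate_sql(question)
print(sql, execute(sql))  # a feedback loop would log results to refine future parses
```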