All-solid-state batteries (ASSBs) are safer and offer higher energy density than conventional devices, and solid-state electrolytes (SSEs) are their essential components. To date, the search for SSEs with high ionic conductivity has attracted broad interest. However, obtaining such SSEs is challenging because of their complex structural information and the little-explored structure-performance relationship. To address these challenges, a database of typical SSEs compiled from available experimental reports would offer a new avenue for understanding structure-performance relationships and deriving new design guidelines for SSEs. Herein, a dynamic experimental database containing >600 materials was developed over a wide range of temperatures (132.40–1261.60 K), including mono- and divalent cations (e.g., Li⁺, Na⁺, K⁺, Ag⁺, Ca²⁺, Mg²⁺, and Zn²⁺) and various types of anions (e.g., halide, hydride, sulfide, and oxide). Data mining was conducted to explore the relationships among different variables (e.g., transport ion, composition, activation energy, and conductivity). Overall, we expect that this database can provide essential guidelines for the design and development of high-performance SSEs in ASSB applications. The database is dynamically updated and can be accessed via our open-source online system.
Analyzing polysorbate 20 (PS20) composition and the impact of each component on stability and safety is crucial due to formulation variations and individual tolerance. The similar structures and polarities of PS20 components make accurate separation, identification, and quantification challenging. In this work, a high-resolution quantitative method was developed using single-dimensional high-performance liquid chromatography (HPLC) with charged aerosol detection (CAD) to separate 18 key components with multiple esters. The separated components were characterized by ultra-high-performance liquid chromatography-quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) with the same gradient as the HPLC-CAD analysis. The polysorbate compound database and library were expanded more than 7-fold compared to the commercial database. The method was used to investigate differences in PS20 samples of various origins and grades for different dosage forms in order to evaluate the composition-process relationship. UHPLC-Q-TOF-MS identified 1329 to 1511 compounds in 4 batches of PS20 from different sources. The method also revealed the impact of 4 degradation conditions on peak components, identifying stable components and their tendencies to change. HPLC-CAD and UHPLC-Q-TOF-MS results provided insights into fingerprint differences, distinguishing quasi products.
The EU's Artificial Intelligence Act (AI Act) imposes requirements for the privacy compliance of AI systems. AI systems must comply with privacy laws such as the GDPR when providing services. These laws give users the right to issue a Data Subject Access Request (DSAR). Responding to such requests requires database administrators to accurately identify information related to an individual. However, manual compliance poses significant challenges and is error-prone: database administrators must write queries by hand, which is time-consuming. The demand of AI systems for large amounts of data has driven the development of NoSQL databases, and the flexible schema of NoSQL databases makes identifying personal information even more challenging. This paper develops an automated tool that identifies personal information and can help organizations respond to DSARs. Our tool combines several techniques, including schema extraction from NoSQL databases and relationship identification from query logs. We describe the algorithm used by our tool, detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs to help developers accurately identify personal data. We evaluate our tool on three datasets covering different database designs, achieving F1 scores of 0.77 to 1. Experimental results demonstrate that our tool successfully identifies information relevant to the data subject. Our tool reduces manual effort and simplifies GDPR compliance, showing practical value in enhancing the privacy performance of NoSQL databases and AI systems.
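The abstract above mentions schema extraction from schema-flexible NoSQL stores but does not give the algorithm. The following is only a minimal sketch of the general idea, assuming documents are available as nested Python dictionaries; the function name `extract_schema` and the sample field names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def extract_schema(documents):
    """Map each dotted field path to the sorted type names seen across documents.

    Hypothetical helper: nested dicts are walked recursively, every other
    value (including lists) is treated as a leaf.
    """
    seen = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            seen[path].add(type(value).__name__)

    for doc in documents:
        walk(doc, "")
    return {path: sorted(types) for path, types in seen.items()}


# Illustrative documents with a flexible schema (made-up data).
SAMPLE_DOCS = [
    {"user_id": 1, "contact": {"email": "a@example.org"}},
    {"user_id": "u2", "name": "B"},
]
SAMPLE_SCHEMA = extract_schema(SAMPLE_DOCS)
```

In this sketch a field such as `user_id` that appears with more than one type is reported with all observed types, which is the kind of irregularity that makes manual identification of personal data in NoSQL stores hard.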
Database systems have consistently been prime targets for cyber-attacks and threats due to the critical nature of the data they store. Despite the increasing reliance on database management systems, this field continues to face numerous cyber-attacks. Database management systems serve as the foundation of any information system or application. Any cyber-attack can result in significant damage to the database system and loss of sensitive data. Consequently, cyber risk classifications and assessments play a crucial role in risk management and establish an essential framework for identifying and responding to cyber threats. Risk assessment aids in understanding the impact of cyber threats and developing appropriate security controls to mitigate risks. The primary objective of this study is to conduct a comprehensive analysis of cyber risks in database management systems, including classifying threats, vulnerabilities, impacts, and countermeasures. This classification helps to identify suitable security controls to mitigate cyber risks for each type of threat. Additionally, this research aims to explore technical countermeasures to protect database systems from cyber threats. This study employs the content analysis method to collect, analyze, and classify data in terms of types of threats, vulnerabilities, and countermeasures. The results indicate that SQL injection attacks and Denial of Service (DoS) attacks were the most prevalent technical threats in database systems, each accounting for 9% of incidents. Vulnerable audit trails, intrusion attempts, and ransomware attacks were classified as the second level of technical threats in database systems, comprising 7% and 5% of incidents, respectively. Furthermore, the findings reveal that insider threats were the most common non-technical threats in database systems, accounting for 5% of incidents. Moreover, the results indicate that weak authentication, unpatched databases, weak audit trails, and multiple usage of an account were the most common technical vulnerabilities in database systems, each accounting for 9% of vulnerabilities. Additionally, software bugs, insecure coding practices, weak security controls, insecure networks, password misuse, weak encryption practices, and weak data masking were classified as the second level of security vulnerabilities in database systems, each accounting for 4% of vulnerabilities. The findings from this work can assist organizations in understanding the types of cyber threats and developing robust strategies against cyber-attacks.
Advanced glycation end-products (AGEs) are a group of heterogeneous compounds formed in heat-processed foods and are proven to be detrimental to human health. Currently, there is no comprehensive database for AGEs in foods that covers the entire range of food categories, which limits the accurate risk assessment of dietary AGEs in human diseases. In this study, we first established an isotope dilution UHPLC-QqQ-MS/MS-based method for simultaneous quantification of 10 major AGEs in foods. The contents of these AGEs were measured in 334 foods covering all main groups consumed in Western and Chinese populations. Nε-Carboxymethyllysine, methylglyoxal-derived hydroimidazolone isomers, and glyoxal-derived hydroimidazolone-1 are the predominant AGEs found in most foodstuffs. Total amounts of AGEs were high in processed nuts, bakery products, and certain types of cereals and meats (>150 mg/kg), and low in dairy products, vegetables, fruits, and beverages (<40 mg/kg). Assessment of estimated daily intake implied that the contribution of food groups to daily AGE intake varied considerably under different eating patterns, and that selecting high-AGE foods leads to up to a 2.7-fold higher intake of AGEs through daily meals. The presented AGE database allows accurate assessment of dietary exposure to these glycotoxins to explore their physiological impacts on human health.
This study examines the database search behaviors of individuals, focusing on gender differences and the impact of planning habits on information retrieval. Data were collected from a survey of 198 respondents, categorized by their discipline, schooling background, internet usage, and information retrieval preferences. Key findings indicate that females are more likely to plan their searches in advance and prefer structured methods of information retrieval, such as using library portals and leading university websites. Males, however, tend to use web search engines and self-archiving methods more frequently. This analysis provides valuable insights for educational institutions and libraries to optimize their resources and services based on user behavior patterns.
Discovery of materials using a "bottom-up" or "top-down" approach is of great interest in materials science. Layered materials consisting of two-dimensional (2D) building blocks provide a good platform to explore new materials in this respect. In van der Waals (vdW) layered materials, these building blocks are charge neutral and can be isolated from their bulk phase (top-down), but usually grow on a substrate. In ionic layered materials, they are charged and usually cannot exist independently but can serve as motifs to construct new materials (bottom-up). In this paper, we introduce our recently constructed databases for 2D material-substrate interfaces (2DMSI) and 2D charged building blocks. For the 2DMSI database, we systematically build a workflow to predict appropriate substrates and their geometries on the substrates, and construct the database from its results. For the 2D charged building block database, 1208 entries are identified from a bulk material database. Information on crystal structure, valence state, source, dimension and so on is provided for each entry in JSON format. We also show its application in designing and searching for new functional layered materials. The 2DMSI database, building block database, and designed layered materials are available in Science Data Bank at https://doi.org/10.57760/sciencedb.j00113.00188.
A data lake (DL) denotes a vast reservoir or repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are nontabular and store data differently from relational tables. NoSQL databases come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability features found in relational databases. Consequently, employing machine learning approaches becomes necessary to categorize complex structured query language (SQL) queries. Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
CALPHAD thermodynamic databases are very useful for analyzing the complex chemical reactions that occur in high-temperature materials processes. The FactSage thermodynamic database can be used to calculate complex phase diagrams and equilibrium phases involving refractories in industrial processes. In this study, the FactSage thermodynamic database relevant to ZrO₂-based refractories was reviewed, and the application of the database to understanding the corrosion of continuous casting nozzle refractories in steelmaking was presented.
BACKGROUND: Elective cholecystectomy (CCY) is recommended for patients with gallstone-related acute cholangitis (AC) following endoscopic decompression to prevent recurrent biliary events. However, the optimal timing and implications of CCY remain unclear. AIM: To examine the impact of same-admission CCY compared to interval CCY on patients with gallstone-related AC using the National Readmission Database (NRD). METHODS: We queried the NRD to identify all gallstone-related AC hospitalizations in adult patients with and without same-admission CCY between 2016 and 2020. Our primary outcome was the all-cause 30-day readmission rate, and secondary outcomes included in-hospital mortality, length of stay (LOS), and hospitalization cost. RESULTS: Among the 124964 gallstone-related AC hospitalizations, only 14.67% underwent same-admission CCY. The all-cause 30-day readmission rate in the same-admission CCY group was almost half that of the non-CCY group (5.56% vs 11.50%). Patients in the same-admission CCY group had a longer mean LOS and higher hospitalization costs attributable to surgery. Although the most common reason for readmission was sepsis in both groups, the second most common reason was AC in the interval CCY group. CONCLUSION: Our study suggests that patients with gallstone-related AC who do not undergo same-admission CCY have twice the risk of readmission compared to those who undergo CCY during the same admission. These readmissions can potentially be prevented by performing same-admission CCY in appropriate patients, which may reduce subsequent hospitalization costs secondary to readmissions.
With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages the LLM's deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed: the text is normalized and ambiguity is removed. Second, in semantic parsing, the LLM interprets the pre-processed text and identifies key entities and relationships. Third, query generation converts the parsed information into a structured query format and tailors it to the target database schema. Finally, in query execution and feedback, the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. With advanced LLMs used for model implementation and fine-tuned on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.
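The four stages named in the abstract above (pre-processing, semantic parsing, query generation, execution) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' system: a fixed regular expression stands in for the LLM-based semantic parser, SQLite stands in for the target database, and every function, table, and column name here is hypothetical.

```python
import re
import sqlite3


def preprocess(text):
    """Stage 1: normalize whitespace and case, drop trailing punctuation."""
    return re.sub(r"\s+", " ", text).strip().rstrip("?.!").lower()


def parse(text):
    """Stage 2: stand-in for the LLM parser; extracts entities from one fixed pattern."""
    match = re.fullmatch(r"show (\w+) of (\w+) where (\w+) is (\w+)", text)
    if match is None:
        raise ValueError(f"cannot parse: {text!r}")
    column, table, filter_column, value = match.groups()
    return {"column": column, "table": table,
            "filter_column": filter_column, "value": value}


def generate(parsed, schema):
    """Stage 3: validate entities against the schema, then build a parameterized query."""
    table = parsed["table"]
    if table not in schema:
        raise ValueError(f"unknown table: {table}")
    for column in (parsed["column"], parsed["filter_column"]):
        if column not in schema[table]:
            raise ValueError(f"unknown column: {column}")
    sql = (f"SELECT {parsed['column']} FROM {table} "
           f"WHERE {parsed['filter_column']} = ?")
    return sql, (parsed["value"],)


def answer(conn, schema, question):
    """Stage 4: execute the generated query and return the rows."""
    sql, params = generate(parse(preprocess(question)), schema)
    return conn.execute(sql, params).fetchall()


def build_demo_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)",
                     [("Ada", "research"), ("Bob", "sales")])
    return conn, {"employees": {"name", "dept"}}
```

Identifiers are checked against the schema before being placed in the SQL text, and the filter value is passed as a bound parameter, so that text produced by a parser (rule-based here, an LLM in the described system) is never executed unchecked.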
Antibiotic resistance, which is encoded by antibiotic-resistance genes (ARGs), has proliferated to become a growing threat to public health around the world. With technical advances, especially in the popularization of metagenomic sequencing, scientists have gained the ability to decipher the profiles of ARGs in diverse samples with high accuracy at an accelerated speed. To analyze thousands of ARGs in a high-throughput way, standardized and integrated pipelines are needed. The new version (v3.0) of the widely used ARGs online analysis pipeline (ARGs-OAP) has made significant improvements to both the reference database, the structured ARG (SARG) database, and the integrated analysis pipeline. SARG has been enhanced with sequence curation to improve annotation reliability, incorporate emerging resistance genotypes, and determine rigorous mechanism classification. The database has been further organized and visualized online in the format of a tree-like structure with a dictionary. It has also been divided into sub-databases for different application scenarios. In addition, the ARGs-OAP has been improved with adjusted quantification methods, simplified tool implementation, and multiple functions with user-defined reference databases. Moreover, the online platform now provides a diverse biostatistical analysis workflow with visualization packages for the efficient interpretation of ARG profiles. The ARGs-OAP v3.0 with an improved database and analysis pipeline will benefit academia, governmental management, and consultation regarding risk assessment of the environmental prevalence of ARGs.
The bone extracellular matrix (ECM) contains minerals deposited on highly crosslinked collagen fibrils and hundreds of noncollagenous proteins. Some of these proteins are key to the regulation of bone formation and regeneration via signaling pathways, and play important regulatory and structural roles. However, the complete list of bone ECM proteins, their roles, and the extent of individual and cross-species variations have not been fully captured in either humans or model organisms. Here, we introduce the most comprehensive resource of bone ECM proteins that can be used in research fields such as bone regeneration, osteoporosis, and mechanobiology. The Phylobone database (available at https://phylobone.com) includes 255 proteins potentially expressed in the bone ECM of humans and 30 species of vertebrates. A bioinformatics pipeline was used to identify the evolutionary relationships of bone ECM proteins. The analysis facilitated the identification of potential model organisms to study the molecular mechanisms of bone regeneration. A network analysis showed high connectivity of bone ECM proteins. A total of 214 functional protein domains were identified, including collagen and the domains involved in bone formation and resorption. Information from public drug repositories was used to identify potential repurposing of existing drugs. The Phylobone database provides a platform to study bone regeneration and osteoporosis in light of (biological) evolution, and will substantially contribute to the identification of molecular mechanisms and drug targets.
Background: Skin aging has recently gained significant attention in both society and skin care research. Understanding the biological processes of photoaging caused by long-term skin exposure to ultraviolet radiation is critical for preventing and treating skin aging. Therefore, it is important to identify genes related to skin photoaging and shed light on their functions. Methods: We used data from the Gene Expression Omnibus (GEO) database and conducted bioinformatics analyses to screen and extract microRNAs (miRNAs) and their downstream target genes related to skin photoaging, and to determine possible biological mechanisms of skin photoaging. Results: A total of 34 differentially expressed miRNAs and their downstream target genes potentially related to the biological process of skin photoaging were identified. Gene Ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that these target genes were enriched in pathways related to human papillomavirus infection, extracellular matrix (ECM)-receptor signaling, estrogen receptor, skin development, epidermal development, epidermal cell differentiation, keratinocyte differentiation, structural components of the ECM, structural components of the skin epidermis, and others. Conclusion: Based on the GEO database-derived findings, we determined that target genes of two miRNAs, namely miR-4667-5P-KRT79 and miR-139-5P-FOS, play an important role in skin photoaging. These observations could provide theoretical support and guidance for further research on skin aging-related biological processes.
CHDTEPDB (URL: http://chdtepdb.com/) is a manually integrated database for congenital heart disease (CHD) that stores the expression profiling data of CHD derived from published papers, aiming to provide rich resources for investigating a deeper correlation between human CHD and aberrant transcriptome expression. The development of human diseases involves important regulatory roles of RNAs, and expression profiling data can reflect the underlying etiology of inherited diseases. Hence, collecting and compiling expression profiling data is of critical significance for a comprehensive understanding of the mechanisms and functions that underpin genetic diseases. CHDTEPDB stores the expression profiles of over 200 sets of 7 types of CHD and provides users with more convenient basic analytical functions. Because clinical indicators such as disease type differ among datasets and detection errors are unavoidable, users can customize their selection of corresponding data for personalized analysis. Moreover, we provide a submission page for researchers to submit their own data so that additional expression profiles as well as some other histological data can be added to the database. CHDTEPDB provides a user-friendly interface that allows users to quickly browse, retrieve, download, and analyze their target samples. CHDTEPDB will significantly improve the current knowledge of expression profiling data in CHD and has the potential to be exploited as an important tool for future research on the disease.
Pulsar polarization profiles form a very basic database for understanding the emission processes in a pulsar magnetosphere. After careful polarization calibration of the 19-beam L-band receiver and verification of beam-offset observation results, we obtain polarization profiles of 682 pulsars from observations by the Five-hundred-meter Aperture Spherical radio Telescope (FAST) during the Galactic Plane Pulsar Snapshot survey and other normal FAST projects. Among them, polarization profiles of about 460 pulsars are observed for the first time. The profiles exhibit diverse features. Some pulsars have a polarization position angle curve with a good S-shaped swing, some with orthogonal modes; some have components with highly linearly polarized components or strong circularly polarized components; some have a very wide profile, coming from an aligned rotator, and some have an interpulse from a perpendicular rotator; some wide profiles are caused by interstellar scattering. We derive geometric parameters for 190 pulsars from the S-shaped position angle curves or with orthogonal modes. We find that the linear and circular polarization or the widths of pulse profiles have various frequency dependencies. Pulsars with a large fraction of linear polarization are more likely to have a large Edot.
With the development of the digital city, data and data analysis have become more and more important. The database is the foundation of data analysis. In this paper, the software system of the urban land planning database of Shanghai in China is developed based on MySQL. The conceptual model of the urban land planning database is proposed, and the entities, attributes and connections of this model are discussed. Then the E-R conceptual model is transformed into a logical structure, which is supported by the relational database management system (DBMS). Based on the conceptual and logical structures, by using Spring Boot as the back-end framework and using MySQL and the Java API as the development tools, a platform with data management, information sharing, map assistance and other functions is established. The functional modules in this platform are designed. The results of the JMeter test show that the DBMS can add, store and retrieve information data stably, and it has the advantages of fast response and low error rate. The software system of the urban land planning database developed in this paper can improve the efficiency of storing and managing land data, eliminating redundant data and sharing data.
Objective: We aimed to identify new, more accurate risk factors of liver transplantation for liver cancer using the Surveillance, Epidemiology, and End Results (SEER) database. Methods: Using the SEER database, we identified patients who had undergone surgical resection for non-metastatic hepatocellular carcinoma (HCC) and subsequent liver transplantation between 2010 and 2017. Overall survival (OS) was estimated using Kaplan-Meier plotter. Cox proportional hazards regression modelling was used to identify factors independently associated with recurrent disease [presented as adjusted hazard ratios (HR) with 95% CIs]. Results: In total, 1530 eligible patients were included in the analysis. There were significant differences in ethnicity (P=0.04), cancer stage (P<0.001), vascular invasion (P<0.001) and gall bladder involvement (P<0.001) between the groups that survived, died due to cancer, or died due to other causes. In the Cox regression model, there were no significant differences in OS at 5 years with different operative strategies (autotransplantation versus allotransplantation), nor in survival at 1 year with neoadjuvant radiotherapy. However, neoadjuvant radiotherapy did appear to improve survival at both 3 years (HR: 0.540, 95% CI: 0.326–0.896, P=0.017) and 5 years (HR: 0.338, 95% CI: 0.153–0.747, P=0.007) from diagnosis. Conclusion: This study demonstrated differences in patient characteristics between prognostic groups after liver resection and transplantation for HCC. These criteria can be used to inform patient selection and consent in this setting. Preoperative radiotherapy may improve long-term survival post-transplantation.
The utilization of digital picture search and retrieval has grown substantially in numerous fields for different purposes during the last decade, owing to the continuing advances in image processing and computer vision approaches. In multiple real-life applications, for example, social media, content-based face picture retrieval is a well-invested technique for large-scale databases, where there is a significant necessity for reliable retrieval capabilities enabling quick search in a vast number of pictures. Humans widely employ faces for recognizing and identifying people. Thus, face recognition through formal or personal pictures is increasingly used in various real-life applications, such as helping crime investigators retrieve matching images from face image databases to identify victims and criminals. However, such face image retrieval becomes more challenging in large-scale databases, where traditional vision-based face analysis requires considerably more storage space than the raw face images already occupy in order to store lengthy extracted feature vectors, and takes much longer to process and match thousands of face images. This work mainly contributes to enhancing face image retrieval performance in large-scale databases using hash codes inferred by locality-sensitive hashing (LSH) for facial hard and soft biometrics (Hard BioHash and Soft BioHash, respectively), to be used as a search input for retrieving the top-k matching faces. Moreover, we propose the multi-biometric score-level fusion of both face hard and soft BioHashes (Hard-Soft BioHash Fusion) for further augmented face image retrieval. The experimental outcomes on the Labeled Faces in the Wild (LFW) dataset and the related attributes dataset (LFW-attributes) demonstrate that the suggested fusion approach (Hard-Soft BioHash Fusion) significantly improves retrieval performance compared to using Hard BioHash or Soft BioHash in isolation: the suggested method provides an accuracy of 87% when executed on 1000 samples and 77% on 5743 samples. These results remarkably outperform the results of the Hard BioHash method (by 50% on the 1000 samples and 30% on the 5743 samples) and of the Soft BioHash method (by 78% on the 1000 samples and 63% on the 5743 samples).
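The abstract above states that hash codes are inferred by LSH and used to retrieve the top-k matches, but it does not say which LSH family is used. As a rough sketch only, the example below assumes the common random-hyperplane (sign of random projection) variant and ranks candidates by Hamming distance; the function names and the toy vectors are hypothetical and do not reproduce the paper's BioHash construction or its fusion step.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def top_k(query_code, database_codes, k):
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:k]


# Toy feature vectors (made-up): the second is a scaled copy of the first,
# the third points in the opposite direction.
PLANES = make_hyperplanes(dim=3, n_bits=16)
FEATURES = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
CODES = [hash_code(v, PLANES) for v in FEATURES]
NEAREST = top_k(hash_code([1.0, 2.0, 3.0], PLANES), CODES, k=2)
```

Storing a short binary code per image instead of a long real-valued feature vector is what addresses the storage and matching-time concerns raised in the abstract; the code length (16 bits here) is an arbitrary choice for the example.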
Rosaceae is a large plant family consisting of many economically important fruit crops including peach,apple,pear,strawberry,raspberry,plum,and others.Investigations into their growth and development will promote both...Rosaceae is a large plant family consisting of many economically important fruit crops including peach,apple,pear,strawberry,raspberry,plum,and others.Investigations into their growth and development will promote both basic understanding and progress toward increasing fruit yield and quality.With the ever-increasing high-throughput sequencing data of Rosaceae,comparative studies are hindered by inconsistency of sample collection with regard to tissue,stage,growth conditions,and by vastly different handling of the data.Therefore,databases that enable easy access and effective utilization of directly comparable transcript data are highly desirable.Here,we describe a database for comparative analysis,ROsaceae Fruit Transcriptome database(ROFT),based on RNA-seq data generated from the same laboratory using similarly dissected and staged fruit tissues of four important Rosaceae fruit crops:apple,peach,strawberry,and red raspberry.Hence,the database is unique in allowing easy and robust comparisons among fruit gene expression across the four species.ROFT enables researchers to query orthologous genes and their expression patterns during different fruit developmental stages in the four species,identify tissue-specific and tissue-/stage-specific genes,visualize and compare ortholog expression in different fruit types,explore consensus co-expression networks,and download different data types.The database provides users access to vast amounts of RNA-seq data across the four economically important fruits,enables investigations of fruit type specification and evolution,and facilitates the selection of genes with critical roles in fruit development for further studies.展开更多
Funding: Supported by the Ensemble Grant for Early Career Researchers 2022 and the 2023 Ensemble Continuation Grant of Tohoku University, the Hirose Foundation, the Iwatani Naoji Foundation, and the AIMR Fusion Research Grant; JSPS KAKENHI Nos. JP23K13599, JP23K13703, JP22H01803, and JP18H05513; the Center for Computational Materials Science, Institute for Materials Research, Tohoku University, for the use of MASAMUNE-IMR (Nos. 202212-SCKXX0204 and 202208-SCKXX-0212); the Institute for Solid State Physics (ISSP) at the University of Tokyo for the use of their supercomputers; and the China Scholarship Council (CSC) fund to pursue studies in Japan.
Abstract: All-solid-state batteries (ASSBs) are safer and offer higher energy density than conventional devices, and solid-state electrolytes (SSEs) are their essential components. To date, the search for highly ion-conducting solid-state electrolytes has attracted broad interest. However, obtaining SSEs with high ionic conductivity is challenging because of their complex structural information and the less-explored structure-performance relationship. To address these challenges, developing a database of typical SSEs from available experimental reports would be a new avenue for understanding structure-performance relationships and deriving new design guidelines for SSEs. Herein, a dynamic experimental database containing >600 materials was developed over a wide range of temperatures (132.40–1261.60 K), including mono- and divalent cations (e.g., Li^(+), Na^(+), K^(+), Ag^(+), Ca^(2+), Mg^(2+), and Zn^(2+)) and various types of anions (e.g., halide, hydride, sulfide, and oxide). Data mining was conducted to explore the relationships among different variables (e.g., transport ion, composition, activation energy, and conductivity). Overall, we expect that this database can provide essential guidelines for the design and development of high-performance SSEs in ASSB applications. The database is dynamically updated and can be accessed via our open-source online system.
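The abstract relates activation energy, conductivity, and temperature without stating the functional form used in the database. A minimal sketch, assuming the commonly used Arrhenius-type relation sigma * T = A * exp(-Ea / (kB * T)); the function names and the prefactor value are hypothetical and are not taken from the database itself:

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_conductivity(prefactor, ea_ev, temp_k):
    """Assumed model: sigma = (A / T) * exp(-Ea / (kB * T))."""
    return prefactor / temp_k * math.exp(-ea_ev / (K_B_EV * temp_k))


def activation_energy(sigma1, t1, sigma2, t2):
    """Recover Ea (eV) from two (sigma, T) points.

    Under the assumed model, ln(sigma * T) is linear in 1/T with
    slope -Ea / kB, so two points are enough to estimate Ea.
    """
    slope = (math.log(sigma2 * t2) - math.log(sigma1 * t1)) / (1 / t2 - 1 / t1)
    return -slope * K_B_EV


# Illustrative numbers only: generate two points with Ea = 0.30 eV,
# then recover the activation energy from them.
s300 = arrhenius_conductivity(1.0e5, 0.30, 300.0)
s400 = arrhenius_conductivity(1.0e5, 0.30, 400.0)
recovered = activation_energy(s300, 300.0, s400, 400.0)
```

With real entries, a least-squares fit over all measured temperatures would be more robust than the two-point estimate shown here.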
Funding: Financial support from the Science Research Program Project for Drug Regulation, Jiangsu Drug Administration, China (Grant No.: 202207); the National Drug Standards Revision Project, China (Grant No.: 2023Y41); the National Natural Science Foundation of China (Grant No.: 22276080); and the Foreign Expert Project, China (Grant No.: G2022014096L).
Abstract: Analyzing polysorbate 20 (PS20) composition and the impact of each component on stability and safety is crucial because of formulation variations and individual tolerance. The similar structures and polarities of PS20 components make accurate separation, identification, and quantification challenging. In this work, a high-resolution quantitative method was developed using single-dimensional high-performance liquid chromatography (HPLC) with charged aerosol detection (CAD) to separate 18 key components with multiple esters. The separated components were characterized by ultra-high-performance liquid chromatography-quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) using the same gradient as the HPLC-CAD analysis. The polysorbate compound database and library were expanded more than 7-fold compared to the commercial database. The method was used to investigate differences among PS20 samples of various origins and grades for different dosage forms in order to evaluate the composition-process relationship. UHPLC-Q-TOF-MS identified 1329 to 1511 compounds in 4 batches of PS20 from different sources. The method was also used to observe the impact of 4 degradation conditions on peak components, identifying stable components and their tendencies to change. HPLC-CAD and UHPLC-Q-TOF-MS results provided insights into fingerprint differences, distinguishing quasi products.
Funding: Supported by the National Natural Science Foundation of China (No. 62302242) and the China Postdoctoral Science Foundation (No. 2023M731802).
Abstract: The EU’s Artificial Intelligence Act (AI Act) imposes requirements for the privacy compliance of AI systems. AI systems must comply with privacy laws such as the GDPR when providing services. These laws give users the right to issue a Data Subject Access Request (DSAR). Responding to such requests requires database administrators to accurately identify information related to an individual. However, manual compliance poses significant challenges and is error-prone, because database administrators must write queries by hand, which is time-consuming. The demand of AI systems for large amounts of data has driven the development of NoSQL databases. Because NoSQL databases have flexible schemas, identifying personal information becomes even more challenging. This paper develops an automated tool that identifies personal information and can help organizations respond to DSARs. Our tool combines several technologies, including schema extraction from NoSQL databases and relationship identification from query logs. We describe the algorithm used by our tool, detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs to help developers accurately identify personal data. We evaluate our tool on three datasets covering different database designs, achieving F1 scores of 0.77 to 1. Experimental results demonstrate that our tool successfully identifies information relevant to the data subject. Our tool reduces manual effort and simplifies GDPR compliance, showing practical value in enhancing the privacy performance of NoSQL databases and AI systems.
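To make the idea of schema extraction and implicit-relationship discovery concrete, here is a minimal sketch over in-memory documents. It is not the paper's algorithm: the abstract says relationships are identified from query logs, whereas this illustration swaps in a simpler value-overlap heuristic against `_id` fields. The collection names, field names, and data are all hypothetical.

```python
from collections import defaultdict


def infer_schema(documents):
    """Map each (dotted) field path to the sorted type names seen for it."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema[path].add(type(value).__name__)

    for doc in documents:
        walk(doc, "")
    return {field: sorted(types) for field, types in schema.items()}


def candidate_references(collections):
    """Guess (source, field, target) links where a top-level scalar
    field value equals an _id in another collection."""
    ids = {name: {d["_id"] for d in docs if "_id" in d}
           for name, docs in collections.items()}
    links = set()
    for name, docs in collections.items():
        for doc in docs:
            for field, value in doc.items():
                if field == "_id" or not isinstance(value, (str, int)):
                    continue
                for target, target_ids in ids.items():
                    if target != name and value in target_ids:
                        links.add((name, field, target))
    return sorted(links)


# Hypothetical data: orders implicitly reference users through user_id.
demo = {
    "users": [{"_id": "u1", "name": "Ada", "contact": {"email": "ada@example.org"}},
              {"_id": "u2", "name": "Bob"}],
    "orders": [{"_id": "o1", "user_id": "u1", "total": 12.5}],
}
user_schema = infer_schema(demo["users"])
links = candidate_references(demo)
```

Starting from a data subject's `_id` in `users`, the discovered link tells an administrator that `orders.user_id` should also be searched when answering a DSAR.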
Funding: Supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant No. KFU242068).
Abstract: Database systems have consistently been prime targets for cyber-attacks and threats because of the critical nature of the data they store. Despite the increasing reliance on database management systems, this field continues to face numerous cyber-attacks. Database management systems serve as the foundation of any information system or application, so any cyber-attack can result in significant damage to the database system and loss of sensitive data. Consequently, cyber risk classifications and assessments play a crucial role in risk management and establish an essential framework for identifying and responding to cyber threats. Risk assessment aids in understanding the impact of cyber threats and developing appropriate security controls to mitigate risks. The primary objective of this study is to conduct a comprehensive analysis of cyber risks in database management systems, including classifying threats, vulnerabilities, impacts, and countermeasures. This classification helps to identify suitable security controls to mitigate cyber risks for each type of threat. Additionally, this research aims to explore technical countermeasures to protect database systems from cyber threats. This study employs the content analysis method to collect, analyze, and classify data in terms of types of threats, vulnerabilities, and countermeasures. The results indicate that SQL injection attacks and Denial of Service (DoS) attacks were the most prevalent technical threats in database systems, each accounting for 9% of incidents. Vulnerable audit trails, intrusion attempts, and ransomware attacks were classified as the second level of technical threats in database systems, comprising 7% and 5% of incidents, respectively. Furthermore, the findings reveal that insider threats were the most common non-technical threats in database systems, accounting for 5% of incidents. Moreover, the results indicate that weak authentication, unpatched databases, weak audit trails, and multiple usage of an account were the most common technical vulnerabilities in database systems, each accounting for 9% of vulnerabilities. Additionally, software bugs, insecure coding practices, weak security controls, insecure networks, password misuse, weak encryption practices, and weak data masking were classified as the second level of security vulnerabilities in database systems, each accounting for 4% of vulnerabilities. The findings from this work can assist organizations in understanding the types of cyber threats and developing robust strategies against cyber-attacks.
Funding: Financial support received from the Natural Science Foundation of China (32202202 and 31871735).
Abstract: Advanced glycation end-products (AGEs) are a group of heterogeneous compounds formed in heat-processed foods and are proven to be detrimental to human health. Currently, there is no comprehensive database for AGEs in foods that covers the entire range of food categories, which limits the accurate risk assessment of dietary AGEs in human diseases. In this study, we first established an isotope dilution UHPLC-QqQ-MS/MS-based method for simultaneous quantification of 10 major AGEs in foods. The contents of these AGEs were determined in 334 foods covering all main groups consumed in Western and Chinese populations. Nε-Carboxymethyllysine, methylglyoxal-derived hydroimidazolone isomers, and glyoxal-derived hydroimidazolone-1 are the predominant AGEs found in most foodstuffs. Total amounts of AGEs were high in processed nuts, bakery products, and certain types of cereals and meats (>150 mg/kg), and low in dairy products, vegetables, fruits, and beverages (<40 mg/kg). Assessment of estimated daily intake implied that the contribution of food groups to daily AGE intake varied considerably under different eating patterns, and that selecting high-AGE foods leads to up to a 2.7-fold higher intake of AGEs through daily meals. The presented AGE database allows accurate assessment of dietary exposure to these glycotoxins to explore their physiological impacts on human health.
Abstract: This study examines the database search behaviors of individuals, focusing on gender differences and the impact of planning habits on information retrieval. Data were collected from a survey of 198 respondents, categorized by their discipline, schooling background, internet usage, and information retrieval preferences. Key findings indicate that females are more likely to plan their searches in advance and prefer structured methods of information retrieval, such as using library portals and leading university websites. Males, however, tend to use web search engines and self-archiving methods more frequently. This analysis provides valuable insights for educational institutions and libraries to optimize their resources and services based on user behavior patterns.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61888102, 52272172, and 52102193); the Major Program of the National Natural Science Foundation of China (Grant No. 92163206); the National Key Research and Development Program of China (Grant Nos. 2021YFA1201501 and 2022YFA1204100); the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB30000000); and the Fundamental Research Funds for the Central Universities.
Abstract: Discovery of materials using “bottom-up” or “top-down” approaches is of great interest in materials science. Layered materials consisting of two-dimensional (2D) building blocks provide a good platform for exploring new materials in this respect. In van der Waals (vdW) layered materials, these building blocks are charge neutral and can be isolated from their bulk phase (top-down), but usually grow on a substrate. In ionic layered materials, they are charged and usually cannot exist independently, but can serve as motifs to construct new materials (bottom-up). In this paper, we introduce our recently constructed databases for 2D material-substrate interfaces (2DMSI) and 2D charged building blocks. For the 2DMSI database, we systematically build a workflow to predict appropriate substrates for 2D materials and their geometries on those substrates. For the 2D charged building block database, 1208 entries are identified from a bulk material database. Information on crystal structure, valence state, source, dimension, and so on is provided for each entry in JSON format. We also show its application in designing and searching for new functional layered materials. The 2DMSI database, building block database, and designed layered materials are available in Science Data Bank at https://doi.org/10.57760/sciencedb.j00113.00188.
Funding: Supported by the Student Scheme provided by Universiti Kebangsaan Malaysia with the Code TAP-20558.
Abstract: A data lake (DL) denotes a vast reservoir or repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are non-tabular and store data differently from relational tables. NoSQL databases come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability features found in relational databases. Consequently, employing machine learning approaches becomes necessary to categorize complex structured query language (SQL) queries. Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
Funding: Tata Steel Netherlands, Posco, Hyundai Steel, Nucor Steel, RioTinto, Nippon Steel Corp., JFE Steel, Voestalpine, RHi-Magnesita, Doosan Enerbility, Seah Besteel, Umicore, Vesuvius, and Schott AG are gratefully acknowledged.
Abstract: CALPHAD thermodynamic databases are very useful for analyzing the complex chemical reactions that occur in high-temperature materials processing. The FactSage thermodynamic database can be used to calculate complex phase diagrams and equilibrium phases involving refractories in industrial processes. In this study, the FactSage thermodynamic database relevant to ZrO_(2)-based refractories is reviewed, and its application to understanding the corrosion of continuous casting nozzle refractories in steelmaking is presented.
Abstract: BACKGROUND Elective cholecystectomy (CCY) is recommended for patients with gallstone-related acute cholangitis (AC) following endoscopic decompression to prevent recurrent biliary events. However, the optimal timing and implications of CCY remain unclear. AIM To examine the impact of same-admission CCY compared to interval CCY on patients with gallstone-related AC using the National Readmission Database (NRD). METHODS We queried the NRD to identify all gallstone-related AC hospitalizations in adult patients with and without same-admission CCY between 2016 and 2020. Our primary outcome was the all-cause 30-day readmission rate, and secondary outcomes included in-hospital mortality, length of stay (LOS), and hospitalization cost. RESULTS Among the 124964 gallstone-related AC hospitalizations, only 14.67% included same-admission CCY. The all-cause 30-day readmission rate in the same-admission CCY group was almost half that of the non-CCY group (5.56% vs 11.50%). Patients in the same-admission CCY group had a longer mean LOS and higher hospitalization costs attributable to surgery. Although the most common reason for readmission was sepsis in both groups, the second most common reason was AC in the interval CCY group. CONCLUSION Our study suggests that patients with gallstone-related AC who do not undergo same-admission CCY have twice the risk of readmission compared to those who undergo CCY during the same admission. These readmissions can potentially be prevented by performing same-admission CCY in appropriate patients, which may reduce subsequent hospitalization costs secondary to readmissions.
Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. These models have great potential to enhance database query systems, enabling more intuitive and semantic query mechanisms. Our model leverages the LLM’s deep learning architecture to interpret and process natural language queries and translate them into accurate database queries. The system integrates an LLM-powered semantic parser that translates user input into structured queries that can be understood by the database management system. First, the user query is pre-processed: the text is normalized and ambiguity is removed. Next comes semantic parsing, where the LLM interprets the pre-processed text and identifies key entities and relationships. Query generation then converts the parsed information into a structured query format tailored to the target database schema. Finally, in query execution and feedback, the resulting query is executed on the database and the results are returned to the user. The system also provides feedback mechanisms to improve and optimize future query interpretations. With advanced LLMs used for model implementation and fine-tuned on diverse datasets, the experimental results show that the proposed method significantly improves the accuracy and usability of database queries, making data retrieval easy for users without specialized knowledge.
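The four stages named in the abstract (pre-processing, semantic parsing, query generation, execution) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' system: the `parse` step below is a rule-based stand-in where the real system would call an LLM, SQLite is used as the target database, and the `customers` table, the `ALLOWED` whitelist, and all function names are hypothetical.

```python
import re
import sqlite3

# Hypothetical target schema: table name -> columns that may be queried.
ALLOWED = {"customers": {"name", "city"}}


def preprocess(text):
    """Normalize case and whitespace, and drop trailing punctuation."""
    return re.sub(r"\s+", " ", text.strip().lower()).rstrip("?.!")


def parse(text):
    """Stand-in for the LLM semantic parser: extract entity and filter."""
    match = re.search(r"(\w+) in (\w+)$", text)
    if not match:
        raise ValueError("question not understood")
    return {"table": match.group(1), "column": "city", "value": match.group(2)}


def generate_sql(parsed):
    """Build a parameterized query, validating names against the schema."""
    table, column = parsed["table"], parsed["column"]
    if table not in ALLOWED or column not in ALLOWED[table]:
        raise ValueError("unknown table or column")
    sql = f"SELECT name FROM {table} WHERE lower({column}) = ? ORDER BY name"
    return sql, (parsed["value"],)


def answer(conn, question):
    """Run the full pipeline and return the matching names."""
    sql, params = generate_sql(parse(preprocess(question)))
    return [row[0] for row in conn.execute(sql, params)]


def make_demo_db():
    """Create an in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [("Ada", "Paris"), ("Bob", "Rome"), ("Cy", "Paris")])
    return conn


result = answer(make_demo_db(), "  Which customers in   Paris? ")
```

Validating identifiers against the schema and passing values as parameters is one way to keep generated queries safe even when the parser output is untrusted.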
Funding: Supported by a Theme-based Research Scheme grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (T21-705/20-N).
Abstract: Antibiotic resistance, which is encoded by antibiotic-resistance genes (ARGs), has proliferated to become a growing threat to public health around the world. With technical advances, especially the popularization of metagenomic sequencing, scientists have gained the ability to decipher the profiles of ARGs in diverse samples with high accuracy and at an accelerated speed. To analyze thousands of ARGs in a high-throughput way, standardized and integrated pipelines are needed. The new version (v3.0) of the widely used ARGs online analysis pipeline (ARGs-OAP) has made significant improvements to both the reference database (the structured ARG (SARG) database) and the integrated analysis pipeline. SARG has been enhanced with sequence curation to improve annotation reliability, incorporate emerging resistance genotypes, and determine rigorous mechanism classification. The database has been further organized and visualized online in a tree-like structure with a dictionary. It has also been divided into sub-databases for different application scenarios. In addition, the ARGs-OAP has been improved with adjusted quantification methods, simplified tool implementation, and multiple functions with user-defined reference databases. Moreover, the online platform now provides a diverse biostatistical analysis workflow with visualization packages for the efficient interpretation of ARG profiles. The ARGs-OAP v3.0, with its improved database and analysis pipeline, will benefit academia, governmental management, and consultation regarding risk assessment of the environmental prevalence of ARGs.
Funding: Supported by continuation funds from the Turku Collegium for Science, Medicine and Technology; the Japan Society for the Promotion of Science (#23K08670); and the Sigrid Jusélius Foundation (#230131). MF-R's internship at the University of Turku was funded by the Erasmus+ program.
Abstract: The bone extracellular matrix (ECM) contains minerals deposited on highly crosslinked collagen fibrils and hundreds of noncollagenous proteins. Some of these proteins are key to the regulation of bone formation and regeneration via signaling pathways, and play important regulatory and structural roles. However, the complete list of bone ECM proteins, their roles, and the extent of individual and cross-species variations have not been fully captured in either humans or model organisms. Here, we introduce the most comprehensive resource of bone ECM proteins that can be used in research fields such as bone regeneration, osteoporosis, and mechanobiology. The Phylobone database (available at https://phylobone.com) includes 255 proteins potentially expressed in the bone ECM of humans and 30 species of vertebrates. A bioinformatics pipeline was used to identify the evolutionary relationships of bone ECM proteins. The analysis facilitated the identification of potential model organisms for studying the molecular mechanisms of bone regeneration. A network analysis showed high connectivity of bone ECM proteins. A total of 214 functional protein domains were identified, including collagen and the domains involved in bone formation and resorption. Information from public drug repositories was used to identify potential repurposing of existing drugs. The Phylobone database provides a platform to study bone regeneration and osteoporosis in light of (biological) evolution, and will substantially contribute to the identification of molecular mechanisms and drug targets.
Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China (grant no. LQ22H150005).
Abstract: Background: Skin aging has recently gained significant attention in both society and skin care research. Understanding the biological processes of photoaging caused by long-term skin exposure to ultraviolet radiation is critical for preventing and treating skin aging. Therefore, it is important to identify genes related to skin photoaging and shed light on their functions. Methods: We used data from the Gene Expression Omnibus (GEO) database and conducted bioinformatics analyses to screen and extract microRNAs (miRNAs) and their downstream target genes related to skin photoaging, and to determine possible biological mechanisms of skin photoaging. Results: A total of 34 differentially expressed miRNAs and their downstream target genes potentially related to the biological process of skin photoaging were identified. Gene Ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that these target genes were enriched in pathways related to human papillomavirus infection, extracellular matrix (ECM)-receptor signaling, estrogen receptor, skin development, epidermal development, epidermal cell differentiation, keratinocyte differentiation, structural components of the ECM, structural components of the skin epidermis, and others. Conclusion: Based on the GEO database-derived findings, we determined that the target genes of two miRNAs, namely miR-4667-5P-KRT79 and miR-139-5P-FOS, play an important role in skin photoaging. These observations could provide theoretical support and guidance for further research on skin aging-related biological processes.
Abstract: CHDTEPDB (URL: http://chdtepdb.com/) is a manually integrated database for congenital heart disease (CHD) that stores CHD expression profiling data derived from published papers, aiming to provide rich resources for investigating the correlation between human CHD and aberrant transcriptome expression more deeply. The development of human diseases involves important regulatory roles of RNAs, and expression profiling data can reflect the underlying etiology of inherited diseases. Hence, collecting and compiling expression profiling data is of critical significance for a comprehensive understanding of the mechanisms and functions that underpin genetic diseases. CHDTEPDB stores the expression profiles of over 200 sets covering 7 types of CHD and provides users with convenient basic analytical functions. Because the datasets differ in clinical indicators such as disease type and contain unavoidable detection errors, users can customize their selection of data for personalized analysis. Moreover, we provide a submission page for researchers to submit their own data, so that additional expression profiles as well as some other histological data can be added to the database. CHDTEPDB offers a user-friendly interface that allows users to quickly browse, retrieve, download, and analyze their target samples. CHDTEPDB will significantly improve the current knowledge of expression profiling data in CHD and has the potential to be exploited as an important tool for future research on the disease.
Funding: Supported by the National Natural Science Foundation of China (NSFC, grant Nos. 11988101 and 11833009); the National Natural Science Foundation of China (NSFC, grant No. U2031115); the National Key R&D Program of China (Nos. 2021YFA1600401 and 2021YFA1600400); the National Natural Science Foundation of China (NSFC, grant Nos. 11873058 and 12133004); and the National SKA program of China (No. 2020SKA0120200).
Abstract: Pulsar polarization profiles form a very basic database for understanding the emission processes in a pulsar magnetosphere. After careful polarization calibration of the 19-beam L-band receiver and verification of beam-offset observation results, we obtain polarization profiles of 682 pulsars from observations by the Five-hundred-meter Aperture Spherical radio Telescope (FAST) during the Galactic Plane Pulsar Snapshot survey and other normal FAST projects. Among them, polarization profiles of about 460 pulsars are observed for the first time. The profiles exhibit diverse features. Some pulsars have a polarization position angle curve with a good S-shaped swing, and some show orthogonal modes; some have highly linearly polarized components or strongly circularly polarized components; some have a very wide profile, coming from an aligned rotator, and some have an interpulse from a perpendicular rotator; some wide profiles are caused by interstellar scattering. We derive geometric parameters for 190 pulsars from the S-shaped position angle curves or from curves with orthogonal modes. We find that the linear and circular polarization and the widths of pulse profiles have various frequency dependencies. Pulsars with a large fraction of linear polarization are more likely to have a large Edot.
Funding: Funded by the Start-Up Funds for Scientific Research of Shenzhen University, Grant No. 000002112313.
Abstract: With the development of the digital city, data and data analysis have become more and more important, and the database is the foundation of data analysis. In this paper, the software system of the urban land planning database of Shanghai, China, is developed based on MySQL. The conceptual model of the urban land planning database is proposed, and the entities, attributes, and connections of this model are discussed. The E-R conceptual model is then transformed into a logical structure, which is supported by the relational database management system (DBMS). Based on the conceptual and logical structures, and using Spring Boot as the back-end framework with MySQL and the Java API as development tools, a platform with data management, information sharing, map assistance, and other functions is established. The functional modules of this platform are designed. The results of the JMeter test show that the DBMS can add, store, and retrieve data stably, with fast response and a low error rate. The software system of the urban land planning database developed in this paper can improve the efficiency of storing and managing land data, eliminating redundant data, and sharing data.
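The step from an E-R model to a relational logical structure can be illustrated with two entities and a one-to-many relationship. This is a minimal sketch, not the paper's schema: the `district` and `parcel` entities, their attributes, and the sample rows are hypothetical, and SQLite (from the Python standard library) is used in place of MySQL so the example runs without a server.

```python
import sqlite3

# Each entity becomes a table; the one-to-many relationship
# "district contains parcels" becomes a foreign key on parcel.
SCHEMA = """
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE parcel (
    parcel_id   INTEGER PRIMARY KEY,
    district_id INTEGER NOT NULL REFERENCES district(district_id),
    land_use    TEXT NOT NULL,
    area_m2     REAL NOT NULL CHECK (area_m2 > 0)
);
"""


def build_db():
    """Create an empty in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def area_by_land_use(conn, district_name):
    """Total parcel area per land-use class within one district."""
    rows = conn.execute(
        """SELECT p.land_use, SUM(p.area_m2)
           FROM parcel AS p
           JOIN district AS d ON d.district_id = p.district_id
           WHERE d.name = ?
           GROUP BY p.land_use
           ORDER BY p.land_use""",
        (district_name,),
    )
    return dict(rows.fetchall())


# Made-up rows for illustration only.
conn = build_db()
conn.execute("INSERT INTO district VALUES (1, 'Pudong')")
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?, ?)",
    [(1, 1, "residential", 1000.0), (2, 1, "residential", 2000.0), (3, 1, "park", 500.0)],
)
summary = area_by_land_use(conn, "Pudong")
```

Storing the district name once and referencing it by key is what removes the redundant data the abstract mentions; the foreign key also rejects parcels that point at a nonexistent district.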
Funding: Supported by funds from the National Natural Science Foundation of China (No. 82000602), the Chen Xiao-Ping Foundation for the Development of Science and Technology of Hubei Province (No. CXPJJH11900001-2019330), and the Innovation Team Project of the Health Commission of Hubei Province (No. WJ2021C001).
Abstract: Objective We aimed to identify new, more accurate risk factors of liver transplantation for liver cancer using the Surveillance, Epidemiology, and End Results (SEER) database. Methods Using the SEER database, we identified patients who had undergone surgical resection for non-metastatic hepatocellular carcinoma (HCC) and subsequent liver transplantation between 2010 and 2017. Overall survival (OS) was estimated using Kaplan-Meier plotter. Cox proportional hazards regression modelling was used to identify factors independently associated with recurrent disease [presented as adjusted hazard ratios (HR) with 95% CIs]. Results In total, 1530 eligible patients were included in the analysis. There were significant differences in ethnicity (P=0.04), cancer stage (P<0.001), vascular invasion (P<0.001), and gall bladder involvement (P<0.001) between the groups that survived, died due to cancer, or died due to other causes. In the Cox regression model, there were no significant differences in OS at 5 years between operative strategies (autotransplantation versus allotransplantation), nor in survival at 1 year with neoadjuvant radiotherapy. However, neoadjuvant radiotherapy did appear to improve survival at both 3 years (HR: 0.540, 95% CI: 0.326–0.896, P=0.017) and 5 years (HR: 0.338, 95% CI: 0.153–0.747, P=0.007) from diagnosis. Conclusion This study demonstrated differences in patient characteristics between prognostic groups after liver resection and transplantation for HCC. These criteria can be used to inform patient selection and consent in this setting. Preoperative radiotherapy may improve long-term survival post-transplantation.
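For readers unfamiliar with how overall survival is estimated, the Kaplan-Meier product-limit estimator can be written in a few lines. This is a generic sketch of the standard estimator with made-up follow-up data, not a reproduction of the study's analysis or its software; the function name and inputs are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient.
    events:    1 if death was observed at that time, 0 if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Made-up follow-up times in years; 0 marks a censored observation.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their last follow-up but never cause a drop in the curve, which is why the estimate steps down only at times 1, 2, and 3 here.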
Funding: Supported and funded by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia, grant number 077416-04.
Abstract: The use of digital picture search and retrieval has grown substantially in numerous fields over the last decade, owing to continuing advances in image processing and computer vision. In many real-life applications, such as social media, content-based face picture retrieval is a well-established technique for large-scale databases, where reliable retrieval is needed to search quickly through a vast number of pictures. Humans widely rely on faces to recognize and identify people. Face recognition from formal or personal pictures is therefore increasingly used in real-life applications, such as helping crime investigators retrieve matching images from face image databases to identify victims and criminals. However, face image retrieval becomes more challenging in large-scale databases: traditional vision-based face analysis needs considerable storage, beyond what the raw face images already occupy, to hold the lengthy extracted feature vectors, and it takes much longer to process and match thousands of face images. This work mainly contributes to enhancing face image retrieval performance in large-scale databases by using hash codes inferred by locality-sensitive hashing (LSH) for facial hard and soft biometrics (Hard BioHash and Soft BioHash, respectively) as the search input for retrieving the top-k matching faces. Moreover, we propose a multi-biometric score-level fusion of the hard and soft face BioHashes (Hard-Soft BioHash Fusion) to further improve face image retrieval. Experiments on the Labeled Faces in the Wild (LFW) dataset and the related attributes dataset (LFW-attributes) demonstrate that the suggested fusion approach (Hard-Soft BioHash Fusion) significantly improves retrieval performance compared with using Hard BioHash or Soft BioHash in isolation, providing an augmented accuracy of 87% on 1000 samples and 77% on 5743 samples. These results remarkably outperform those of the Hard BioHash method (by 50% on the 1000 samples and 30% on the 5743 samples) and the Soft BioHash method (by 78% on the 1000 samples and 63% on the 5743 samples).
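The retrieval idea in this abstract (hash the hard and soft biometric features with LSH, then fuse the two match scores to rank the top-k faces) can be illustrated with a minimal sketch. The abstract does not state which LSH family or fusion rule the authors use, so the sketch assumes random-hyperplane (sign) LSH, Hamming similarity between codes, and a weighted-sum fusion; the function names, the `hard_weight` parameter, and the gallery layout are all hypothetical and are not taken from the paper.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space.

    Assumption: random-hyperplane LSH; the paper's LSH family is not stated.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(vector, hyperplanes):
    """Hash a feature vector to a bit tuple: one sign bit per hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming_similarity(code_a, code_b):
    """Fraction of matching bits between two equal-length codes, in [0, 1]."""
    matches = sum(a == b for a, b in zip(code_a, code_b))
    return matches / len(code_a)


def fused_top_k(query, gallery, k=5, hard_weight=0.5):
    """Rank gallery faces by a weighted sum of hard and soft code similarity.

    query is a (hard_code, soft_code) pair; gallery maps a face id to such a
    pair. The weighted-sum rule is an illustrative choice of score-level fusion.
    """
    q_hard, q_soft = query
    scores = {}
    for face_id, (g_hard, g_soft) in gallery.items():
        s_hard = hamming_similarity(q_hard, g_hard)
        s_soft = hamming_similarity(q_soft, g_soft)
        scores[face_id] = hard_weight * s_hard + (1.0 - hard_weight) * s_soft
    # Highest fused score first; keep only the k best matches.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In this sketch the stored codes are short bit tuples rather than long real-valued feature vectors, which is the storage and matching-time saving the abstract refers to. The relative weight of the hard and soft scores would need to be tuned on real data.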
Funding: We would like to thank Mr. Andrew Tong for help in implementing the hyperlinks of the database and members of the Liu lab for helpful comments on the website. This work has been supported by a grant from the National Science Foundation (NSF) (IOS-1444987) to ZLSMM. ML was supported in part by a National Science Foundation award (DGE-1632976).
Abstract: Rosaceae is a large plant family that includes many economically important fruit crops, such as peach, apple, pear, strawberry, raspberry, and plum. Investigating their growth and development will advance basic understanding and support progress toward higher fruit yield and quality. Although high-throughput sequencing data for Rosaceae keep accumulating, comparative studies are hindered by inconsistent sample collection with regard to tissue, stage, and growth conditions, and by vastly different handling of the data. Databases that enable easy access to, and effective use of, directly comparable transcript data are therefore highly desirable. Here, we describe a database for comparative analysis, the ROsaceae Fruit Transcriptome database (ROFT), based on RNA-seq data generated in the same laboratory from similarly dissected and staged fruit tissues of four important Rosaceae fruit crops: apple, peach, strawberry, and red raspberry. The database is therefore unique in allowing easy and robust comparisons of fruit gene expression across the four species. ROFT enables researchers to query orthologous genes and their expression patterns during different fruit developmental stages in the four species, identify tissue-specific and tissue-/stage-specific genes, visualize and compare ortholog expression in different fruit types, explore consensus co-expression networks, and download different data types. The database gives users access to vast amounts of RNA-seq data across the four economically important fruits, enables investigations of fruit type specification and evolution, and facilitates the selection of genes with critical roles in fruit development for further study.