At present, the database cache model of the power information system has problems such as slow running speed and a low database hit rate. To this end, this paper proposes a database cache model for power information systems based on deep machine learning. The caching model includes program caching, Structured Query Language (SQL) preprocessing, and core caching modules. Among them, statement efficiency is improved by adjusting operations such as multi-table joins and keyword replacement in the SQL optimizer. The core caching module builds predictive models using boosted regression trees: a machine learning algorithm generates a series of regression tree models, the resource occupancy rate of the power information system is analyzed to dynamically adjust the voting selection of the regression trees, and the voting threshold of the prediction model is likewise dynamically adjusted, after which the cache model is re-initialized. The experimental results show that the model has a good cache hit rate and cache efficiency and can improve the data caching performance of the power information system. It has a high hit rate and a short delay time, maintains a good hit rate under different computer memory sizes, and occupies little space and CPU during actual operation, which helps the power information system operate efficiently and quickly.
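Where the abstract sketches tree voting against a dynamically adjusted threshold, a toy illustration may help. This is a minimal sketch, not the paper's implementation: a random forest stands in for the boosted regression trees so that per-tree voting is explicit, and the feature names, synthetic labels, and occupancy-to-threshold rule are all invented assumptions.

```python
# Sketch: an ensemble of regression trees "votes" on whether a query result
# is hot enough to cache; higher resource occupancy demands more agreement.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 3))                        # assumed features: frequency, recency, size
y = (2 * X[:, 0] + X[:, 1] > 1.2).astype(float)  # synthetic "hot item" label

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def should_cache(features, resource_occupancy):
    # Fraction of trees predicting "hot" = the vote; the threshold is dynamic.
    votes = np.mean([t.predict([features])[0] > 0.5 for t in forest.estimators_])
    threshold = 0.5 + 0.3 * resource_occupancy   # illustrative adjustment rule
    return votes >= threshold

print(should_cache(X[0], resource_occupancy=0.2))
```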
All-solid-state batteries (ASSBs) are a class of safer and higher-energy-density devices compared to conventional batteries, in which solid-state electrolytes (SSEs) are essential components. To date, investigations to search for high ion-conducting solid-state electrolytes have attracted broad attention. However, obtaining SSEs with high ionic conductivity is challenging due to the complex structural information and the less-explored structure-performance relationship. To address these challenges, developing a database containing typical SSEs from available experimental reports would be a new avenue to understand structure-performance relationships and find new design guidelines for reasonable SSEs. Herein, a dynamic experimental database containing >600 materials was developed over a wide range of temperatures (132.40–1261.60 K), including mono- and divalent cations (e.g., Li⁺, Na⁺, K⁺, Ag⁺, Ca²⁺, Mg²⁺, and Zn²⁺) and various types of anions (e.g., halide, hydride, sulfide, and oxide). Data mining was conducted to explore the relationships among different variates (e.g., transport ion, composition, activation energy, and conductivity). Overall, we expect that this database can provide essential guidelines for the design and development of high-performance SSEs in ASSB applications. The database is dynamically updated and can be accessed via our open-source online system.
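The data-mining step lends itself to a brief illustration. The sketch below, with assumed column names and four invented rows standing in for the >600-entry database, relates activation energy to log conductivity per transport ion.

```python
# Sketch: explore relationships among transport ion, activation energy,
# and conductivity, as the database's data-mining step does at scale.
import numpy as np
import pandas as pd

sse = pd.DataFrame({
    "transport_ion":    ["Li+", "Li+", "Na+", "Mg2+"],
    "temperature_K":    [300.0, 350.0, 300.0, 400.0],
    "activation_eV":    [0.25, 0.30, 0.45, 0.80],
    "conductivity_Scm": [1e-3, 5e-3, 1e-5, 1e-8],
})
sse["log_sigma"] = np.log10(sse["conductivity_Scm"])

print(sse.groupby("transport_ion")[["activation_eV", "log_sigma"]].mean())
print("corr:", sse["activation_eV"].corr(sse["log_sigma"]))
```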
A data lake (DL) denotes a vast reservoir or repository of data. It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured, structured, and unstructured information. These systems use a flat architecture and run different types of data analytics. NoSQL databases are non-tabular and store data in a different manner than relational tables. NoSQL databases come in various forms, including key-value pairs, documents, wide columns, and graphs, each based on its own data model. They offer simpler scalability and generally outperform traditional relational databases. While NoSQL databases can store diverse data types, they lack full support for the atomicity, consistency, isolation, and durability features found in relational databases. Consequently, employing machine learning approaches becomes necessary to categorize complex Structured Query Language (SQL) queries. Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification. Overall, this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases. Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
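As a concrete instance of the machine learning-based classification the overview identifies, the sketch below labels SQL statements with a TF-IDF plus logistic-regression pipeline; the tiny corpus and read/write label scheme are invented purely for illustration.

```python
# Sketch: classify SQL query strings with a standard text-classification pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = [
    "SELECT name FROM users WHERE id = 1",
    "SELECT * FROM orders JOIN users ON orders.uid = users.id",
    "INSERT INTO logs VALUES (1, 'msg')",
    "UPDATE users SET name = 'a' WHERE id = 2",
]
labels = ["read", "read", "write", "write"]

clf = make_pipeline(TfidfVectorizer(token_pattern=r"[A-Za-z_*=]+"), LogisticRegression())
clf.fit(queries, labels)
print(clf.predict(["UPDATE orders SET status = 'x' WHERE id = 9"]))
```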
As an information-rich collective, there are always some people who choose to take risks for some ulterior purpose, while others are committed to finding ways to deal with database security threats. The purpose of database security research is to prevent the database from being illegally used or destroyed. This paper introduces the main literature in the field of database security research in recent years. First of all, we classify these papers; the classification criteria are the influencing factors of database security. In comparing traditional and machine learning (ML) methods, explanations of concepts are interspersed to make these methods easier to understand. Secondly, we find that the related research has achieved some gratifying results, but there are also some shortcomings, such as weak generalization and deviation from reality. Then, possible future work in this research area is proposed. Finally, we summarize the main contributions.
Chemical structure searching based on databases and machine learning has attracted great attention recently for fast screening of materials with target functionalities. To this end, we established a high-performance chemical structure database based on MySQL engines, named MYDB. More than 160,000 metal-organic frameworks (MOFs) have been collected and stored, using new retrieval algorithms for efficient searching and recommendation. The evaluation results show that MYDB can realize fast and efficient keyword searching against millions of records and provide real-time recommendations for similar structures. Combining a machine learning method with the materials database, we developed an adsorption model to determine the adsorption capacity of metal-organic frameworks toward argon and hydrogen under certain conditions. We expect that MYDB, together with the developed machine learning techniques, can support large-scale, low-cost, and highly convenient structural research toward accelerating the discovery of materials with target functionalities in the field of computational materials research.
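To make the two retrieval operations concrete, here is a loose sketch pairing keyword search with a similarity recommendation. SQLite stands in for the MySQL engine, and the table schema and two-component feature vectors are assumptions, not MYDB's actual design.

```python
# Sketch: keyword lookup, then rank stored structures by cosine similarity.
import sqlite3
import numpy as np

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mofs (id INTEGER, name TEXT, features TEXT)")
db.executemany("INSERT INTO mofs VALUES (?, ?, ?)", [
    (1, "MOF-5",   "0.9,0.1"),
    (2, "HKUST-1", "0.8,0.2"),
    (3, "ZIF-8",   "0.1,0.9"),
])

hits = db.execute("SELECT id, name FROM mofs WHERE name LIKE ?", ("%MOF%",)).fetchall()

vecs = {i: np.array([float(x) for x in f.split(",")])
        for i, _, f in db.execute("SELECT * FROM mofs")}
q = vecs[hits[0][0]]
ranked = sorted(vecs, key=lambda i: -np.dot(vecs[i], q) /
                (np.linalg.norm(vecs[i]) * np.linalg.norm(q)))
print(hits, ranked)   # most similar structures first
```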
Typically, magnesium alloys have been designed using a so-called hill-climbing approach, with rather incremental advances over the past century. Iterative and incremental alloy design is slow and expensive, but more importantly it does not harness all the data that exists in the field. In this work, a new approach is proposed that utilises data science and provides a detailed understanding of the data that exists in the field of Mg-alloy design to date. In this approach, a consolidated alloy database that incorporates 916 datapoints was first developed from the literature and experimental work. To analyse the characteristics of the database, alloying and thermomechanical processing effects on mechanical properties were explored via composition-process-property matrices. An unsupervised machine learning (ML) method of clustering was also implemented, using unlabelled data, with the aim of revealing potentially useful information for an alloy representation space of low dimensionality. In addition, the alloy database was correlated to thermodynamically stable secondary phases to further understand the relationships between microstructure and mechanical properties. This work not only introduces an invaluable open-source database, but also provides, for the first time, data insights that enable future accelerated digital Mg-alloy design.
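A minimal sketch of the clustering step follows, with synthetic stand-ins for the 916-datapoint database; the four columns are assumed composition/process features, not the study's actual variables.

```python
# Sketch: cluster unlabelled alloy data to reveal a low-dimensional grouping.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.random((916, 4))   # e.g., wt.% Al, wt.% Zn, extrusion temperature, strain

labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(
    StandardScaler().fit_transform(X))
print(np.bincount(labels))   # cluster sizes
```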
Data protection in databases is critical for any organization, as unauthorized access or manipulation can have severe negative consequences. Intrusion detection systems are essential for keeping databases secure. Advancements in technology will lead to significant changes in the medical field, improving healthcare services through real-time information sharing. However, issues of reliability and consistency remain to be solved. Safeguards against cyber-attacks are necessary due to the risk of unauthorized access to sensitive information and potential data corruption. Disruptions to data items can propagate throughout the database, making it crucial to reverse fraudulent transactions without delay, especially in the healthcare industry, where real-time data access is vital. This research presents a role-based access control architecture for an anomaly detection technique. Additionally, the Structured Query Language (SQL) queries are stored in a new data structure called a pentaplet. These pentaplets allow us to maintain the correlation between SQL statements within the same transaction by employing transaction-log entry information, thereby increasing detection accuracy, particularly for individuals within the company exhibiting unusual behavior. To identify anomalous queries, the system employs a supervised machine learning technique called the Support Vector Machine (SVM). According to experimental findings, the proposed model performed well in terms of detection accuracy, achieving 99.92% through SVM with one-hot encoding and principal component analysis (PCA).
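Since the abstract names the exact components (one-hot encoding, PCA, SVM), a compact sketch of that pipeline is shown below; the pentaplet-derived features and the toy labels are assumptions standing in for the paper's feature set.

```python
# Sketch: one-hot encode categorical query attributes, reduce with PCA,
# and classify normal vs. anomalous with an SVM (scikit-learn >= 1.2).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X = [["SELECT", "patients", "doctor"],
     ["SELECT", "patients", "nurse"],
     ["DROP",   "patients", "nurse"],
     ["UPDATE", "billing",  "admin"]]
y = [0, 0, 1, 0]   # 1 = anomalous

pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore", sparse_output=False),
                     PCA(n_components=3),
                     SVC(kernel="rbf"))
pipe.fit(X, y)
print(pipe.predict([["DROP", "billing", "nurse"]]))
```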
Applying high-speed machining technology on the shop floor has many benefits, such as manufacturing more accurate parts with better surface finishes. The selection of appropriate machining parameters plays a very important role in the implementation of high-speed machining technology. Case-based reasoning is used in developing the high-speed machining database to overcome the shortage of available high-speed cutting parameters in machining data handbooks and on shop floors. The high-speed machining database developed in this paper includes two main components: the machining database and the case-base. The machining database stores the cutting parameters, cutting tool data, workpieces and their materials data, and other related data, while the case-base mainly stores successfully solved cases, that is, problems of workpieces and their machining. The case description and case retrieval methods are described to establish the case-based reasoning high-speed machining database. With the case retrieval method, previously solved cases similar to a new machining problem can be retrieved from the case-base. The solution of the best-matched case is evaluated and modified, and then regarded as the proposed solution to the new machining problem. After verification, the problem and its solution are packed into a new case and stored in the case-base for future applications.
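The retrieval step is the heart of case-based reasoning; a minimal nearest-neighbour sketch follows, with illustrative features and weights rather than the paper's case description scheme.

```python
# Sketch: retrieve the solved case nearest to a new machining problem.
import numpy as np

# Case-base entries: (features = [hardness HB, diameter mm, depth of cut mm], solution)
cases = [
    (np.array([200.0, 20.0, 0.5]), {"speed_m_min": 600, "feed_mm_tooth": 0.08}),
    (np.array([320.0, 10.0, 0.2]), {"speed_m_min": 350, "feed_mm_tooth": 0.05}),
]
weights = np.array([1.0, 0.5, 2.0])   # assumed feature importances

def retrieve(query):
    # Weighted Euclidean distance; the nearest case is proposed for adaptation.
    dists = [np.linalg.norm(weights * (query - f)) for f, _ in cases]
    return cases[int(np.argmin(dists))][1]

print(retrieve(np.array([210.0, 18.0, 0.4])))
```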
Traditionally, nonlinear time history analysis (NLTHA) is used to assess the performance of structures under future hazards, which is necessary to develop effective disaster risk management strategies. However, this method is computationally intensive and not suitable for analyzing a large number of structures on a city-wide scale. Surrogate models offer an efficient and reliable alternative and facilitate evaluating the performance of multiple structures under different hazard scenarios. However, creating a comprehensive database for surrogate modelling at the city level presents challenges. To overcome this, the present study proposes meta-databases and a general framework for surrogate modelling of steel structures. The dataset includes 30,000 steel moment-resisting frame buildings, representing low-rise, mid-rise, and high-rise buildings, with criteria for connections, beams, and columns. Pushover analysis is performed and structural parameters are extracted; finally, incorporating two machine learning techniques, random forest and Shapley additive explanations (SHAP), sensitivity and explainability analyses of the structural parameters are performed to identify the most significant factors in designing steel moment-resisting frames. The framework and databases can be used as a validated source for surrogate modelling of steel frame structures for disaster risk management.
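A compact sketch of the sensitivity step follows: a random forest surrogate fitted to synthetic pushover-style data, ranked by mean absolute SHAP value. The inputs stand in for the 30,000-frame dataset and the snippet assumes the shap package.

```python
# Sketch: rank structural parameters by global SHAP importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import shap   # pip install shap

rng = np.random.default_rng(2)
X = rng.random((500, 3))                                     # e.g., storeys, beam depth, column size
y = 2 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 500)   # synthetic response

model = RandomForestRegressor(n_estimators=100, random_state=2).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)
print(np.abs(shap_values).mean(axis=0))   # mean |SHAP| per parameter
```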
The history and current status of materials data activities, from handbooks to databases, are reviewed, with an introduction to some important products. Through an example of the prediction of interfacial thermal resistance based on data and data science methods, we show the advantages and potential of materials informatics in studying material issues that are too complicated or time consuming for conventional theoretical and experimental methods. Materials big data is the foundation of materials informatics. The challenges and strategies for constructing materials big data are discussed, and some solutions are proposed based on our experience constructing the National Institute for Materials Science (NIMS) materials databases.
A large database is desired for machine learning (ML) technology to make accurate predictions of materials' physicochemical properties based on their molecular structure. When a large database is not available, developing a proper featurization method based on the physicochemical nature of the target properties can improve the predictive power of ML models with a smaller database. In this work, we show that two new featurization methods, the volume occupation spatial matrix and the heat contribution spatial matrix, can improve the accuracy in predicting energetic materials' crystal density (ρ_crystal) and solid-phase enthalpy of formation (H_f,solid) using a database containing 451 energetic molecules. Their mean absolute errors are reduced from 0.048 g/cm³ and 24.67 kcal/mol to 0.035 g/cm³ and 9.66 kcal/mol, respectively. By leave-one-out cross-validation, the newly developed ML models can be used to determine the performance of most kinds of energetic materials except cubanes. Our ML models were applied to predict ρ_crystal and H_f,solid for CHON-based molecules of the 150-million-entry PubChem database, and screened out 56 candidates with competitive detonation performance and reasonable chemical structures. With further improvement in the future, spatial matrices have the potential to become multifunctional ML simulation tools that could provide even better predictions in wider fields of materials science.
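The evaluation protocol named above (leave-one-out cross-validation scored by mean absolute error) is easy to sketch; the features below are synthetic stand-ins for the spatial-matrix descriptors of the 451-molecule database.

```python
# Sketch: leave-one-out cross-validated MAE for a property regressor.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(3)
X = rng.random((60, 8))                              # assumed featurized molecules
y = 1.5 + 0.5 * X[:, 0] + rng.normal(0, 0.02, 60)    # synthetic crystal density

pred = cross_val_predict(GradientBoostingRegressor(random_state=3), X, y, cv=LeaveOneOut())
print("LOO MAE:", mean_absolute_error(y, pred))
```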
Many high-quality studies have emerged from public databases, such as the Surveillance, Epidemiology, and End Results (SEER) program, the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, and irregularity, among other characteristics, resulting in their value not being fully utilized. Data-mining technology has been a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially in large-scale medical public databases. This article introduces the main medical public databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology to clinical big data in order to promote the production of research results that benefit doctors and patients.
The variation of crustal thickness is a critical index to reveal how the continental crust has evolved over its four billion years. Generally, ratios of whole-rock trace elements, such as Sr/Y, (La/Yb)n, and Ce/Y, are used to characterize crustal thicknesses. However, confusing results are sometimes obtained when there are not enough filtered data. Here, a state-of-the-art approach based on a machine-learning algorithm is proposed to predict crustal thickness using global major- and trace-element geochemical data of intermediate arc rocks and intraplate basalts, together with their corresponding crustal thicknesses. After the validation processes, the root-mean-square error (RMSE) and the coefficient of determination (R²) were used to evaluate the performance of the machine learning algorithm on a dataset that had never been used during the training phase. The results demonstrate that the machine learning algorithm is more reliable in predicting crustal thickness than the conventional methods. The trained model predicts that the crustal thickness of the eastern North China Craton (ENCC) was ~45 km from the Late Triassic to the Early Cretaceous, but ~35 km from the Early Cretaceous onward, which corresponds to a paleo-elevation of 3.0±1.5 km in the Early Mesozoic, decreasing to the present-day elevation of the ENCC. The estimates are generally consistent with previous studies on xenoliths from the lower crust and on the paleoenvironment of the coastal mountains of the ENCC, which indicates that the lower crust of the ENCC was delaminated abruptly in the Early Cretaceous.
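The validation scheme described (RMSE and R² on data withheld from training) can be sketched as follows; the geochemical feature columns and synthetic thickness values are illustrative assumptions.

```python
# Sketch: score a crustal-thickness regressor on a held-out split.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.random((800, 5))                        # e.g., Sr/Y, (La/Yb)n, Ce/Y, SiO2, MgO
y = 20 + 40 * X[:, 0] + rng.normal(0, 2, 800)   # synthetic thickness (km)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=4)
model = RandomForestRegressor(random_state=4).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print("RMSE:", rmse, "R2:", r2_score(y_te, model.predict(X_te)))
```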
This article outlines progress made in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (the Automatic Data Analysis System) which combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the observational raw data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system learns by induction to build decision trees, from which the production rules are automatically deduced.
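The two-stage process (clustering, then inductive tree learning whose branches read off as production rules) can be sketched briefly; the three oceanographic feature names are assumptions.

```python
# Sketch: cluster raw data into conceptual classes, induce a decision tree
# over the cluster labels, then print it as IF-THEN style rules.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(5)
X = rng.random((300, 3))   # e.g., temperature, salinity, depth

clusters = KMeans(n_clusters=3, n_init=10, random_state=5).fit_predict(X)
tree = DecisionTreeClassifier(max_depth=3).fit(X, clusters)
print(export_text(tree, feature_names=["temp", "salinity", "depth"]))
```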
During the last two decades, significant work has been reported in the field of cursive language recognition, especially in the Arabic, Urdu, and Persian languages. The unavailability of such work in the Pashto language is due to the absence of a standard database and of significant research work, which ultimately acts as a big barrier for the research community. The slight variation in Pashto character shapes is an additional challenge for researchers. This paper presents an efficient OCR system for handwritten Pashto characters based on a multi-class-enabled support vector machine using manifold feature extraction techniques. These feature extraction techniques include tools such as a zoning feature extractor, the discrete cosine transform, the discrete wavelet transform, Gabor filters, and histograms of oriented gradients. A hybrid feature map is developed by combining the manifold feature maps. This research work is performed by developing a medium-sized dataset of handwritten Pashto characters that encapsulates 200 handwritten samples for each of the 44 characters in the Pashto language. Recognition results are generated for the proposed model based on the manifold and hybrid feature maps. Overall accuracy rates of 63.30%, 65.13%, 68.55%, 68.28%, 67.02%, and 83% are obtained based on the zoning technique, HOGs, Gabor filters, DCT, DWT, and the hybrid feature map, respectively. The applicability of the proposed model is also tested by comparing its results with a convolutional neural network model. The convolutional neural network-based model generated an accuracy rate of 81.02%, smaller than that of the multi-class support vector machine. The highest accuracy rate of 83% for the multi-class SVM model based on the hybrid feature map reflects the applicability of the proposed model.
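One branch of the pipeline (HOG features feeding a multi-class SVM) is sketched below; random 32x32 arrays stand in for the handwritten character images, and the HOG parameters are illustrative.

```python
# Sketch: HOG descriptors from character images feeding a one-vs-rest SVM.
import numpy as np
from skimage.feature import hog   # pip install scikit-image
from sklearn.svm import SVC

rng = np.random.default_rng(6)
images = rng.random((88, 32, 32))   # 2 synthetic samples for each of 44 classes
labels = np.repeat(np.arange(44), 2)

X = np.array([hog(im, pixels_per_cell=(8, 8), cells_per_block=(2, 2)) for im in images])
clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, labels)
print(clf.predict(X[:3]))
```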
The design and developmental steps for an auxiliary machining module utilizing a database framework are discussed in this work to contribute to an improvement in workshop operations. The underlying objective is the provision of easily accessible and applicable machining operations data to enable and improve job accuracy and conformity to industrial standards. The design of the database for the decision support system is based on a relational frame with the Microsoft Access application package and Microsoft Structured Query Language Server, which serves as the back end of the module. A user interface designed on the .NET Framework 3.5 and Windows Installer 3.1 running on the Windows XP operating system serves as the software front end. The developed module is to serve as a decision support system for machine tool operations.
The theory and method of machining parameter optimization for high-speed machining are studied. Machining data collected from workshops, labs, and references are analyzed. An optimization method based on the genetic algorithm (GA) is investigated. Its calculation speed is faster than that of traditional optimization methods, and it is suitable for machining parameter optimization in automatic manufacturing systems. Based on the theoretical studies, a system for machining parameter management and optimization is developed. The system can improve the productivity of high-speed machining centers.
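A minimal GA sketch in the spirit of this optimization follows: it evolves (cutting speed, feed) pairs toward a proxy material-removal rate under a crude penalty constraint. The fitness model and bounds are illustrative assumptions, not the paper's formulation.

```python
# Sketch: genetic algorithm over (speed m/min, feed mm/tooth) pairs.
import numpy as np

rng = np.random.default_rng(7)
LOW, HIGH = np.array([100.0, 0.02]), np.array([1000.0, 0.30])
pop = rng.uniform(LOW, HIGH, size=(40, 2))

def fitness(p):
    speed, feed = p
    mrr = speed * feed                   # proxy for material removal rate
    penalty = max(0.0, mrr - 150.0)      # crude tool-life/power constraint
    return mrr - 10.0 * penalty

for _ in range(100):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-20:]]                     # selection
    children = (parents[rng.integers(0, 20, 20)] +
                parents[rng.integers(0, 20, 20)]) / 2           # crossover (average)
    children += rng.normal(0.0, [10.0, 0.005], children.shape)  # mutation
    pop = np.vstack([parents, np.clip(children, LOW, HIGH)])

print(pop[np.argmax([fitness(p) for p in pop])])
```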