Engineering data are separately organized, and their schemas are increasingly complex and variable. Engineering data management systems need to be able to manage the unified data while remaining both customizable and extensible. The design of such systems depends heavily on the flexibility and self-description of the data model. The characteristics of engineering data and the practicalities of managing them are analyzed. An engineering data warehouse (EDW) architecture and multi-layer metamodels are then presented, and an approach to managing and using engineering data through meta-objects is proposed. Finally, an application, the flight test EDW system (FTEDWS), is described, in which meta-objects are used to manage the engineering data in the warehouse. The results show that adopting a meta-modeling approach supports interchangeability and provides a sufficiently flexible environment in which system evolution and reusability can be handled.
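The abstract does not spell out the meta-object mechanism, but the core idea, a metamodel layer that describes data types so that the data layer stays self-describing and extensible, can be sketched in a few lines. The following Python sketch is hypothetical; the class names, attributes, and the FlightTestRecord example are invented for illustration and are not taken from FTEDWS:

```python
# Hypothetical sketch of a meta-object layer. A MetaClass describes an
# engineering data type at the metamodel layer; MetaObject instances carry
# the data and validate themselves against their MetaClass, so new data
# types can be added without changing application code.

class MetaClass:
    """Metamodel-layer description of an engineering data type."""
    def __init__(self, name, attributes):
        self.name = name              # e.g. "FlightTestRecord" (invented)
        self.attributes = attributes  # {attribute_name: python_type}

    def instantiate(self, **values):
        return MetaObject(self, values)


class MetaObject:
    """Data-layer object that is self-describing via its MetaClass."""
    def __init__(self, meta, values):
        for attr, typ in meta.attributes.items():
            if attr not in values:
                raise ValueError(f"missing attribute: {attr}")
            if not isinstance(values[attr], typ):
                raise TypeError(f"{attr} must be {typ.__name__}")
        self.meta = meta
        self.values = values

    def describe(self):
        # The object can report its own schema -- the self-description
        # property the paper's data model relies on.
        return {"type": self.meta.name,
                "schema": {a: t.__name__ for a, t in self.meta.attributes.items()}}


# Extending the warehouse with a new data type is a metamodel change only:
flight_record = MetaClass("FlightTestRecord", {"test_id": str, "altitude_m": float})
rec = flight_record.instantiate(test_id="FT-001", altitude_m=9500.0)
print(rec.describe())
```

The point of such a design is that adding a new engineering data type touches only the metamodel; no application code that consumes MetaObject instances needs to change.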
The nature of measured data varies among the different disciplines of the geosciences. In rock engineering, the features of the data play a leading role in determining the feasible methods for handling it properly. The present study focuses on resolving one of the major deficiencies of conventional neural networks (NNs) in dealing with rock engineering data. Because the samples are obtained from hundreds of meters below the surface with the utmost difficulty, the number of samples is always limited. Meanwhile, the experimental analysis of these samples may yield many repetitive values and zeros, and conventional neural networks are incapable of building robust models in the presence of such data. These networks also depend strongly on the initial weights and bias values to make reliable predictions. With this in mind, the current research introduces a novel kind of neural network processing framework for geological data that does not suffer from the limitations of conventional NNs. The introduced single-data-based feature engineering network extracts all the information wrapped in every single data point without being affected by the other points. This method, being completely different from conventional NNs, rearranges all the basic elements of the neuron model into a new structure; its mathematical calculations were therefore derived from the very beginning, and the corresponding code was developed in MATLAB and Python, since no implementation was available in common programming software at the time. The new network was first evaluated through computer-based simulations of rock cracks in the 3DEC environment. After the model's reliability was confirmed, it was adopted in two case studies estimating, respectively, the tensile strength and the shear strength of real rock samples: coal core samples from the Southern Qinshui Basin of China, and gas hydrate-bearing sediment (GHBS) samples from the Nankai Trough of Japan. The coal samples underwent nuclear magnetic resonance (NMR) measurements and scanning electron microscopy (SEM) imaging to investigate their original micro- and macro-fractures, after which their mechanical properties, including tensile strength, were measured with a rock mechanical test system. The shear strength of the GHBS samples was acquired through triaxial and direct shear tests. According to the results, the new network structure outperformed conventional neural networks in both the simulation-based and the case-study estimations of tensile and shear strength. Although the proposed approach originally aimed at resolving the issue of a limited dataset, its unique properties could also be applied to larger datasets from other subsurface measurements.
This paper describes how database information and electronic 3D models are integrated to produce power plant designs more efficiently and accurately. Engineering CAD/CAE systems have evolved from strictly 3D modeling tools into spatial data management tools. The paper describes how process data, commodities, and location data are disseminated to the various project team members through a central integrated database. The database and 3D model also provide a cache of information that is valuable to the constructor and to operations and maintenance personnel.
With the advent of Big Data, the fields of Statistics and Computer Science coexist in current information systems. In addition, technological advances in embedded systems, in particular Internet of Things technologies, make it possible to develop real-time applications. These developments are disrupting Software Engineering, because the use of large amounts of real-time data requires advanced thinking about software architecture. The purpose of this article is to propose an architecture that unifies not only Software Engineering and Big Data activities, but also batch and streaming architectures for the exploitation of massive data. This architecture makes it possible to develop applications and digital services that exploit very large volumes of data in real time, both for management needs and for analytical purposes. The architecture was tested on COVID-19 data as part of the development of an application for real-time monitoring of the evolution of the pandemic in Côte d'Ivoire, using PostgreSQL, Elasticsearch, Kafka, Kafka Connect, NiFi, Spark, Node-RED and MoleculerJS to operationalize it.
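As a rough sketch of the streaming leg of such a unified (Lambda-style) batch-plus-streaming design, the snippet below uses Spark Structured Streaming to consume case reports from Kafka and maintain a running aggregate. The topic name, record schema, and endpoints are assumptions for illustration; this is not the paper's code:

```python
# Streaming-leg sketch: Spark Structured Streaming reads case reports from
# a Kafka topic and keeps per-region running totals that analytical
# services can query. Requires the spark-sql-kafka-0-10 connector package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, TimestampType)

spark = SparkSession.builder.appName("covid-stream-sketch").getOrCreate()

# Assumed record schema for one case report.
schema = StructType([
    StructField("region", StringType()),
    StructField("new_cases", IntegerType()),
    StructField("reported_at", TimestampType()),
])

cases = (spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")  # assumed address
         .option("subscribe", "covid_cases")                   # assumed topic
         .load()
         .select(from_json(col("value").cast("string"), schema).alias("r"))
         .select("r.*"))

# Running per-region totals; a batch job over the same durable store would
# serve the historical (batch-layer) view in a Lambda-style design.
totals = cases.groupBy("region").sum("new_cases")

query = (totals.writeStream
         .outputMode("complete")
         .format("console")   # a real deployment would sink to a serving store
         .start())
query.awaitTermination()
```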
With a review of recent developments in the digitalization and application of seabed data, this paper systematically proposes methods for integrating seabed data by analyzing its features, based on the ORACLE database management system and advanced spatial data management techniques. We investigated the storage structure of seabed data, a distributed-integrated database system, a standardized spatial database, and a seabed metadata management system in order to manage and use this seabed information effectively in practice. Finally, we applied the proposed methods to build the Bohai Sea engineering geology database, which stores engineering geology data and other seabed information from the Bohai Sea area. The resulting database can effectively integrate huge amounts of distributed and complicated seabed data to meet the practical requirements of engineering geology exploration and exploitation in the Bohai Sea.
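The abstract does not show the metadata design; as a rough illustration of what a seabed-metadata catalogue lookup might look like, here is a minimal sketch. The table layout, column names, and sample record are invented, and sqlite3 stands in for the ORACLE system the paper actually uses, purely to keep the snippet self-contained and runnable:

```python
# Hypothetical seabed-metadata catalogue: each row describes one dataset
# with its bounding box, so datasets covering a query region can be found.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE seabed_metadata (
        dataset_id   TEXT PRIMARY KEY,
        survey_name  TEXT,
        data_type    TEXT,    -- e.g. 'borehole', 'side-scan sonar'
        min_lon REAL, max_lon REAL,
        min_lat REAL, max_lat REAL,
        acquired_on  TEXT
    )
""")
conn.execute(
    "INSERT INTO seabed_metadata VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("BH-2003-017", "Bohai Sea engineering geology survey",
     "borehole", 119.5, 120.1, 38.2, 38.9, "2003-06-14"),
)

# Spatial lookup: all datasets whose bounding box intersects a query box.
rows = conn.execute("""
    SELECT dataset_id, data_type FROM seabed_metadata
    WHERE max_lon >= ? AND min_lon <= ? AND max_lat >= ? AND min_lat <= ?
""", (119.0, 120.0, 38.0, 39.0)).fetchall()
print(rows)
```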
By analyzing the principle of data sharing in the database system, this paper discusses the principle of and methods for integrating and sharing GIS data through a data engine, introduces a way to achieve high integration and sharing of GIS data on the basis of VCT in VC++, and provides a method for uniting VCT with an RDBMS in order to implement a spatial database with an object-oriented data model.
A new B-spline surface reconstruction method from layer data, based on a deformable model, is presented. An initial deformable surface, represented as a closed cylinder, is first given. The surface is subject to internal forces describing its implicit smoothness property and to external forces attracting it toward the layer data points. The finite element method is then adopted to solve the energy minimization problem, which results in a bicubic closed B-spline surface with C^2 continuity. The proposed method provides a smooth and accurate surface model directly from the layer data, without the need to fit cross-sectional curves and make them compatible. The feasibility of the method is verified by experimental results. This project is supported by the National Natural Science Foundation of China (No. 10272033) and the Provincial Natural Science Foundation of Guangdong, China (No. 04105385).
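The abstract does not give the energy functional, but deformable-model formulations of this kind typically minimize a quadratic energy combining internal membrane and thin-plate smoothness terms with an external attraction to the layer data points p_i. A typical form (notation assumed, not taken from the paper) is:

```latex
% Assumed deformable-surface energy: membrane + thin-plate smoothness
% (internal forces) plus attraction to the layer data points (external).
E(S) = \iint_\Omega \Big[ \alpha \left( \|S_u\|^2 + \|S_v\|^2 \right)
     + \beta \left( \|S_{uu}\|^2 + 2\|S_{uv}\|^2 + \|S_{vv}\|^2 \right) \Big] \, du \, dv
     + \lambda \sum_i \big\| S(u_i, v_i) - p_i \big\|^2
```

Because such an energy is quadratic in the B-spline control points, the finite element discretization reduces the minimization to a sparse linear system.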
In this study, we examine efficient Big Data Engineering and Extract, Transform, Load (ETL) processes in the healthcare sector, building on the robust foundation provided by the MIMIC-III Clinical Database. We explore various methodologies for improving the efficiency of ETL processes, with a primary emphasis on optimizing time and resource utilization. Through experimentation on a representative dataset, we demonstrate the advantages of incorporating PySpark and Docker containerized applications. The results show significant gains in time efficiency, process streamlining, and resource optimization from using PySpark for distributed computing within Big Data Engineering workflows. We also describe the strategic integration of Docker containers and their role in improving the scalability and reproducibility of the ETL pipeline. The paper summarizes the key insights from these experiments and the practical implications and benefits of adopting PySpark and Docker. By streamlining Big Data Engineering and ETL processes for clinical big data, our study contributes to the ongoing discourse on optimizing data-processing efficiency in healthcare applications. The source code is available on request.
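As an illustration of the kind of PySpark ETL step described, the sketch below reads one standard MIMIC-III table, applies a light transform, and writes columnar output. The file paths and the exact pipeline stages are assumptions, not the paper's code:

```python
# ETL-step sketch over MIMIC-III with PySpark: extract a CSV table,
# transform timestamps and project a narrow set of columns, load Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_timestamp

spark = SparkSession.builder.appName("mimic-etl-sketch").getOrCreate()

# Extract: ADMISSIONS is one of the standard MIMIC-III CSV tables.
admissions = (spark.read
              .option("header", True)
              .csv("/data/mimic-iii/ADMISSIONS.csv"))  # assumed mount path

# Transform: parse admission/discharge timestamps, keep a narrow projection.
stays = (admissions
         .withColumn("admit", to_timestamp(col("ADMITTIME")))
         .withColumn("disch", to_timestamp(col("DISCHTIME")))
         .select("SUBJECT_ID", "HADM_ID", "admit", "disch", "ADMISSION_TYPE"))

# Load: columnar Parquet keeps downstream reads fast.
stays.write.mode("overwrite").parquet("/data/out/admissions_stays")
```

Packaging the same job in a Docker image that pins the Spark and Python versions is what yields the reproducibility the study emphasizes: every run, on any host, executes in an identical environment.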
A method is proposed for prospecting prediction of subsurface mineral deposits based on soil geochemistry data and a deep convolutional neural network model. The method uses three techniques (window offset, scaling, and rotation) to enlarge the training dataset. A window area is used to extract the spatial distribution characteristics of the soil geochemistry and to measure their correspondence with the occurrence of known subsurface deposits; prospecting prediction is achieved by matching the window-area characteristics of an unknown area against the relationships established in the known area. The method can efficiently predict prospective mineral areas even where few ore deposits are available for generating the training dataset, meaning that deep learning can be used effectively for deposit prospecting prediction. Using soil active geochemical measurement data, the method was applied in the Daqiao area, Gansu Province, where seven favorable gold prospecting target areas were predicted. The Daqiao orogenic gold deposit of latest Jurassic and Early Jurassic age in the southern domain holds more than 105 t of gold resources at an average grade of 3-4 g/t. In 2020, the project team drilled and verified the K prediction area and intersected 66 m of gold mineralized bodies. The new method should be applicable to prospecting prediction using conventional geochemical data in other areas. This work was funded by the pilot project "Deep Geological Survey of Benxi-Linjiang Area" (1212011220247) of the 3D Geological Mapping and Deep Geological Survey program of the China Geological Survey.
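A minimal sketch of the window extraction and the three augmentation techniques (offset, scaling, rotation) might look as follows; the grid size, window size, and number of geochemical elements are illustrative assumptions, not values from the paper:

```python
# Sketch of window-based training-data augmentation for soil geochemistry:
# cut windows around known deposits, then offset/scale/rotate each window.
import numpy as np

def windows_around(grid, centers, half=8):
    """Cut (2*half x 2*half x n_elements) windows centered on known deposits."""
    out = []
    for r, c in centers:
        w = grid[r - half:r + half, c - half:c + half, :]
        if w.shape[:2] == (2 * half, 2 * half):  # skip windows off the grid edge
            out.append(w)
    return out

def augment(window, rng):
    """Expand one geochemical window into several training samples."""
    samples = [window]
    samples.append(window * rng.uniform(0.9, 1.1))           # scaling
    samples.append(np.roll(window, rng.integers(-2, 3), 0))  # window offset
    for k in (1, 2, 3):                                      # 90-degree rotations
        samples.append(np.rot90(window, k))
    return samples

rng = np.random.default_rng(0)
grid = rng.random((128, 128, 12))   # gridded maps for 12 elements (assumed)
known = [(40, 52), (75, 90)]        # cells with known deposits (assumed)
train = [s for w in windows_around(grid, known) for s in augment(w, rng)]
print(len(train), train[0].shape)
```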
Guest Editors (Editorial Board of EET):
Prof. Wei-Dong He, Univ. of Electron. Sci. & Tech. of China, weidong.he@uestc.edu.cn
Prof. Terrence Mak, The Chinese Univ. of Hong Kong, stmak@cse.cuhk.edu.hk
Prof. Qiang Li, Univ. of Electron. Sci. & Tech. of China, qli@uestc.edu.cn
Prof. Wei-Sheng Zhao, Centre National de la Recherche Scientifique (National Center for Scientific Research), weisheng.zhao@u-psud.fr

From energy generation to transportation, from energy distribution to storage, from semiconductor processing to communications, and from portable devices to data centers, energy consumption has grown to be a major limitation to usability and performance. Energy-efficient technologies have therefore become an active research area, motivated by energy necessity and environmental concerns. With energy-efficient technologies, a number of epoch-making technical approaches can be expected. Energy-efficiency technologies are affecting all forms of energy conversion and all aspects of life.
This paper presents a literature review in the field of summarizing software artifacts, focusing on bug reports, source code, mailing lists and developer discussions. From January 2010 to April 2016, numerous summarization techniques, approaches, and tools were proposed to satisfy the ongoing demand for improving software performance and quality and for helping developers understand the problems at hand. Since the aforementioned artifacts contain both structured and unstructured data, researchers have applied different machine learning and data mining techniques to generate summaries. This paper therefore first provides a general perspective on the state of the art, describing the types of artifacts, the approaches to summarization, and the common parts of the experimental procedures shared among these artifacts. We then discuss the applications of summarization, i.e., which tasks have been accomplished through summarization. Next, the paper presents tools that were built for, or employed during, summarization tasks. In addition, we present the summarization evaluation methods used in the selected studies, as well as other factors considered in evaluating generated summaries, such as adequacy and quality. We also briefly present modern communication channels and the complementarities and commonalities among different software artifacts. Finally, challenges applicable to the existing studies and future research directions are discussed. This survey gives future researchers a broad and useful background on the main aspects of this research field. This work was supported in part by the National Basic Research 973 Program of China under Grant No. 2013CB035906, the Fundamental Research Funds for the Central Universities of China under Grant No. DUT13RC(3)53, the New Century Excellent Talents in University of China program under Grant No. NCET-13-0073, and the National Natural Science Foundation of China under Grant No. 61300017.
The problem of matching schemas or ontologies consists of finding corresponding entities in two or more knowledge models that belong to the same domain but have been developed separately. There are now many techniques and tools that address this problem; however, the complex nature of the matching problem makes existing solutions unsatisfactory for real situations. The Google Similarity Distance has appeared recently; its purpose is to mine knowledge from the Web using the Google search engine in order to compare text expressions semantically. Our work consists of developing a software application for validating results discovered by schema and ontology matching tools using the philosophy behind this distance. Moreover, we are interested in using not only Google but also other popular search engines with this similarity distance. The results reveal three main facts. Firstly, some web search engines can help us to validate semantic correspondences satisfactorily. Secondly, there are significant differences among the web search engines. Thirdly, the best results are obtained with combinations of the web search engines we studied. This work was supported by the Spanish Ministry of Innovation and Science through REALIDAD: Gestion, Analisis y Explotacion Eficiente de Datos Vinculados under Grant No. TIN2011-25840.
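The measure behind this line of work is the Normalized Google Distance of Cilibrasi and Vitányi, which compares two terms from search-engine hit counts alone; a direct implementation:

```python
# Normalized Google Distance (Cilibrasi & Vitanyi): f_x and f_y are the hit
# counts for each term alone, f_xy the hits for both together, and n an
# estimate of the total number of pages the engine indexes.
import math

def ngd(f_x, f_y, f_xy, n):
    if f_xy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Smaller values mean semantically closer terms; the hit counts below are
# made up for illustration, not real engine responses.
print(ngd(f_x=2.2e9, f_y=1.5e9, f_xy=9.0e8, n=5.0e10))
```

Substituting hit counts from different engines into the same formula is what enables the kind of cross-engine and combined-engine comparisons the paper reports.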