In this paper, we adopt a novel applied approach to fault analysis based on data mining theory. In our research, global information is introduced into the electric power system, and we mainly use the cluster analysis techniques of data mining to detect fault components and fault sections quickly and accurately, finally accomplishing the fault analysis. The main technical contributions and innovations of this paper include introducing global information into electrical engineering and developing a new application of data mining to fault analysis. Data mining is defined as the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information from large databases. It has been widely utilized in both academic and applied scientific research in which the data sets are generated by experiments, and it can contribute substantially to the study of electrical engineering.
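As a rough illustration of the clustering idea this abstract describes, the sketch below groups per-component measurement vectors with k-means and flags the outlying cluster as suspected fault components. The feature layout, the synthetic data, and the two-cluster choice are assumptions for illustration, not the paper's actual method.

```python
# A minimal sketch, assuming fault detection reduces to clustering
# per-component measurement vectors; data and features are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical per-component features: [voltage deviation, current deviation]
normal = rng.normal(loc=[0.02, 0.03], scale=0.01, size=(40, 2))
faulted = rng.normal(loc=[0.45, 0.60], scale=0.05, size=(4, 2))
features = np.vstack([normal, faulted])

# Two clusters: healthy components vs. candidate fault components
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Take the smaller cluster as the set of suspected fault components
fault_cluster = np.argmin(np.bincount(labels))
suspects = np.where(labels == fault_cluster)[0]
print("suspected fault components:", suspects)
```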
Client software on mobile devices that can remotely launch data mining tasks and display their results adds significant value for nomadic users and organizations that need to analyze data stored in a repository far from the site where the users work, allowing them to generate knowledge regardless of their physical location. This paper presents new data analysis methods and new ways to detect users' work locations via mobile computing technology. A growing number of applications, content, and data can be accessed from a wide range of devices, which makes it necessary to introduce centralized mobile device management. MDM is a KDE software package that works with enterprise systems using mobile devices. The paper discusses the system design in detail.
A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. The program runs on standard Windows computers. Many of its functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology.
Prediction and diagnosis of cardiovascular diseases (CVDs) based, among other things, on medical examinations and patient symptoms are among the biggest challenges in medicine. About 17.9 million people die from CVDs annually, accounting for 31% of all deaths worldwide. With a timely prognosis and thorough consideration of the patient's medical history and lifestyle, it is possible to predict CVDs and take preventive measures to eliminate or control this life-threatening disease. In this study, we used various patient datasets from a major hospital in the United States as prognostic factors for CVD. The data was obtained by monitoring a total of 918 adult patients aged 28-77 years. We present a data mining modeling approach to analyze the performance, classification accuracy and number of clusters on Cardiovascular Disease Prognostic datasets in unsupervised machine learning (ML) using the Orange data mining software. Various techniques are then used to classify the model parameters, such as k-nearest neighbors, support vector machine, random forest, artificial neural network (ANN), naïve Bayes, logistic regression, stochastic gradient descent (SGD), and AdaBoost. To determine the number of clusters, various unsupervised ML clustering methods were used, such as k-means, hierarchical, and density-based spatial clustering of applications with noise. The results showed that the best-performing models in terms of classification accuracy were SGD and the ANN, both of which achieved a high score of 0.900 on the Cardiovascular Disease Prognostic datasets. Based on the results of most clustering methods, such as k-means and hierarchical clustering, the datasets can be divided into two clusters. The prognostic accuracy for CVD depends on the accuracy of the proposed diagnostic model: the more accurate the model, the better it can predict which patients are at risk for CVD.
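The study works in the Orange GUI, but the same workflow can be sketched programmatically. The snippet below, a sketch under the assumption that synthetic data stands in for the 918-patient dataset, compares two of the listed classifiers by cross-validated accuracy and checks candidate cluster counts with k-means inertia; the scores it prints are illustrative only.

```python
# Sketch of the compare-classifiers / count-clusters workflow with
# scikit-learn; synthetic data replaces the (unavailable) patient dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=918, n_features=11, random_state=0)

models = {
    "SGD": SGDClassifier(random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")

# Unsupervised side: inspect k-means inertia for k = 1..5 to judge
# how many clusters the data supports
for k in range(1, 6):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(f"k={k}: inertia={inertia:.1f}")
```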
Background knowledge is important for data mining, especially in complicated situations. Ontological engineering is the successor of knowledge engineering, and the sharable knowledge bases built on ontologies can be used to provide background knowledge to direct the data mining process. This paper gives a general introduction to the method and presents a practical analysis example using an SVM (support vector machine) as the classifier. Gene Ontology and its accompanying annotations compose a large knowledge base, on which much research has been carried out. A microarray dataset is the output of a DNA chip. With the help of Gene Ontology, we present a more elaborate analysis of microarray data than previous researchers. The method can also be used in other fields with similar scenarios. (Funding: Project No. 20040248001, supported by the Ph.D. Programs Foundation of the Ministry of Education of China.)
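One plausible reading of "background knowledge directing the mining" is to collapse gene-level expression features into Gene Ontology term features before classification. The sketch below shows that idea with a toy GO mapping and a toy expression matrix; the term IDs, gene names, and aggregation-by-mean step are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch: GO annotations as background knowledge for an SVM.
# The GO mapping and expression matrix are toy placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(20)]
# Hypothetical GO annotation: each term maps to its member genes
go_terms = {
    "GO:0006915": genes[:7],
    "GO:0008283": genes[7:14],
    "GO:0006955": genes[14:],
}

X_genes = rng.normal(size=(30, len(genes)))  # toy expression matrix
y = rng.integers(0, 2, size=30)              # toy sample labels

# Background-knowledge step: average member genes per GO term,
# shrinking the feature space from genes to terms
idx = {g: i for i, g in enumerate(genes)}
X_go = np.column_stack(
    [X_genes[:, [idx[g] for g in members]].mean(axis=1)
     for members in go_terms.values()]
)

print(cross_val_score(SVC(kernel="rbf"), X_go, y, cv=3).mean())
```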
With the advent of Big Data, the fields of Statistics and Computer Science coexist in current information systems. In addition, technological advances in embedded systems, in particular Internet of Things technologies, make it possible to develop real-time applications. These technological developments are disrupting Software Engineering because the use of large amounts of real-time data requires advanced thinking in terms of software architecture. The purpose of this article is to propose an architecture unifying not only Software Engineering and Big Data activities, but also batch and streaming architectures for the exploitation of massive data. This architecture makes it possible to develop applications and digital services that exploit very large volumes of data in real time, both for management needs and for analytical purposes. The architecture was tested on COVID-19 data as part of the development of an application for real-time monitoring of the evolution of the pandemic in Côte d'Ivoire, using PostgreSQL, Elasticsearch, Kafka, Kafka Connect, NiFi, Spark, Node-RED and MoleculerJS to operationalize the architecture.
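To make the streaming leg of such an architecture concrete, the sketch below publishes case records to a Kafka topic and reads them back with the kafka-python client. The topic name, record fields, and broker address are assumptions for illustration; the paper's actual pipeline also involves NiFi, Spark, and Elasticsearch, which are omitted here.

```python
# Streaming-leg sketch: Kafka producer + consumer (pip install kafka-python).
# Assumes a broker at localhost:9092; topic and record shape are hypothetical.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# Hypothetical record shape for a reported case
producer.send("covid-cases",
              {"region": "Abidjan", "new_cases": 12, "ts": "2021-06-01T08:00:00Z"})
producer.flush()

consumer = KafkaConsumer(
    "covid-cases",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    # Downstream, a real pipeline would index into Elasticsearch
    # and refresh the monitoring dashboard
    print(message.value)
    break
```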
BETA-85 is the kernel of an integrated software engineering environment hosted by the UNIX operating system. It is general-purpose and open-ended, using the programming language C as its base language and supporting a variety of software development and maintenance methodologies. BETA-85 is organized as a hierarchical structure of environment workbenches, which corresponds to a multi-base facility for organizing and managing information entities in the environment. A general-purpose interactive editing system is designed as its user interface. Technical and managerial support is provided at different levels for programming in the small, in the large, and in the many. As a result, the visibility and traceability of software engineering projects are greatly increased, software productivity is significantly raised, the quality of software products is effectively improved, and the cost of software development and maintenance is strictly controlled.
The fact that most engineering applications are developed by engineers themselves rather than computer professionals calls for data modeling methods that are powerful enough to represent complex engineering phenomena, yet simple enough to use. A data modeling method that helps engineers write high-quality C++ code is introduced.
The capacity to analyze massive data has become increasingly necessary, given the high volume of data generated daily by different sources. The data sources are varied and can generate huge amounts of data, which can be processed in batch or stream settings. The stream setting corresponds to the treatment of a continuous sequence of data that arrives in a real-time flow and needs to be processed in real time. The models, tools, methods and algorithms for generating intelligence from data streams culminate in the approaches of Data Stream Mining and Data Stream Learning. The activities of such approaches can be organized and structured according to engineering principles, giving rise to the principles of Analytical Engineering, or more specifically, Analytical Engineering for Data Stream (AEDS). This article presents the AEDS conceptual framework, composed of four pillars (Data, Model, Tool, People) and three processes (Acquisition, Retention, Review). The pillars are defined from the main components of the data stream setting, and the three processes are derived from the necessity of operationalizing the activities of an Analytical Organization (AO) in the use of the four pillars. The AEDS framework supports the projects carried out in an AO, its Analytical Projects (AP), favoring the delivery of results, or Analytical Deliverables (AD), produced by Analytical Teams (AT) to provide intelligence from stream data.
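The distinction between batch and stream settings is easiest to see in code: a stream consumer sees each record once, in arrival order, and keeps only bounded state. The sketch below, a minimal illustration rather than anything from the AEDS framework itself, maintains a fixed-size rolling window over a simulated real-time feed.

```python
# Stream-setting sketch: one-pass processing with bounded state.
# The event source is simulated; in practice it would be a live feed.
from collections import deque
import random

window = deque(maxlen=50)  # bounded state: only the last 50 observations

def consume(value):
    """Update the window and return a rolling mean, one record at a time."""
    window.append(value)
    return sum(window) / len(window)

random.seed(0)
for t in range(200):                      # simulated real-time flow
    reading = random.gauss(10.0, 2.0)
    rolling_mean = consume(reading)
    if t % 50 == 0:
        print(f"t={t}: rolling mean={rolling_mean:.2f}")
```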
In the early stage of oilfield development, insufficient production data and an unclear understanding of oil production present a challenge to reservoir engineers in devising effective development plans. To address this challenge, this study proposes a method that uses data mining technology to search for similar oil fields and predict well productivity. A query system of 135 analogy parameters is established based on geological and reservoir engineering research, and the weight values of these parameters are calculated with a data algorithm to establish an analogy system. The fuzzy matter-element algorithm is then used to calculate the similarity between oil fields, with fields having a similarity greater than 70% identified as similar oil fields. Using similar oil fields as sample data, 8 important factors affecting well productivity are identified using the Pearson coefficient and the mean decrease impurity (MDI) method. To establish productivity prediction models, linear regression (LR), random forest regression (RF), support vector regression (SVR), backpropagation (BP), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms are used. Their performance is evaluated using the coefficient of determination (R²), explained variance score (EV), mean squared error (MSE), and mean absolute error (MAE) metrics. The LightGBM model is selected to predict the productivity of 30 wells in the PL field with an average error of only 6.31%, which significantly improves the accuracy of the productivity prediction and meets the application requirements in the field. Finally, a software platform integrating data query, oil field analogy, productivity prediction, and a knowledge base is established to identify patterns in massive reservoir development data and provide valuable technical references for new reservoir development. (Funding: supported by the National Natural Science Fund of China, No. 52104049, and the Science Foundation of China University of Petroleum, Beijing, No. 2462022BJRC004.)
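The feature-selection-then-regression stage of this workflow can be sketched with standard libraries: MDI importances come from a random forest's feature_importances_, and LightGBM is then fit on the top-ranked factors. Synthetic data stands in for the similar-oilfield samples below, so only the overall workflow, not the numbers, follows the paper.

```python
# Sketch of the modeling stage: MDI ranking + LightGBM regression.
# Synthetic regression data replaces the (unavailable) oilfield samples.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
from lightgbm import LGBMRegressor  # pip install lightgbm

X, y = make_regression(n_samples=300, n_features=20, noise=5.0, random_state=0)

# MDI importances from a random forest, used to keep the top 8 factors
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top8 = forest.feature_importances_.argsort()[::-1][:8]

X_train, X_test, y_train, y_test = train_test_split(X[:, top8], y, random_state=0)
model = LGBMRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print(f"R2={r2_score(y_test, pred):.3f}  MAE={mean_absolute_error(y_test, pred):.2f}")
```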