Abstract: Data mining is the process of extracting hidden, previously unknown, but potentially useful information from massive data. Big data has a significant impact on scientific discoveries and value creation. Data mining (DM) with big data has been widely used in the lifecycle of electronic products, from the design and production stages to the service stage. A comprehensive analysis of DM with big data, and a review of its application in each stage of this lifecycle, will benefit researchers conducting solid research. Recently, big data has become a buzzword, pushing researchers to extend existing data mining methods to cope with the modern nature of data and to develop new analytical techniques. In this paper, we develop an empirical evaluation method based on the principles of Design of Experiments. We apply this method to evaluate data mining tools and machine learning algorithms for building big data analytics on telecommunication monitoring data. Two case studies are conducted to provide insights into the relationship between the requirements of data analysis and the choice of a tool or algorithm in data analysis workflows.
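The Design of Experiments principle can be illustrated with a full factorial design. Every combination of factor levels is run, and the responses are averaged per level. The following is a minimal Python sketch under stated assumptions. The factors (tool, algorithm, data size) are hypothetical. The response function is only a placeholder for a measured metric such as runtime. The paper's actual factors and measurements are not reproduced.

```python
import itertools


def full_factorial(factors, replicates=1):
    """Enumerate every combination of factor levels, each repeated `replicates` times."""
    names = list(factors)
    runs = []
    for combo in itertools.product(*(factors[name] for name in names)):
        for rep in range(replicates):
            runs.append({**dict(zip(names, combo)), "replicate": rep})
    return runs


def main_effects(runs, response, factor):
    """Mean response at each level of one factor (a simple main-effects summary)."""
    by_level = {}
    for run in runs:
        by_level.setdefault(run[factor], []).append(response(run))
    return {level: sum(values) / len(values) for level, values in by_level.items()}


# Hypothetical factors; in practice each run would execute a tool/algorithm and record a metric.
factors = {"tool": ["A", "B"], "algorithm": ["tree", "bayes", "mlp"], "rows": [1_000, 10_000]}
runs = full_factorial(factors, replicates=2)


def placeholder_response(run):
    # Stand-in for a measured value such as training time in seconds.
    return run["rows"] / 1_000 + (1.0 if run["algorithm"] == "mlp" else 0.0)


print(len(runs))  # 2 tools * 3 algorithms * 2 sizes * 2 replicates = 24 runs
print(main_effects(runs, placeholder_response, "algorithm"))
```

Replicates let run-to-run noise (for example, timing jitter) be averaged out before levels are compared.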
Abstract: There is growing interest in power quality issues due to wider developments in power delivery engineering. To maintain good power quality, power quality problems must be detected and monitored. Power quality monitoring requires storing large amounts of data for analysis. This rapid growth in database size has called for new techniques, such as data mining, to help analyze and understand the data. This paper classifies power quality problems (voltage sag, swell, interruption, and unbalance) using three decision-tree-based algorithms: J48, Random Tree, and Random Forest. The algorithms are implemented on two sets of voltage data using the WEKA software. The numeric attributes in the first data set are the 3-phase RMS voltages at the point of common coupling. The second data set adds three more numeric attributes (minimum, maximum, and average voltage) to the 3-phase RMS voltages. The performance of the algorithms is evaluated in both cases to determine the best classification algorithm. The effect of adding the three attributes in the second case is also studied, and it shows advantages in both classification accuracy and training time of the decision trees.
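The second data set adds minimum, maximum and average voltage to the three RMS phase voltages. A minimal Python sketch of that feature augmentation follows. It also includes a hypothetical rule-based labeller of the kind one might use to generate simulated training labels. The function names, thresholds and sample readings are all assumptions. The J48, Random Tree and Random Forest training in WEKA is not reproduced.

```python
def augment(rms_abc):
    """Append the minimum, maximum and average to the 3-phase RMS voltages (per unit)."""
    va, vb, vc = rms_abc
    return [va, vb, vc, min(rms_abc), max(rms_abc), sum(rms_abc) / 3]


def reference_label(rms_abc, unbalance_limit=0.02):
    """Hypothetical rule-based label for a 3-phase RMS reading in per unit.

    Magnitude bands loosely follow IEEE 1159 (interruption below 0.1 pu,
    sag 0.1 to 0.9 pu, swell above 1.1 pu); the unbalance limit is assumed.
    """
    low, high = min(rms_abc), max(rms_abc)
    avg = sum(rms_abc) / 3
    if low < 0.1:
        return "interruption"
    if low < 0.9:
        return "sag"
    if high > 1.1:
        return "swell"
    # Unbalance: largest deviation from the average, relative to the average.
    if max(abs(v - avg) for v in rms_abc) / avg > unbalance_limit:
        return "unbalance"
    return "normal"


readings = [[1.0, 1.0, 1.0], [0.5, 1.0, 1.0], [1.0, 1.0, 1.3], [1.0, 1.0, 0.92]]
dataset = [(augment(r), reference_label(r)) for r in readings]
for features, label in dataset:
    print(label, [round(f, 3) for f in features])
```

The min, max and average columns give a tree split direct access to quantities that would otherwise need several splits on the individual phases. This is one plausible reason such attributes can help accuracy and training time.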
Abstract: Research on the quality of data in a structural calculation document (SCD) is lacking, although the SCD of a bridge is used as an essential reference during the entire lifecycle of the facility. XML Schema matching enables qualitative improvement of the stored data. This study aimed to enhance the applicability of XML Schema matching, which improves the speed and quality of information stored in bridge SCDs. First, the authors proposed a method of reducing the computing time for the schema matching of bridge SCDs. The computing speed of schema matching was increased by 13 to 1800 times by reducing the process of checking correlations. Second, the authors developed a heuristic solution, based on a decision tree, for selecting the optimal weight factors used in the matching process so that high accuracy is maintained. The decision tree model was built using the content elements stored in the SCD, design companies, bridge types, and weight factors as input variables, and the matching accuracy as the target variable. An inverse-calculation method was applied to extract the weight factors from the decision tree model for high-accuracy schema matching results.
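Weight factors in schema matching typically combine several similarity measures into one score per element pair. The following is a minimal Python sketch under stated assumptions:

- Elements are dicts with hypothetical `name`, `type` and `path` keys.
- The three measures and the weight values are illustrative only.
- Character similarity uses the standard-library `difflib.SequenceMatcher`.

The paper's decision-tree selection and inverse calculation of weights are not reproduced here.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Case-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def element_similarity(src, tgt, weights):
    """Weighted sum of name, data-type and path similarity between two schema elements."""
    return (weights["name"] * text_similarity(src["name"], tgt["name"])
            + weights["type"] * (1.0 if src["type"] == tgt["type"] else 0.0)
            + weights["path"] * text_similarity(src["path"], tgt["path"]))


def match(source, target, weights, threshold=0.7):
    """Greedy one-to-one matching, highest combined similarity first."""
    candidates = sorted(
        ((element_similarity(s, t, weights), i, j)
         for i, s in enumerate(source) for j, t in enumerate(target)),
        reverse=True)
    used_src, used_tgt, result = set(), set(), []
    for score, i, j in candidates:
        if score >= threshold and i not in used_src and j not in used_tgt:
            used_src.add(i)
            used_tgt.add(j)
            result.append((source[i]["name"], target[j]["name"], round(score, 3)))
    return result


# Hypothetical elements from two SCD schemas and hypothetical weight factors.
source = [{"name": "SpanLength", "type": "double", "path": "bridge/span/length"},
          {"name": "DeckWidth", "type": "double", "path": "bridge/deck/width"}]
target = [{"name": "span_length", "type": "double", "path": "Bridge/Span/Length"},
          {"name": "deck_width", "type": "double", "path": "Bridge/Deck/Width"}]
weights = {"name": 0.5, "type": 0.2, "path": 0.3}
print(match(source, target, weights))
```

Changing `weights` changes which pairs clear the threshold. That sensitivity is why choosing good weight factors matters for accuracy.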
Abstract: Tuberculosis (TB) remains an important public health problem worldwide, including in the Philippines. Treatment relapse continues to place a severe burden on patients and TB programs worldwide. A significant reason for relapse is poor adherence to medical treatment. This research has two objectives. The first is to generate a predictive data mining model that classifies the treatment relapse of TB patients. The second is to identify the features that influence the treatment relapse category. The TB patient dataset was applied and tested with the J48 decision tree algorithm in WEKA. The J48 model identified three significant independent variables (DSSM result, age, and sex) as predictors of the treatment relapse category.
Abstract: Two important performance indicators for data mining algorithms are classification/prediction accuracy and training time. These indicators are useful for selecting the best algorithms for classification and prediction tasks in data mining, yet empirical studies of them are few. This study was therefore designed to determine how data mining classification algorithms perform as the input data size increases. Three classification algorithms (Decision Tree, Multi-Layer Perceptron (MLP) Neural Network, and Naïve Bayes) were run on simulated data of varying sizes. The training time and classification accuracy of each algorithm were analyzed for the different data sizes. Results show that Naïve Bayes takes the least time to train but also has the lowest accuracy compared with the MLP and Decision Tree algorithms.
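Naïve Bayes's short training time follows from how it learns. A minimal Python sketch of Gaussian naive Bayes shows this: training is a few linear passes that compute class priors and per-class feature means and variances, with no iterative optimization. Several choices are assumptions for illustration and not the study's setup:

- the simulated two-class data;
- the Gaussian likelihood;
- the sizes in the timing loop.

```python
import math
import random
import time
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Train Gaussian naive Bayes: class log-priors plus per-class feature means and variances."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small epsilon keeps each variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9 for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior, treating features as independent."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))


def simulate(n, seed=0):
    """Hypothetical two-class data with shifted Gaussian features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(-1.0 * label, 1.0)])
        y.append(label)
    return X, y


for size in (1_000, 5_000, 20_000):
    X, y = simulate(size)
    start = time.perf_counter()
    model = fit_gaussian_nb(X, y)
    elapsed = time.perf_counter() - start
    accuracy = sum(predict(model, r) == t for r, t in zip(X, y)) / size
    print(f"n={size:>6}  train {elapsed:.4f} s  training-set accuracy {accuracy:.3f}")
```

The independence assumption that makes training this cheap is also a plausible reason for lower accuracy when features are correlated.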
Abstract: Under China's modern education system, the annual scholarship evaluation is important to many college students. This paper adopts the C4.5 decision tree classification algorithm, an improvement on the ID3 algorithm. It constructs a data set for the scholarship evaluation system by analyzing the relevant attributes in scholarship evaluation information. Through analysis of moral education, intellectual education, and culture and physical education (PE), it also identifies several factors that play a significant role in the development of college students.
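One of C4.5's refinements over ID3 is the split criterion. ID3 ranks attributes by information gain. C4.5 divides that gain by the split information, which stops attributes with many distinct values from looking artificially good. The following minimal Python sketch computes both. The scholarship-style records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def split_scores(values, labels):
    """Return (information gain, gain ratio) for splitting on a categorical attribute."""
    n = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder  # ID3's criterion
    # C4.5 divides by the split information, penalizing many-valued attributes.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain, (gain / split_info if split_info > 0 else 0.0)


# Hypothetical scholarship records: a unique student ID versus a grade band.
awarded = ["yes", "yes", "no", "no"]
print(split_scores(["s1", "s2", "s3", "s4"], awarded))  # gain 1.0, ratio 0.5
print(split_scores(["A", "A", "B", "B"], awarded))      # gain 1.0, ratio 1.0
```

Both attributes tie on information gain, so ID3 cannot prefer the meaningful grade band over the student ID. The gain ratio does prefer it.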
Funding: sponsored by the National Science and Technology Major Project (No. 2011ZX05023-005-006).
Abstract: Data mining is the process of extracting implicit but potentially useful information from incomplete, noisy, and fuzzy data. Data mining offers excellent nonlinear modeling and self-organized learning, and it can play a vital role in interpreting the well logging data of complex reservoirs. We used data mining to identify the lithologies in a complex reservoir. The reservoir lithologies served as the classification target and were identified using feature extraction, feature selection, and modeling of data streams. We used independent component analysis to extract information from well curves. We then used the branch-and-bound algorithm to search for the optimal feature subsets and eliminate redundant information. Finally, we used the C5.0 decision tree algorithm to build disaggregated models of the well logging curves. The modeled and actual logging data were in good agreement, showing the usefulness of data mining methods in complex reservoirs.
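Branch-and-bound feature selection finds the best subset of a given size without checking every combination. It relies on a monotone criterion: removing a feature never increases the score. The search removes features one at a time and prunes any branch whose score already falls to the best complete subset found so far.

A minimal Python sketch follows. The criterion here is hypothetical: each feature "covers" a set of information items, so overlapping sets stand in for redundancy. The study's actual criterion, and its ICA and C5.0 steps, are not reproduced.

```python
def branch_and_bound(n_features, target_size, criterion):
    """Find the size-`target_size` feature subset that maximizes `criterion`.

    Assumes `criterion` is monotone: removing a feature never increases it.
    That property is what makes pruning safe.
    """
    best = {"score": float("-inf"), "subset": None}

    def search(subset, next_removable):
        score = criterion(subset)
        if score <= best["score"]:
            return  # no subset of `subset` can beat the current best: prune
        if len(subset) == target_size:
            best["score"], best["subset"] = score, subset
            return
        # Remove features in increasing index order so each subset is visited once.
        for f in range(next_removable, n_features):
            search(subset - {f}, f + 1)

    search(frozenset(range(n_features)), 0)
    return best["score"], best["subset"]


# Hypothetical example: each feature covers a set of "information items";
# overlapping sets model redundant features.
coverage = [{"a", "b", "c"}, {"a", "b"}, {"d", "e"}, {"c", "d"}, {"f"}]


def covered(subset):
    return len(set().union(*(coverage[i] for i in subset)))


print(branch_and_bound(len(coverage), 2, covered))
```

Feature 1 is fully redundant with feature 0. The search therefore favors the pair that covers the most distinct items rather than the pair with the largest individual sets.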
Funding: supported by the Science and Technology Plan of Mudanjiang City (G200920064) and Teaching Reform Construction of Mudanjiang Normal University (10-xj11080).
Abstract: This paper first discusses the basic concepts of data mining technology and the decision tree method. It then uses data samples of wind and hailstorm disasters in several counties of the Mudanjiang region to establish a forecasting model of agro-meteorological disaster grade with the C4.5 decision tree classification algorithm. The model forecasts the degree of direct economic loss, providing a rational data mining model and effective analysis results.
Abstract: Background: Given the importance of customers as the most valuable assets of organizations, customer retention appears to be an essential, basic requirement for any organization, and banks are no exception. The competitive environment in which different banks provide electronic banking services increases the need for customer retention. Methods: Existing information technologies allow data to be collected from organizations' databases, and data mining builds on them to provide a powerful tool for extracting knowledge from huge amounts of data. In this research, the decision tree technique was applied to build a model incorporating this knowledge. Results: The results describe the characteristics of churned customers. Conclusions: Bank managers can use the resulting decision tree to identify future churners. They should also provide retention strategies for customers whose features increasingly resemble those of churners.
Abstract: Sheet metal is widely used in auto bodies, aircraft bodies, metal furniture, and so on. For instance, a typical auto body consists of hundreds of stamped sheet metal parts. Because of the complexity of their structure and manufacturing process, auto bodies inevitably exhibit geometrical variation from a number of sources. These include geometrical variation of the stamped parts, changes in assembly process parameters, and even improper design concepts. More than 30% of auto-body quality defects stem from dimensional deviation of the Body-In-White (BIW) that arises during manufacturing. Effective diagnosis and control of dimensional faults are therefore essential to continuously improving vehicle quality. This is especially true during new car launches or model changes, when the assembly process is changed and adjusted frequently. Rapidly identifying the causes of dimensional variation has therefore become challenging but essential work. In this paper, the main variation causes of auto bodies are first cataloged and analyzed. A diagnostic reasoning and decision approach for dimensional variation is then developed by combining data mining and knowledge discovery techniques. The approach is driven by variation patterns discovered from dispersed, isolated, massive measurement data. Correlation Analysis (CA) and Maximal Tree (MT) methods are applied to extract the large-variation group from massive multidimensional measurement data. A multivariate statistical analysis (MSA) approach is then used to discover the principal variation pattern. A Decision Tree (DT) approach based on knowledge of the product and the assembly process is developed to carry out the "Hypothesis and Validation" reasoning procedure for variation causes.
A practical case with sudden and severe up/down dimensional variation on the rear end panel was analyzed and successfully solved with the aid of the developed diagnostic method, demonstrating that the approach is effective and efficient.
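Correlation analysis can group measurement points whose deviations move together across car bodies, which is the idea behind extracting a large-variation group. The following minimal Python sketch links points whose absolute Pearson correlation reaches a threshold and returns the connected groups. It makes several assumptions:

- the threshold value;
- the union-find grouping rule;
- the sample deviations.

The paper's Maximal Tree and MSA steps are not reproduced.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series (non-constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlated_groups(points, threshold=0.8):
    """Group measurement points linked by |r| >= threshold (connected components)."""
    names = list(points)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(points[a], points[b])) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)  # largest groups first


# Hypothetical deviations (mm) of three measurement points over five car bodies.
measurements = {
    "P1": [0.1, 0.3, 0.2, 0.5, 0.4],
    "P2": [0.2, 0.6, 0.4, 1.0, 0.8],
    "P3": [0.5, -0.1, 0.2, 0.0, 0.3],
}
print(correlated_groups(measurements))
```

Points in the same group likely share a root cause, such as one fixture or one stamped part, so each group can be diagnosed as a unit.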
Abstract: This paper focuses on improving decision tree induction algorithms when a particular kind of tie appears during rule generation for specific training datasets. The tie occurs when the target class outcomes appear in equal proportions among a leaf node's records, so majority voting cannot be applied. To resolve this exception, we propose basing the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used to split the parent nodes are also taken into consideration.
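The tie at a leaf can be broken by consulting the records that reached that leaf. The following minimal Python sketch covers only the k-NN option: it uses the ordinary majority vote when there is one, and otherwise lets the k leaf records nearest to the query vote among the tied classes. The function name, the Euclidean distance, `k`, and the final fallback are assumptions. The NB and ARM variants are not shown.

```python
import math
from collections import Counter


def leaf_prediction(leaf_records, query, k=3):
    """Majority vote at a leaf, falling back to k-NN among the leaf's records on a tie.

    leaf_records: list of (feature_vector, label) pairs that reached this leaf.
    """
    ranked = Counter(label for _, label in leaf_records).most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]  # ordinary majority vote
    tied = {label for label, count in ranked if count == ranked[0][1]}
    # Tie: let the k records nearest to the query (Euclidean distance) vote.
    nearest = sorted(leaf_records, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest if label in tied)
    if not votes:
        return min(tied)  # arbitrary but deterministic last resort
    top = max(votes.values())
    # If the neighbours also tie, prefer the label of the closest record.
    for _, label in nearest:
        if votes.get(label) == top:
            return label


records = [((0.0, 0.0), "yes"), ((0.1, 0.0), "yes"), ((5.0, 5.0), "no"), ((5.1, 5.0), "no")]
print(leaf_prediction(records, (0.2, 0.1)), leaf_prediction(records, (4.9, 5.0)))
```

Restricting the vote to the tied classes keeps the fallback consistent with what the tree already concluded at that leaf.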
Abstract: Corporations are relying on web-based education to train their employees more than ever before. Unlike traditional learning environments, web-based education applications store large amounts of data. This growing availability of data has stimulated the emergence of a new field called educational data mining. In this study, classification is applied to data obtained from a company that uses web-based education to train its employees. The authors' aim is to find the most critical factors that influence users' success. Two decision tree algorithms are applied to classify the data: Classification and Regression Tree (CART) and Quick, Unbiased and Efficient Statistical Tree (QUEST). According to the results, assurance of a certificate at the end of the training is the most critical factor influencing users' success. The user's position, number of years worked, and education level are also found to be important factors.
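For classification, CART grows binary splits chosen to minimize weighted Gini impurity. QUEST instead selects the split variable with statistical tests, which is not shown here. The following minimal Python sketch finds the best Gini threshold on one numeric attribute. The "years worked" values and pass/fail labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (weighted impurity, threshold) of the best binary split on a numeric attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[0]:
            best = (impurity, threshold)
    return best


years_worked = [1, 2, 3, 8, 9, 10]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]
print(best_binary_split(years_worked, outcome))
```

A full tree repeats this search over every attribute at each node and splits on the attribute with the lowest impurity. That is how a factor such as certificate assurance can surface at the root.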
Abstract: Objective: Based on the RFM (Recency, Frequency, Monetary) model theory of customer relationship management, data mining technology was used to group patients with chronic infectious diseases. The aim was to explore the effect of customer segmentation on managing patients with different characteristics.

Methods: 170,246 outpatient records from January 2016 to July 2016 were extracted from the hospital information system (HIS), yielding 43,448 records after data cleaning. The K-Means clustering algorithm was used to classify patients with chronic infectious diseases. The C5.0 decision tree algorithm was then used to predict their treatment situation.

Results: Male patients accounted for 58.7% and patients living in Shanghai for 85.6%. The average patient age was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three clusters:
1) Cluster 1, important patients (4786 people, 11.0%, R = 2.89, F = 11.72, M = 84,302.95);
2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39);
3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09).
When the C5.0 decision tree algorithm was used to predict the treatment situation, the final treatment time (weeks) was an important predictor. The accuracy, verified with a confusion matrix, was 99.94%.

Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases and establish a chronic infectious disease and customer relationship management database. They should also take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the construction of hospital information systems and establish a chronic infectious disease database. They should also strengthen the blocking of mother-to-child transmission, in order to effectively curb chronic infectious diseases and reduce disease burden and mortality.
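RFM segmentation with K-Means clusters patients on their recency, frequency and monetary values. A minimal Python sketch follows. It z-scores the three columns, because monetary values would otherwise dominate the distance, and then runs Lloyd's K-Means. Several parts are assumptions, not the study's settings:

- the standardization step;
- the deterministic farthest-first seeding;
- the toy rows.

The C5.0 prediction step is not shown.

```python
import statistics


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def standardize(rows):
    """Z-score each column so that R, F and M contribute on comparable scales."""
    stats = [(statistics.fmean(col), statistics.pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - mean) / sd for v, (mean, sd) in zip(row, stats)] for row in rows]


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumed choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))))
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([statistics.fmean(d) for d in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break  # centroids no longer move
        centroids = updated
    return centroids, labels


# Toy (recency, frequency, monetary) rows, not the study's data.
rfm = [[2, 12, 80000], [3, 11, 85000], [3, 12, 90000],
       [5, 3, 9000], [6, 4, 9500], [5, 3, 8800],
       [20, 1, 1700], [19, 2, 1800], [21, 1, 1600]]
centroids, labels = kmeans(standardize(rfm), k=3)
print(labels)
```

Cluster names such as "important" or "potential" come from reading the centroids afterwards. For example, low recency with high frequency and monetary value suggests an important group.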
Abstract: Trauma is the most common cause of death among young people, and many of these deaths are preventable [1]. Predicting the outcome of trauma patients has long been a difficult problem. In this study, prediction models are built and their ability to accurately predict mortality is assessed. The analysis compares data mining techniques using classification, clustering, and association algorithms. Data were collected by the Hellenic Trauma and Emergency Surgery Society from 30 Greek hospitals. The dataset contains records of 8544 patients with severe injuries, collected from 2005 to 2006. Factors include patients' demographic characteristics and several other variables recorded from the time and place of the accident through hospital treatment to the final outcome. The results are compared in terms of sensitivity, specificity, positive predictive value, and negative predictive value, and ROC curves depict the methods' performance.
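The four comparison measures all come from one confusion matrix. An ROC curve repeats the true/false positive calculation at every score threshold. The following minimal Python sketch computes both. It assumes outcomes are coded 1 for death (the positive class) and 0 for survival. The sample labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn


def diagnostic_metrics(y_true, y_pred, positive=1):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "sensitivity": tp / (tp + fn),  # deaths correctly predicted as deaths
        "specificity": tn / (tn + fp),  # survivors correctly predicted as survivors
        "ppv": tp / (tp + fp),          # predicted deaths that were deaths
        "npv": tn / (tn + fn),          # predicted survivors that survived
    }


def roc_points(y_true, scores, positive=1):
    """(false positive rate, true positive rate) at every distinct score threshold."""
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and t == positive for s, t in zip(scores, y_true))
        fp = sum(s >= threshold and t != positive for s, t in zip(scores, y_true))
        points.append((fp / neg, tp / pos))
    return points


print(diagnostic_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0]))
print(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

With mortality as a comparatively rare outcome, PPV and NPV depend strongly on prevalence. This is one reason all four measures are reported rather than accuracy alone.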
Abstract: The increasing volume of data in the environmental sciences calls for analysis and interpretation. Among the challenges generated by this “data deluge”, developing efficient strategies for knowledge discovery is an important issue. Here, statistical tools and tools from computational intelligence are applied to analyze large data sets from meteorology and climate science. Our approach produces a geographical mapping of the statistical properties that meteorologists can easily interpret. The data analysis comprises two main knowledge-extraction steps, applied successively to reduce the complexity of the original data set. The goal is to identify a much smaller subset of climatic variables that may still describe, or even predict, the probability of occurrence of an extreme event. The first step applies a class comparison technique: p-value estimation. The second step builds a decision tree (DT) from the available data and the p-value analysis. The DT is used as a predictive model, identifying the climate variables most statistically significant for precipitation intensity. The methodology is applied to study the climatic causes of the extreme precipitation events that occurred in the states of Alagoas and Pernambuco (Brazil) in June 2010.
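The abstract does not say how the p-values were estimated. As one possible stand-in, the sketch below uses a two-sided permutation test on the difference in class means (extreme event vs. no event) to screen variables before any tree is grown. The permutation test, the 0.05 cut-off, and all names are assumptions. The decision tree step itself is not shown.

```python
import random
from statistics import mean


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means of samples a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Relabel at random and count how often chance alone reaches the observed gap.
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0


def select_variables(data, labels, alpha=0.05):
    """Keep variables whose values differ between event (1) and non-event (0) samples.

    data maps a (hypothetical) variable name to one value per sample;
    labels holds 1/0 per sample. Returns {name: p_value}, most significant first.
    """
    kept = {}
    for name, values in data.items():
        event = [v for v, y in zip(values, labels) if y == 1]
        other = [v for v, y in zip(values, labels) if y == 0]
        p = permutation_pvalue(event, other)
        if p < alpha:
            kept[name] = p
    return dict(sorted(kept.items(), key=lambda item: item[1]))
```

The surviving variables would then be the candidate inputs for the decision tree. A permutation test was chosen here only because it needs no distributional assumptions and no third-party library.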
Funding: This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFD0700705), the Synergistic Innovation Center of Jiangsu Modern Agricultural Equipment and Technology (Grant No. 4091600002), and the Key Research & Development Plan of Zhenjiang City-Modern Agriculture (Grant No. NY2019009).
Abstract: China has the world’s largest planting area of paddy rice, but large quantities of paddy rice fall to the ground and are lost during harvesting with a combine harvester. Reducing grain loss is an effective way to increase production and revenue. In this study, a system was developed to monitor paddy rice grain loss, and its precision was verified on a test bench. Development of the monitoring system had two stages. In the first stage, impact signals were collected with a piezoelectric film. Four features were then extracted: root mean square, peak number, frequency, and amplitude (fundamental component). Finally, kernel impact signals were identified with the J48 (C4.5) decision tree algorithm. In the second stage, the precision of the monitoring system was tested on paddy rice at three moisture contents (10.4%, 19.6%, and 30.4%) and five grain/impurity ratios (1/0.5, 1/1, 1/1.5, 1/2, and 1/2.5). The highest monitoring accuracy was 99.3% (moisture content 30.8% and grain/impurity ratio 1/2.5), and the average accuracy of the monitoring tests was 92.6%. Grain/impurity ratios between 1/1 and 1/1.5 were monitored with higher accuracy (>95.4%) than the other ratios. Monitoring accuracy decreased as impurities increased. The lowest grain loss monitoring accuracy was obtained at a grain/impurity ratio of 1/2.5: 88.2%, 75.7%, and 78.8% at moisture contents of 10.4%, 19.6%, and 30.4%, respectively.
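The four signal features named above can be computed from a sampled impact signal roughly as follows. This sketch rests on several assumptions: a uniformly sampled signal, peaks counted as local maxima above a caller-chosen threshold, and the fundamental taken as the strongest non-DC bin of a naive DFT. The paper does not give its exact definitions, so these choices and the function names are hypothetical. The J48 classification step is not shown.

```python
import cmath
import math


def rms(x):
    """Root mean square of the samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def peak_count(x, threshold):
    """Number of local maxima whose value exceeds threshold."""
    return sum(1 for i in range(1, len(x) - 1)
               if x[i] > threshold and x[i] > x[i - 1] and x[i] >= x[i + 1])


def fundamental(x, fs):
    """Frequency (Hz) and amplitude of the strongest non-DC bin of a naive O(n^2) DFT."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        mag = abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    # Single-sided amplitude is 2|X_k|/n, except at the Nyquist bin where it is |X_k|/n.
    scale = 1 if (n % 2 == 0 and best_k == n // 2) else 2
    return best_k * fs / n, scale * best_mag / n


def impact_features(x, fs, peak_threshold):
    """Feature dictionary for one impact-signal window (hypothetical layout)."""
    freq, amp = fundamental(x, fs)
    return {"rms": rms(x), "peaks": peak_count(x, peak_threshold),
            "frequency": freq, "amplitude": amp}
```

A real system would use an FFT rather than the quadratic DFT. The naive form is used here only to keep the sketch dependency-free.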
Abstract: Data mining is the process of extracting hidden, previously unknown, but potentially useful information from massive data. Big data has a strong impact on scientific discovery and value creation. Data mining (DM) with big data has been widely used across the lifecycle of electronic products, from the design and production stages to the service stage. A comprehensive examination of DM with big data, together with a review of its application at each lifecycle stage, would help researchers conduct solid research. Recently, big data has become a buzzword, pushing researchers to extend existing data mining methods to cope with the evolving nature of data and to develop new analytical techniques. In this paper, we develop an empirical evaluation method based on the principles of Design of Experiments. We apply this method to evaluate data mining tools and machine learning algorithms for building big data analytics on telecommunication monitoring data. Two case studies provide insights into the relations between data analysis requirements and the choice of a tool or algorithm in the context of data analysis workflows.
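A Design of Experiments evaluation typically runs every combination of factor levels and compares the mean response at each level. The sketch below builds a full factorial design and a main-effects table. The factor names and the response (for example, a measured training time per run) are hypothetical, since the abstract does not list the factors the authors used.

```python
from itertools import product


def full_factorial(factors):
    """Every combination of factor levels; factors maps a name to its list of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


def main_effects(runs, responses):
    """Mean response at each level of each factor."""
    effects = {}
    for name in runs[0]:
        by_level = {}
        for run, y in zip(runs, responses):
            by_level.setdefault(run[name], []).append(y)
        effects[name] = {level: sum(ys) / len(ys) for level, ys in by_level.items()}
    return effects


def rank_factors(effects):
    """Factors ordered by the spread of their level means, largest influence first."""
    return sorted(effects,
                  key=lambda f: max(effects[f].values()) - min(effects[f].values()),
                  reverse=True)
```

In standard DoE practice each run would also be replicated and executed in randomized order, so that factor effects can be separated from measurement noise.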
Abstract: There is growing interest in power quality issues due to wider developments in power delivery engineering. Maintaining good power quality requires detecting and monitoring power quality problems, and power quality monitoring in turn requires storing large amounts of data for analysis. This rapid growth in database size has called for new techniques, such as data mining, to assist in analyzing and understanding the data. This paper classifies power quality problems such as voltage sag, swell, interruption, and unbalance using three decision tree algorithms: J48, Random Tree, and Random Forest. The algorithms are applied to two sets of voltage data using the WEKA software. In the first data set, the numeric attributes are the 3-phase RMS voltages at the point of common coupling. In the second data set, three more numeric attributes (minimum, maximum, and average voltage) are added alongside the 3-phase RMS voltages. The performance of the algorithms is evaluated in both cases to determine the best classification algorithm. The effect of adding the three attributes in the second case is also studied and shows advantages in both classification accuracy and decision tree training time.
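J48 is WEKA's implementation of C4.5, which splits numeric attributes such as RMS voltage on a threshold. The sketch below derives the second data set's extra attributes (minimum, maximum, average). It then scores one numeric attribute in the C4.5 style: the threshold is chosen by information gain, and the gain ratio is reported for that threshold. This is a simplification (midpoint thresholds, no MDL correction, no pruning), not WEKA's code, and the function names are hypothetical.

```python
import math
from collections import Counter


def add_summary_features(rms_abc):
    """Extend a (Va, Vb, Vc) RMS row with min, max and average voltage."""
    va, vb, vc = rms_abc
    return (va, vb, vc, min(rms_abc), max(rms_abc), (va + vb + vc) / 3)


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_numeric_split(values, labels):
    """Midpoint threshold with the highest information gain, plus its gain ratio.

    Returns (threshold, information_gain, gain_ratio), or None if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[1]:
            # Split information penalises splits into very uneven partitions.
            split_info = entropy(["L"] * len(left) + ["R"] * len(right))
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain, gain / split_info)
    return best
```

A tree learner would call `best_numeric_split` for every attribute at each node and branch on the most informative one. The extra min/max/average attributes give it more candidate thresholds to choose from.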
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2016R1A6A3A11934917).
Abstract: The structural calculation document (SCD) of a bridge serves as an essential reference throughout the entire lifecycle of the facility, yet research on the quality of SCD data is lacking. XML Schema matching enables qualitative improvement of the stored data. This study aimed to enhance the applicability of XML Schema matching, which improves the speed and quality of the information stored in bridge SCDs. First, the authors proposed a method of reducing the computing time for schema matching of bridge SCDs. By reducing the correlation-checking process, this method increased the computing speed of schema matching by 13 to 1800 times. Second, the authors developed a heuristic solution that uses a decision tree to select the optimal weight factors for the matching process, so that matching accuracy remains high. The decision tree model was built with the content elements stored in the SCD, design companies, bridge types, and weight factors as input variables, and with matching accuracy as the target variable. The inverse-calculation method was applied to extract, from the decision tree model, weight factors that yield high-accuracy schema matching results.
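The abstract does not detail the inverse-calculation method. One plausible reading is sketched below. The known inputs (such as bridge type) are fixed first. The tree is then searched for the leaf with the highest predicted matching accuracy, and the weight-factor intervals along its path are read off. The nested-dict tree format, the variable names `bridge_type`, `w_name` and `w_structure`, and the accuracies in the test are all hypothetical.

```python
import math


def best_weights(node, context, bounds=None):
    """Return (predicted accuracy, {weight: (low, high)}) for the best reachable leaf.

    Split variables present in `context` (e.g. bridge type) follow their branch;
    every other split variable is treated as a free numeric weight factor.
    Internal nodes: {"var", "thr" or "eq", "yes", "no"}; leaves: {"accuracy"}.
    """
    bounds = bounds or {}
    if "accuracy" in node:  # leaf: predicted matching accuracy
        return node["accuracy"], bounds
    var = node["var"]
    if var in context:
        value = context[var]
        goes_yes = (value == node["eq"]) if "eq" in node else (value <= node["thr"])
        return best_weights(node["yes" if goes_yes else "no"], context, bounds)
    lo, hi = bounds.get(var, (-math.inf, math.inf))
    thr = node["thr"]
    options = []
    if lo < thr:  # "yes" branch (weight <= thr) is still feasible
        options.append(best_weights(node["yes"], context, {**bounds, var: (lo, min(hi, thr))}))
    if thr < hi:  # "no" branch (weight > thr) is still feasible
        options.append(best_weights(node["no"], context, {**bounds, var: (max(lo, thr), hi)}))
    return max(options, key=lambda result: result[0])
```

Any weight vector inside the returned intervals falls into the selected leaf, so a finite midpoint of each interval is one reasonable choice of weights.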