The definition and theory of the autocorrelation decision tree (ADT) are introduced. In spatial data mining, spatial parallel queries are very expensive operations. A new parallel algorithm based on the autocorrelation decision tree is presented. The new method reduces CPU and I/O time and improves the query efficiency of spatial data. It also offers better control and optimization of dynamic load balancing. An experimental performance comparison shows that the improved algorithm can obtain an optimal speedup with the same number of processors, and that node accesses are more complete. An implementation of intelligent information retrieval for spatial data mining is also presented.
In general, geospatial data can be divided into two formats: raster and vector. A raster consists of a matrix of cells in which each cell contains a value representing quantitative information such as temperature, vegetation intensity, land use/cover, or elevation. Vector data consist of points, lines, and polygons that represent the location, distance, or area of landscape features in graphical form. Many raster data are derived from remote sensing techniques that use sophisticated sensors and a quantitative approach, while many vector data are generated from GIS processes by a qualitative approach. Among them, land use/cover data are frequently used in GIS analyses and spatial modeling processes. Proper use of quantitative and qualitative geospatial data is important in spatial modeling and decision making. In this article, we discuss common geospatial data formats, their origins, and their proper use in spatial modeling and decision-making processes.
For spatially based decision making, such as choosing the best place to build a new department store, spatial data warehousing systems are increasingly required. Previous spatial data warehousing systems, however, provided decision making only for non-spatial data on a map and so could not adequately support spatially based decision making. Spatial aggregations are proposed for spatially based decision making in spatial data warehouses. The meaning of the aggregation operators was modified so that they apply to spatial data, and new spatial aggregations were defined. These aggregations support the hierarchical concept of a spatial measure. Using them, spatial analysis classified by non-spatial data is provided. A case study shows how to use these aggregations and how they support spatially based decision making.
This paper describes a nearest neighbor (NN) search algorithm on the GBD (generalized BD) tree. The GBD tree is a spatial data structure suitable for two- or three-dimensional data and has good performance characteristics in dynamic data environments. In GIS and CAD systems, the R-tree and its successors have been used, and NN search algorithms have been proposed in an attempt to obtain good performance from the R-tree. The GBD tree, however, is superior to the R-tree for exact-match retrieval, because it has auxiliary data that uniquely determine the position of an object in the structure. The proposed NN search algorithm relies on this property. The algorithm was studied and its performance was evaluated through experiments.
Broad sharing of spatial information is required in the construction of spatial data infrastructure in our country. The spatial data warehouse serves as an efficient tool for the effective management and sharing of spatial information. From the perspective of decision applications, this article proposes a general decision-oriented ERP model system for constructing a spatial data warehouse. The article concludes by discussing the construction process of a spatial data warehouse based on the ERP model system.
Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment, and achieve sustainable development. Traditional methods for predicting oil well production trends are based on years of oil field production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Based on the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts initial single-layer production by considering geological data, fluid PVT data, and well data. The results show that the GBDT prediction model has high accuracy, significantly improves efficiency, and has strong universal applicability. The GBDT model trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization, and engineering parameter optimization, and has guiding significance for oilfield development.
Tuberculosis (TB) remains an important public health problem that threatens the world, including the Philippines. Treatment relapse continues to place a severe burden on patients and TB programs worldwide. A significant reason for relapse is poor compliance with medical treatment. The objectives of this research are to generate a predictive data mining model to classify the treatment relapse of TB patients and to identify the features influencing the category of treatment relapse. The TB patient dataset was applied to and tested with the J48 decision tree algorithm in WEKA. The J48 model identified three significant independent variables (DSSM Result, Age, and Sex) as predictors of the treatment relapse category.
Big data is usually unstructured, and many applications require real-time analysis. The decision tree (DT) algorithm is widely used to analyze big data. Selecting the optimal depth of a DT is a time-consuming process because it requires many iterations. In this paper, we design a modified version of the DT that aims to achieve optimal depth by self-tuning its running parameters while improving accuracy. The efficiency of the modified DT was verified using two datasets: an airport dataset with 500,000 instances and a fire dataset with 600,000 instances. A comparison between the modified DT and the standard DT shows that the modified version performs better. The comparison was conducted on multiple nodes with Apache Spark on Amazon Web Services. Accuracy increased by 6.85% for the first dataset and by 8.85% for the airport dataset. In conclusion, the modified DT showed better accuracy in handling different-sized datasets than the standard DT algorithm.
Research on distributed data mining using grids has recently become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification mining on the grid.
Based on a discussion of the basic concepts of data mining technology and the decision tree method, and using data samples of wind and hailstorm disasters in some counties of the Mudanjiang region, a forecasting model of agro-meteorological disaster grade was established with the C4.5 decision tree classification algorithm. The model can forecast the degree of direct economic loss, providing a rational data mining model and effective analysis results.
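As a rough illustration of the split criterion that C4.5 is known for, the sketch below computes the gain ratio of categorical attributes. It is not the model from the abstract above: the record fields (`wind`, `hail`, `loss`) and their values are invented for illustration, and the helper names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a categorical attribute."""
    labels = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Expected entropy of the class label after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical records; field names and values are assumptions, not source data.
records = [
    {"wind": "strong", "hail": "yes", "loss": "high"},
    {"wind": "strong", "hail": "no", "loss": "high"},
    {"wind": "weak", "hail": "yes", "loss": "low"},
    {"wind": "weak", "hail": "no", "loss": "low"},
]
best = max(["wind", "hail"], key=lambda a: gain_ratio(records, a, "loss"))
```

In this toy data the `wind` attribute separates the loss grades perfectly, so it would be chosen as the root split; a full C4.5 implementation would also handle continuous attributes, missing values, and pruning, which are omitted here.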
Planning in advance to prepare for and respond to emergencies related to natural hazard-induced disasters is a key action that allows decision makers to mitigate unexpected impacts and potential damage. To further this aim, a collaborative, modular Spatial Data Infrastructure (SDI) based on information and communications technology, called SIRENE (Sistema Informativo per la Preparazione e la Risposta alle Emergenze, Information System for Emergency Preparedness and Response), is designed and implemented to access and share, over the Internet, relevant multisource and distributed geospatial data to support decision makers in reducing disaster risks. SIRENE flexibly searches and retrieves strategic information from local and/or remote repositories to cope with different emergency phases. The system collects, queries, and analyzes geographic information provided voluntarily by observers directly in the field (volunteered geographic information (VGI) reports) to identify potentially critical environmental conditions. SIRENE can visualize and cross-validate institutional and research-based data against VGI reports, and it provides disaster managers with a decision support system able to suggest the mode and timing of intervention, before and in the aftermath of different types of emergencies, on the basis of the available information and in agreement with the laws in force at the national and regional levels. Test installations of SIRENE have been deployed in 18 hilly or mountain municipalities (12 located in the Italian Central Alps of northern Italy, and six in the Umbria region of central Italy), which have been affected by natural hazard-induced disasters over the past years (landslides, debris flows, floods, and wildfire) and have experienced significant social and economic losses.
Based on a case study of Longyou County, Zhejiang Province, the decision tree, a data mining method, was used to analyze the relationships between soil organic matter (SOM) and other environmental and satellite sensing spatial data. The decision tree associated SOM content with some extensive, easily observable landscape attributes, such as landform, geology, land use, and remote sensing images, thus transforming the SOM-related information into a clear, quantitative, landscape factor-associated regular syst…
This paper focuses on improving decision tree induction algorithms when a tie appears during rule generation for specific training datasets. The tie occurs when the target class outcomes in a leaf node's records have equal proportions, so that majority voting cannot be applied. To resolve this exception, we propose basing the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
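The tie situation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: it covers only a k-NN fallback (the NB and ARM options are omitted), assumes numeric feature tuples and a non-empty leaf, and uses hypothetical function names and data.

```python
from collections import Counter
from math import dist


def majority_or_none(labels):
    """Return the majority label, or None when the two largest counts tie."""
    counts = Counter(labels).most_common(2)
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def predict_in_leaf(leaf_records, query, k=3):
    """Predict by majority vote; on a tie, vote among the k nearest leaf records."""
    label = majority_or_none([lab for _, lab in leaf_records])
    if label is not None:
        return label
    nearest = sorted(leaf_records, key=lambda rec: dist(rec[0], query))[:k]
    label = majority_or_none([lab for _, lab in nearest])
    # If the neighbours also tie, fall back to the single nearest record.
    return label if label is not None else nearest[0][1]


# Hypothetical leaf: (feature vector, class label) pairs with a 2-2 class tie.
leaf = [
    ((1.0, 1.0), "yes"),
    ((1.2, 0.9), "yes"),
    ((5.0, 5.0), "no"),
    ((5.1, 4.8), "no"),
]
near_yes = predict_in_leaf(leaf, (1.1, 1.0))
near_no = predict_in_leaf(leaf, (4.9, 5.0))
```

Because the leaf holds two records of each class, plain majority voting cannot decide; the distance-based vote then assigns each query the class of the records it lies closest to.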
Under China's modern education system, the annual scholarship evaluation is vital for many college students. This paper adopts the C4.5 decision tree classification algorithm, an improvement on the ID3 algorithm, and constructs a data set for the scholarship evaluation system by analyzing the related attributes in scholarship evaluation information. Through analysis and research of moral education, intellectual education, and culture and PE, it also identifies some factors that play a significant role in the development of college students.
Objective: Following the RFM model theory of customer relationship management, data mining technology was used to group patients with chronic infectious diseases in order to explore the effect of customer segmentation on the management of patients with different characteristics. Methods: A total of 170,246 outpatient records from January 2016 to July 2016 were extracted from the hospital management information system (HIS); 43,448 records remained after data cleaning. The K-Means clustering algorithm was used to group patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict their treatment situation. Results: Male patients accounted for 58.7%, and patients living in Shanghai accounted for 85.6%. The average age of patients was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three clusters: 1) Cluster 1, important patients (4786 people, 11.72%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). In the C5.0 decision tree prediction of the treatment situation, the final treatment time (weeks) was an important predictor, and the accuracy rate was 99.94% as verified by the confusion matrix. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a chronic infectious disease and customer relationship management database, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the construction of hospital information systems, establish a chronic infectious disease database, and strengthen the blocking of mother-to-child transmission in order to effectively curb chronic infectious diseases and reduce disease burden and mortality.
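To make the R, F, and M quantities concrete, the sketch below derives them from raw visit records before any clustering step. The abstract does not define its units, so the definitions used here (recency as days since the last visit, frequency as visit count, monetary as total spend) are assumptions, as are the record layout, the reference date, and the function name.

```python
from datetime import date


def rfm_table(visits, reference_day):
    """Map each patient id to (recency_days, frequency, monetary)."""
    table = {}
    for patient_id, visit_day, amount in visits:
        last_day, count, total = table.get(patient_id, (visit_day, 0, 0.0))
        table[patient_id] = (max(last_day, visit_day), count + 1, total + amount)
    return {
        pid: ((reference_day - last_day).days, count, total)
        for pid, (last_day, count, total) in table.items()
    }


# Hypothetical visit records: (patient id, visit date, amount spent).
visits = [
    ("p1", date(2016, 1, 5), 120.0),
    ("p1", date(2016, 7, 1), 80.0),
    ("p2", date(2016, 3, 10), 40.0),
]
rfm = rfm_table(visits, reference_day=date(2016, 7, 31))
```

The resulting per-patient triples are the kind of input a clustering algorithm such as K-Means would group; in practice the three columns are usually scaled first so that the monetary value does not dominate the distance.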
The increasing volume of data in the environmental sciences requires analysis and interpretation. Among the challenges generated by this “data deluge”, the development of efficient strategies for knowledge discovery is an important issue. Here, statistical tools and tools from computational intelligence are applied to analyze large data sets from meteorology and climate sciences. Our approach allows a geographical mapping of the statistical property that can be easily interpreted by meteorologists. Our data analysis comprises two main steps of knowledge extraction, applied successively in order to reduce the complexity of the original data set. The goal is to identify a much smaller subset of climatic variables that might still be able to describe or even predict the probability of occurrence of an extreme event. The first step applies a class comparison technique: p-value estimation. The second step consists of a decision tree (DT) configured from the available data and the p-value analysis. The DT is used as a predictive model, identifying the most statistically significant climate variables for precipitation intensity. The methodology is used to study the climatic causes of an extreme precipitation event that occurred in the states of Alagoas and Pernambuco (Brazil) in June 2010.
Corporations focus on web-based education to train their employees more than ever before. Unlike traditional learning environments, web-based education applications store large amounts of data. This growing availability of data stimulated the emergence of a new field called educational data mining. In this study, a classification method is applied to data obtained from a company that uses web-based education to train its employees. The authors' aim is to find the most critical factors that influence the users' success. To classify the data, two decision tree algorithms are applied: Classification and Regression Tree (CART) and Quick, Unbiased and Efficient Statistical Tree (QUEST). According to the results, the assurance of a certificate at the end of the training is the most critical factor influencing the users' success. The position, number of work years, and education level of the user are also found to be important factors.
Data mining is the process of extracting hidden, unknown, but potentially useful information from massive data. Big data has a significant impact on scientific discoveries and value creation. Data mining (DM) with big data has been widely used in the lifecycle of electronic products, ranging from the design and production stages to the service stage. A comprehensive examination of DM with big data and a review of its application in the stages of this lifecycle will benefit researchers in developing solid research. Recently, big data has become a buzzword, which has forced researchers to extend existing data mining techniques to cope with the evolving nature of data and to develop new analytic methods. In this paper, we develop a rigorous evaluation method based on the principle of Design of Experiment. We apply this method to evaluate data mining tools and machine learning algorithms for building big data analytics for telecommunication monitoring data. Two case studies are conducted to provide insights into the relations between the requirements of data analysis and the choice of a tool or algorithm in the context of data analysis workflows.
Funding (spatial aggregation study): This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
Funding (ERP-based spatial data warehouse study): This work is supported by the Technology Project to Tackle Key Problems (2002BA105A-01-02).
Funding (Mudanjiang disaster forecasting study): Supported by the Science and Technology Plan of Mudanjiang City (G200920064) and the Teaching Reform Construction of Mudanjiang Normal University (10-xj11080).
Funding (SIRENE study): SIMULATOR (Sistema Integrato ModULAre per la gesTione e prevenziOne dei Rischi, Integrated Modular System for Risk Prevention and Management), financed by the Lombardy regional government, Italy.
文摘Planning in advance to prepare for and respond to a natural hazard-induced disaster-related emergency is a key action that allows decision makers to mitigate unexpected impacts and potential damage. To further this aim, a collaborative, modular, and information and communications technology-based Spatial Data Infrastructure(SDI)called SIRENE—Sistema Informativo per la Preparazione e la Risposta alle Emergenze(Information System for Emergency Preparedness and Response) is designed and implemented to access and share, over the Internet, relevant multisource and distributed geospatial data to support decision makers in reducing disaster risks. SIRENE flexibly searches and retrieves strategic information from local and/or remote repositories to cope with different emergency phases. The system collects, queries, and analyzes geographic information provided voluntarily by observers directly in the field(volunteered geographic information(VGI) reports) to identify potentially critical environmental conditions. SIRENE can visualize and cross-validate institutional and research-based data against VGI reports,as well as provide disaster managers with a decision support system able to suggest the mode and timing of intervention, before and in the aftermath of different types of emergencies, on the basis of the available information and in agreement with the laws in force at the national andregional levels. Testing installations of SIRENE have been deployed in 18 hilly or mountain municipalities(12 located in the Italian Central Alps of northern Italy, and six in the Umbria region of central Italy), which have been affected by natural hazard-induced disasters over the past years(landslides, debris flows, floods, and wildfire) and experienced significant social and economic losses.
文摘Based on a case study of Longyou County, Zhejiang Province, the decision tree, a data mining method, was used to analyze the relationships between soil organic matter (SOM) and other environmental and satellite sensing spatial data. The decision tree associated SOM content with some extensive easily observable landscape attributes, such as landform, geology, land use, and remote sensing images, thus transforming the SOM-related information into a clear, quantitative, landscape factor-associated regular syst…
文摘This paper focuses on improving decision tree induction algorithms when a kind of tie appears during the rule generation procedure for specific training datasets. The tie occurs when there are equal proportions of the target class outcome in the leaf node's records that leads to a situation where majority voting cannot be applied. To solve the above mentioned exception, we propose to base the prediction of the result on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN) and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
文摘Under the modern education system of China, the annual scholarship evaluation is a vital thing for many of the collegestudents. This paper adopts the classification algorithm of decision tree C4.5 based on the bettering of ID3 algorithm and constructa data set of the scholarship evaluation system through the analysis of the related attributes in scholarship evaluation information.And also having found some factors that plays a significant role in the growing up of the college students through analysis and re-search of moral education, intellectural education and culture&PE.
Abstract: Objective: Following the RFM model theory of customer relationship management, data mining technology was used to group chronic infectious disease patients in order to explore the effect of customer segmentation on the management of patients with different characteristics. Methods: 170,246 outpatient records from January 2016 to July 2016 were extracted from the hospital management information system (HIS), and 43,448 records remained after data cleaning. The K-Means clustering algorithm was used to classify patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict the situation of these patients. Results: Male patients accounted for 58.7%, and patients living in Shanghai accounted for 85.6%. The average age of patients was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three clusters: 1) Cluster 1, important patients (4786 people, 11.72%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). When the C5.0 decision tree algorithm was used to predict the treatment situation of patients with chronic infectious diseases, the final treatment time (weeks) was an important predictor, and the accuracy rate was 99.94% as verified by the confusion matrix. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a chronic infectious disease and customer relationship management database, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the development of hospital information systems, establish a chronic infectious disease database, and strengthen the blocking of mother-to-child transmission, in order to effectively curb chronic infectious diseases and reduce disease burden and mortality.
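As a rough illustration of the RFM features that feed the clustering step, the sketch below derives recency, frequency, and monetary values from visit records. The abstract does not state how R, F, and M were computed, so the definitions here (R as days since the last visit, F as the number of visits, M as total spending), the record layout, and the function name `rfm_table` are all assumptions.

```python
from datetime import date


def rfm_table(visits, reference_day):
    """Map patient_id -> (recency_days, frequency, monetary).

    visits: iterable of (patient_id, visit_day, amount) tuples.
    Recency is the number of days between the most recent visit and
    reference_day (an assumed definition, not taken from the study).
    """
    table = {}
    for patient_id, visit_day, amount in visits:
        days_ago = (reference_day - visit_day).days
        r, f, m = table.get(patient_id, (days_ago, 0, 0.0))
        table[patient_id] = (min(r, days_ago), f + 1, m + amount)
    return table


# Hypothetical records, not data from the study.
visits = [
    ("p1", date(2016, 7, 1), 100.0),
    ("p1", date(2016, 7, 20), 50.0),
    ("p2", date(2016, 1, 15), 30.0),
]
rfm = rfm_table(visits, date(2016, 7, 31))
print(rfm)
```

The resulting three-column table is the kind of input a clustering algorithm such as K-Means would typically receive, usually after scaling each column.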
Abstract: The increasing volume of data in the environmental sciences calls for analysis and interpretation. Among the challenges generated by this "data deluge", the development of efficient strategies for knowledge discovery is an important issue. Here, statistical tools and tools from computational intelligence are applied to analyze large data sets from meteorology and climate sciences. Our approach allows a geographical mapping of the statistical property that can be easily interpreted by meteorologists. Our data analysis comprises two main steps of knowledge extraction, applied successively in order to reduce the complexity of the original data set. The goal is to identify a much smaller subset of climatic variables that might still be able to describe or even predict the probability of occurrence of an extreme event. The first step applies a class comparison technique: p-value estimation. The second step consists of a decision tree (DT) configured from the available data and the p-value analysis. The DT is used as a predictive model, identifying the most statistically significant climate variables for the precipitation intensity. The methodology is employed to study the climatic causes of an extreme precipitation event that occurred in the states of Alagoas and Pernambuco (Brazil) in June 2010.
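The first step, screening variables by p-value before building the tree, might be sketched as follows. The abstract does not say which test was used, so this example substitutes an exact two-sample permutation test on the difference of means; the variable names, the sample values, and the threshold `alpha=0.05` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference of means.

    Enumerates every way to split the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point round-off.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


def select_variables(samples, alpha=0.05):
    """Keep variables whose event and non-event values differ significantly."""
    return [name for name, (event, non_event) in samples.items()
            if permutation_p_value(event, non_event) < alpha]


# Hypothetical values: (days with an extreme event, days without one).
samples = {
    "humidity": ([9, 10, 11, 12], [1, 2, 3, 4]),
    "noise": ([1, 4, 6, 7], [2, 3, 5, 8]),
}
print(select_variables(samples))
```

Only the variables that pass this screen would then be passed to the decision tree, which keeps the tree small and easier to interpret.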
Abstract: Corporations focus on web-based education to train their employees more than ever before. Unlike traditional learning environments, web-based education applications store large amounts of data. This growing availability of data has stimulated the emergence of a new field called educational data mining. In this study, the classification method is applied to data obtained from a company that uses web-based education to train its employees. The authors' aim is to find the most critical factors that influence the users' success. For the classification of the data, two decision tree algorithms, Classification and Regression Tree (CART) and Quick, Unbiased and Efficient Statistical Tree (QUEST), are applied. According to the results, the assurance of a certificate at the end of the training is the most critical factor influencing the users' success. The position, number of work years, and education level of the user are also found to be important factors.
Abstract: Data mining is the process of extracting hidden, unknown, but potentially valuable information from massive data. Big Data strongly affects scientific discoveries and value creation. Data mining (DM) with Big Data has been widely used in the lifecycle of electronic products, ranging from the design and production stages to the service stage. A comprehensive examination of DM with Big Data and a survey of its application across the stages of this lifecycle will benefit researchers in developing solid research. Recently, big data has become a buzzword, which has forced researchers to extend existing data mining methods to cope with the evolving nature of data and to develop new analytical procedures. In this paper, we develop an exact evaluation technique based on the principle of Design of Experiment. We apply this technique to evaluate data mining tools and machine learning algorithms for building big data analytics for telecommunication monitoring data. Two case studies are conducted to give insights into the relations between the requirements of data analysis and the choice of a tool or algorithm in the context of data analysis workflows.