Abstract: This article presents an innovative approach to automatic rule discovery for data transformation tasks using XGBoost, a machine learning algorithm known for its efficiency and performance. The proposed framework fuses diverse feature formats, specifically metadata, textual, and pattern features. The goal is to enhance the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. First, the article describes the methodology for extracting these distinct features from raw data and the pre-processing steps that prepare the data for the model. Subsequent sections explain feature optimization using Recursive Feature Elimination (RFE) with linear regression, which retains the most contributive features and eliminates redundant or less significant ones. The core of the research is the training of the XGBoost model on the prepared and optimized feature sets. The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (the prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and particularly the XGBoost model, in the context of Business Rule Engine (BRE) data transformation, the article points to a paradigm shift toward more scalable, efficient, and less human-dependent data transformation systems. This research opens doors for further exploration of automated rule discovery systems and their applications in various sectors.
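The abstract names three feature families (metadata, textual, and pattern) without defining them. As a rough illustration only, the sketch below shows one way such features might be derived from a single raw value. The function names, the dictionary keys, and the character-class encoding (digits to `9`, letters to `A`) are all assumptions made for this example, not the authors' definitions.

```python
import re


def pattern_feature(value: str) -> str:
    """Collapse a raw value into a character-class pattern (assumed encoding)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")      # any digit
        elif ch.isalpha():
            out.append("A")      # any letter
        else:
            out.append(ch)       # keep separators such as '-' or '/'
    return "".join(out)


def extract_features(column_name: str, value: str) -> dict:
    """Build hypothetical metadata, textual, and pattern features for one value."""
    return {
        "meta_column": column_name,                               # metadata
        "meta_length": len(value),                                # metadata
        "text_tokens": re.findall(r"[a-z]+", value.lower()),      # textual
        "pattern": pattern_feature(value),                        # pattern
    }


features = extract_features("order_date", "2024-01-05")
print(features["pattern"])  # 9999-99-99
```

In a pipeline like the one the abstract describes, features of this kind would presumably be encoded numerically before feature selection and model training; that step is not shown here.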
Funding: Supported by the National Key Technology R&D Program (Grant No. 2012BAH17F01) and the NSFC-NSF International Cooperation Project (Grant No. 61361126011).
Abstract: In a growing number of information processing applications, data takes the form of continuous data streams rather than traditional stored databases. Monitoring systems that provide monitoring services in a cloud environment must be prepared to deal gracefully with huge data collections without compromising system performance. In this paper, we show that by using the concept of urgent data, our system can shorten the response time for most "urgent" queries while guaranteeing lower bandwidth consumption. We argue that monitoring data can be treated differently. Some data capture critical system events, and the arrival of these data significantly influences the monitoring reaction speed; we call such data urgent data. High-speed collection of urgent data can help the system react in real time when facing fatal errors. A production cloud environment, MagicCube, is used as a test bed. Extensive experiments over both real-world and synthetic traces show that, when using urgent data, the monitoring system achieves lower response latency than existing monitoring approaches.
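The idea of treating urgent monitoring data differently can be illustrated with a small priority queue. This is a minimal sketch under stated assumptions: the class `MonitoringQueue`, the two priority levels, and the sample records are hypothetical and are not taken from MagicCube or from the paper's implementation.

```python
import heapq
from itertools import count

URGENT, NORMAL = 0, 1  # lower value is delivered first (assumed two-level scheme)


class MonitoringQueue:
    """Deliver urgent records before normal ones, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def push(self, record: str, urgent: bool = False) -> None:
        priority = URGENT if urgent else NORMAL
        heapq.heappush(self._heap, (priority, next(self._seq), record))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = MonitoringQueue()
q.push("cpu=41%")
q.push("disk failure on node-7", urgent=True)
q.push("mem=63%")

order = [q.pop() for _ in range(len(q))]
print(order)  # ['disk failure on node-7', 'cpu=41%', 'mem=63%']
```

The urgent record overtakes the earlier routine metric, which is the latency effect the abstract describes. How the real system classifies data as urgent and how it limits bandwidth are not specified in the abstract and are not modeled here.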
Abstract: To address the low efficiency and heavy workload of tumor coding in hospitals, we propose a Drools-based intelligent tumor coding method. At present, most tumor hospitals use manual coding: trained coders follow the main diagnosis selection rules to select the main diagnosis from the discharge diagnoses of tumor patients, and then code all the discharge diagnoses according to the coding rules. Because coders differ in their familiarity with the main diagnosis selection rules and ICD-10 disease coding, manual coding is less efficient and the quality of the whole medical record suffers. We first analyze the ICD library information, the doctor's diagnostic information, radiotherapy or chemotherapy information, surgery information, hospitalization information, and other related information, and then generate Drools rule files based on the main diagnosis selection principles and coding principles. We also incorporate a text similarity analysis algorithm to construct an intelligent method for coding diagnostic information. Practice shows that the method makes the work more efficient while producing coding results that meet the standard and have high accuracy, so that coders are freed from repetitive work and can pay more attention to coding quality control and adjustment of the coding logic.
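The abstract does not say which text similarity algorithm is used. As an illustration of the general idea, the sketch below matches a free-text diagnosis against code descriptions with word-level Jaccard similarity, a stand-in chosen for this example. The three-entry lookup table is a small illustrative sample, not the authors' ICD library, and no Drools rules are involved.

```python
import re

# Tiny illustrative sample of ICD-10 style entries (assumption, not the real library).
ICD_SAMPLE = {
    "C16.9": "malignant neoplasm of stomach, unspecified",
    "C34.9": "malignant neoplasm of bronchus or lung, unspecified",
    "C50.9": "malignant neoplasm of breast, unspecified",
}


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_code(diagnosis: str):
    """Return the (code, score) pair whose description is most similar."""
    scored = [(jaccard(diagnosis, desc), code) for code, desc in ICD_SAMPLE.items()]
    score, code = max(scored)
    return code, score


code, score = best_code("Malignant neoplasm of lung")
print(code)  # C34.9
```

A word-level measure is used here because character-level ratios can be dominated by the long shared prefix of these descriptions. In the method the abstract describes, a similarity score of this kind would complement the rule files rather than replace them.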
Funding: Supported by the Research Fund for the Doctoral Program of Higher Education of China (20060487072) and the National Key Technology R&D Program (2006BAF01A43).
Abstract: A job shop scheduling problem with combination processing in a complex production environment is proposed. By defining "non-elastic combination processing relativity" and "virtual process", the problem can be simplified and transformed into a traditional one. A heuristic method is designed on the basis of a dispatching rule selection engine and the factors considered in the complex production environment. The algorithm has been applied at a mould enterprise in Shenzhen for half a year. Practice showed that, with the suggested method, the number of delayed orders decreased by about 20% and productivity increased by 10% to 20%.
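To make the notion of selecting among dispatching rules concrete, here is a deliberately simplified sketch. It uses a single machine, two textbook rules (shortest processing time and earliest due date), and picks whichever yields fewer delayed orders. The job data, the rule set, and the selection criterion are assumptions for illustration. The paper's engine, its combination-processing constraints, and its multi-machine job shop setting are not modeled.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    duration: int  # processing time
    due: int       # due date, in the same time units


# Two classic dispatching rules, expressed as sort keys.
RULES = {
    "SPT": lambda job: job.duration,  # shortest processing time first
    "EDD": lambda job: job.due,       # earliest due date first
}


def delayed_orders(jobs, key) -> int:
    """Count jobs that finish after their due date under the given rule."""
    clock, late = 0, 0
    for job in sorted(jobs, key=key):
        clock += job.duration
        if clock > job.due:
            late += 1
    return late


def select_rule(jobs) -> str:
    """Pick the rule with the fewest delayed orders for this job set."""
    return min(RULES, key=lambda name: delayed_orders(jobs, RULES[name]))


jobs = [Job("A", 4, 5), Job("B", 1, 20), Job("C", 3, 7), Job("D", 2, 9)]
print(select_rule(jobs))  # EDD
```

For this job set, SPT finishes job A late while EDD finishes every job on time, so EDD is selected.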