Abstract: This article presents an innovative approach to automatic rule discovery for data transformation tasks using XGBoost, a machine learning algorithm known for its efficiency and performance. The proposed framework fuses several feature formats, specifically metadata, textual, and pattern features, to improve the system's ability to discern and generalize transformation rules from source to destination formats in varied contexts. The article first describes how these features are extracted from raw data and the pre-processing steps that prepare the data for the model. Subsequent sections describe feature optimization using Recursive Feature Elimination (RFE) with linear regression, which retains the most contributive features and eliminates redundant or less significant ones. The core of the research is the training of the XGBoost model on the prepared and optimized feature sets, and the article gives a detailed overview of the mathematical model and algorithmic steps behind this procedure. Finally, the process of rule discovery (the prediction phase) by the trained XGBoost model is explained, underscoring its role in real-time, automated data transformations. By employing machine learning, and the XGBoost model in particular, in the context of Business Rule Engine (BRE) data transformation, the article points to a shift toward more scalable, efficient, and less human-dependent data transformation systems. This research opens the door to further exploration of automated rule discovery systems and their applications in various sectors.
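The abstract above names metadata and pattern features but does not define them. As a rough illustration only, the sketch below shows one common way such features could be derived from a raw string value before being passed to a model. The function names (`pattern_feature`, `extract_features`) and the character-class encoding are assumptions made for this example, not the authors' method.

```python
def pattern_feature(value: str) -> str:
    """Map a raw string to a coarse character-class pattern.

    Digits become '9', letters become 'A', everything else is kept as-is.
    This encoding is an assumption chosen for illustration.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def extract_features(value: str) -> dict:
    """Combine simple metadata features with the pattern feature."""
    return {
        "length": len(value),                          # metadata: field length
        "n_digits": sum(ch.isdigit() for ch in value),  # metadata: digit count
        "pattern": pattern_feature(value),             # pattern feature
    }


print(extract_features("2024-01-31"))
# {'length': 10, 'n_digits': 8, 'pattern': '9999-99-99'}
```

Comparing the pattern of a source value with that of its destination value (for example `9999-99-99` against `99/99/9999`) is one way a model could be given a hint about the transformation involved.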
Abstract: In enterprise operations, maintaining manual rules for enterprise processes can be expensive, time-consuming, and dependent on specialized knowledge of the enterprise domain. Recently, rule generation has been automated in enterprises, particularly through Machine Learning, to streamline routine tasks. Typically, these Machine Learning models are black boxes in which the reasons for decisions are not always transparent, and end users need to verify the model's proposals as part of user acceptance testing before they can trust it. In such scenarios, rules have an advantage over Machine Learning models because end users can verify the rules and therefore trust them more. In many scenarios the ground-truth label changes frequently, so it is difficult for a Machine Learning model to learn until a considerable amount of data has accumulated, whereas rules can be adapted to the new truth. This paper presents a novel framework for generating human-understandable rules using the Classification and Regression Tree (CART) decision tree method, which ensures both optimization and user trust in automated decision-making processes. The framework generates comprehensible rules of the form "if condition, then predicted class", even in domains where noise is present. The proposed system transforms enterprise operations by automating the production of human-readable rules from structured data, resulting in increased efficiency and transparency. Removing the need for human rule construction saves time and money while ensuring that users can readily check and trust the system's automatic judgments. The framework achieves 99.85% accuracy and 96.30% precision, which further supports its effectiveness in translating complex data into comprehensible rules, ultimately empowering users and enhancing organizational decision-making processes.
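To make the "if condition, then predicted class" form concrete, here is a minimal sketch of CART-style rule extraction using Gini impurity. It is a simplified illustration, not the paper's framework: it splits on observed feature values rather than midpoints, has no pruning, and the data, feature names, and function names are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def extract_rules(rows, labels, names, depth=2, conds=()):
    """Grow a small tree and return one if-then rule per leaf."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None:
        label = Counter(labels).most_common(1)[0][0]
        cond = " and ".join(conds) or "true"
        return [f"if {cond} then {label}"]
    f, t = split
    rules = []
    for op, keep in (("<=", lambda v: v <= t), (">", lambda v: v > t)):
        part = [(r, y) for r, y in zip(rows, labels) if keep(r[f])]
        sub_rows = [r for r, _ in part]
        sub_labels = [y for _, y in part]
        rules += extract_rules(sub_rows, sub_labels, names, depth - 1,
                               conds + (f"{names[f]} {op} {t}",))
    return rules


# Hypothetical toy data: (amount, is_new_customer) -> decision
rows = [(100, 0), (300, 1), (500, 0), (800, 0), (1200, 1)]
labels = ["approve", "approve", "approve", "review", "review"]
for rule in extract_rules(rows, labels, ["amount", "is_new_customer"]):
    print(rule)
# if amount <= 500 then approve
# if amount > 500 then review
```

Each printed rule corresponds to one root-to-leaf path in the tree, which is why end users can read and check the rules directly.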
Funding: Supported by the National Key Technology R&D Program (Grant No. 2012BAH17F01) and the NSFC-NSF International Cooperation Project (Grant No. 61361126011).
Abstract: In a growing number of information processing applications, data takes the form of continuous data streams rather than traditional stored databases. Monitoring systems that provide monitoring services in a cloud environment must be prepared to deal gracefully with huge data collections without compromising system performance. In this paper, we show that by using the concept of urgent data, our system can shorten the response time for most "urgent" queries while guaranteeing lower bandwidth consumption. We argue that monitoring data can be treated differently. Some data capture critical system events, and their arrival significantly influences how quickly the monitoring system reacts; we call these urgent data. Collecting urgent data at high speed helps the system react in real time when facing fatal errors. MagicCube, a cloud environment in production, is used as a test bed. Extensive experiments over both real-world and synthetic traces show that, when using urgent data, the monitoring system achieves lower response latency than existing monitoring approaches.
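The abstract does not describe how urgent data is collected or transmitted. As a hedged illustration of the general idea of treating monitoring data differently, the sketch below uses a priority queue so that events flagged as urgent are handled before ordinary metrics. The class name `MonitoringQueue` and the two-level priority scheme are assumptions made for this example.

```python
import heapq
import itertools

URGENT, NORMAL = 0, 1  # lower number is served first


class MonitoringQueue:
    """Queue that serves urgent events first, FIFO within each priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, event, urgent=False):
        priority = URGENT if urgent else NORMAL
        heapq.heappush(self._heap, (priority, next(self._seq), event))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = MonitoringQueue()
q.push("cpu=40%")
q.push("disk failure on node-7", urgent=True)
q.push("mem=55%")
while q:
    print(q.pop())
# disk failure on node-7
# cpu=40%
# mem=55%
```

The sequence counter keeps ordering stable, so two events with the same priority are never compared by their payload.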
Abstract: To address the low efficiency and heavy workload of tumor coding in hospitals, we propose a Drools-based intelligent tumor coding method. At present, most tumor hospitals use manual coding: trained coders follow the main diagnosis selection rules to select the main diagnosis from the discharge diagnoses of tumor patients, and then code all the discharge diagnoses according to the coding rules. Because coders differ in their familiarity with the main diagnosis selection rules and ICD-10 disease coding, manual coding is less efficient and the quality of the whole medical record is affected. We first analyze the ICD library information, the doctor's diagnostic information, radiotherapy or chemotherapy information, surgery information, hospitalization information, and other related information, and then generate Drools rule files based on the main diagnosis selection principles and the coding principles. We also incorporate a text similarity analysis algorithm to construct an intelligent method for coding diagnostic information. Practice shows that the method makes coding work more efficient while producing results that meet the standard and have high accuracy, freeing coders from repetitive work so that they can pay more attention to coding quality control and adjustment of the coding logic.
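The abstract does not say which text similarity algorithm is used. The sketch below illustrates the general idea with token-level Jaccard similarity, matching a free-text diagnosis against a tiny sample of ICD-10 category descriptions. The three-entry table, the threshold, and the function names are assumptions for illustration. A real system would use the full ICD library together with the Drools rules described above.

```python
def tokens(text):
    """Lower-case a string and split it into a set of word tokens."""
    return set(text.lower().replace(",", " ").split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Tiny illustrative sample of ICD-10 categories, not a complete library.
ICD_SAMPLE = {
    "C16": "malignant neoplasm of stomach",
    "C34": "malignant neoplasm of bronchus and lung",
    "C50": "malignant neoplasm of breast",
}


def suggest_code(diagnosis, table=ICD_SAMPLE, threshold=0.3):
    """Return the best-matching code, or None if nothing is similar enough."""
    code, score = max(
        ((c, jaccard(diagnosis, d)) for c, d in table.items()),
        key=lambda pair: pair[1],
    )
    return code if score >= threshold else None


print(suggest_code("malignant neoplasm of lung"))  # C34
print(suggest_code("fracture of femur"))           # None
```

Scores for different categories can be close because the descriptions share many words, so a suggestion like this would still need rule-based checks or coder review.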
Funding: Supported by the Research Fund for the Doctoral Program of Higher Education of China (20060487072) and the National Key Technology R&D Program (2006BAF01A43).
Abstract: A job shop scheduling problem with combination processing in a complex production environment is proposed. By defining "non-elastic combination processing relativity" and "virtual process", the problem can be simplified and transformed into a traditional one. Based on a dispatching-rule selection engine and the factors considered in the complex production environment, a heuristic method is designed. The algorithm was applied at a mould enterprise in Shenzhen for half a year. Practice showed that the suggested method decreased the number of delayed orders by about 20% and increased productivity by 10 to 20%.
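The abstract does not list the dispatching rules or say how the engine chooses among them. As a simplified sketch, the example below evaluates two textbook dispatching rules on a single machine and selects the one that yields fewer delayed jobs. Reducing the job shop to one machine, and the names `Job`, `count_delayed`, and `select_rule`, are assumptions made for this illustration.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    duration: int  # processing time
    due: int       # due date, measured from time 0


def count_delayed(jobs, rule):
    """Sequence jobs by the given rule and count those finishing late."""
    clock, delayed = 0, 0
    for job in sorted(jobs, key=rule):
        clock += job.duration
        if clock > job.due:
            delayed += 1
    return delayed


def earliest_due_date(job):
    return job.due


def shortest_processing_time(job):
    return job.duration


def select_rule(jobs, rules):
    """Pick the dispatching rule with the fewest delayed jobs."""
    return min(rules, key=lambda rule: count_delayed(jobs, rule))


jobs = [Job("A", 1, 10), Job("B", 3, 3), Job("C", 2, 6)]
print(count_delayed(jobs, shortest_processing_time))  # 1
print(count_delayed(jobs, earliest_due_date))         # 0
best = select_rule(jobs, [shortest_processing_time, earliest_due_date])
print(best.__name__)                                  # earliest_due_date
```

With the shortest-processing-time rule, job B finishes at time 6 against a due date of 3, while the earliest-due-date rule completes all three jobs on time.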