An association rule mining method based on semantic relativity is proposed to address the large number of candidate item sets and the high time complexity of traditional association rule mining. The method uses the semantic relativity of ontology concepts to describe complicated relationships within a domain. Candidate item sets with low semantic relativity are filtered out to reduce the number of candidate item sets. In the semantic relativity computation, the ontology hierarchy is treated as a directed acyclic graph rather than a hierarchy tree, and not only direct hierarchy relationships but also indirect hierarchy relationships and other typical semantic relationships are taken into account. Experimental results show that the proposed method effectively reduces the number of candidate item sets and improves the efficiency of association rule mining.
Association rule mining methods are an important set of data mining tools and can be used to mine spatial association rules from spatial data. However, their application is limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to eliminate redundant rules effectively. GFARM is applied in a case study that discovers association rules between building land distribution and potential driving factors in Wuhan, China, from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM also succeeds in discovering spatial association patterns between building land distribution and driving factors.
Professional drivers are more frequently exposed to long driving distances and travel times, which raises the safety risk from distraction and fatigue. The widespread use of commercial driver monitoring systems (DMS) offers a source of data collection and increases the amount of data characterizing driver behavior that can be used for further safety research. This study used DMS warning-based data and applied an association rule mining approach to explore risk factors contributing to inattention among hazardous materials (HAZMAT) truck drivers. A total of 499 HAZMAT truck driver inattention warning events were used to find rules that predict the occurrence of driver fatigue and distraction. First, Fisher's exact tests were performed to examine the association between the frequency of driver inattention warnings and risk factors. Second, support, confidence, and lift values were used to quantify the relative strength of the association rules generated by the Apriori algorithm. Results show that speed between 40 and 49 km/h, relatively long travel time (3-6 h), freeways, tangent sections, off-peak hours, and clear weather are highly associated with fatigued driving, while nighttime between 18:00 and 23:59, speed between 70 and 80 km/h, travel time between 1 and 3 h, freeways, acceleration less than 0.5 m/s², visibility greater than 1000 m, and tangent roadway sections are highly associated with distracted driving. By focusing on these specific feature groups, the association rules can support the development of countermeasures and enforcement approaches that mitigate distracted and fatigued driving.
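The three measures named in the abstract above (support, confidence, and lift) can be illustrated with a minimal Python sketch. The function name `rule_metrics` and the four toy warning events are hypothetical and are not taken from the study's data; the sketch only shows how the measures are conventionally defined for a rule "antecedent → consequent".

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items (hypothetical toy data below).
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)         # events containing the antecedent
    count_c = sum(1 for t in transactions if c <= t)         # events containing the consequent
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # events containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical warning events, invented for illustration only.
events = [
    {"freeway", "clear", "fatigue"},
    {"freeway", "clear", "fatigue"},
    {"freeway", "rain", "distraction"},
    {"urban", "clear", "distraction"},
]

support, confidence, lift = rule_metrics(events, {"freeway", "clear"}, {"fatigue"})
print(support, confidence, lift)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if they were independent, which is why lift is commonly used alongside support and confidence to rank rules.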
BACKGROUND: It is increasingly common to find patients affected by both type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD), and studies have been able to correlate the two using the available biological and clinical evidence. The current study applied association rule mining (ARM) to discover whether there are consistent patterns of clinical features relevant to these diseases. ARM uses clinical and laboratory data to uncover meaningful patterns for diabetic CAD, harnessing data-driven algorithms to optimise decision-making in patient care. AIM: To reinforce the evidence of the T2DM-CAD interplay and demonstrate the ability of ARM to provide new insights into multivariate pattern discovery. METHODS: This cross-sectional study was conducted at the Department of Biochemistry in a specialized tertiary care centre in Delhi and involved a total of 300 consenting subjects categorized into three groups of 100 each: CAD with diabetes, CAD without diabetes, and healthy controls. The participants were enrolled from the Cardiology IPD and OPD for sample collection. The study employed the ARM technique to extract meaningful patterns and relationships from the clinical data in its original values. RESULTS: The clinical dataset comprised 35 attributes from the enrolled subjects. The analysis produced rules with a maximum branching factor of 4 and a rule length of 5, requiring a 1% probability increase for enhancement. Prominent patterns emerged, highlighting strong links between health indicators and the likelihood of diabetes, particularly elevated HbA1C and random blood sugar levels. The ARM technique identified that individuals with a random blood sugar level >175 and HbA1C >6.6 are likely to be in the "CAD-with-diabetes" group, offering valuable insights into health indicators and the factors influencing disease outcomes. CONCLUSION: This method holds promise for healthcare practitioners, offering insights for improving patient treatment that targets specific subtypes of CAD with diabetes. By applying artificial intelligence techniques to medical data, we have shown the potential for personalized healthcare and for user-friendly applications aimed at improving cardiovascular health outcomes and optimising decision-making in patient care for this high-risk population.
This study explores the factors influencing metro passengers’ arrival volume in Wuhan, China, and Lagos, Nigeria, by examining weather, time of day, waiting time, travel behavior, arrival patterns, and metro satisfaction. It addresses a significant research gap in understanding metro passengers’ dynamics across cultural and geographical contexts. It employs questionnaires, field observations, and advanced data analysis techniques like association rule mining and neural network modeling. Key findings include a correlation between rainy weather, shorter waiting times, and higher arrival volumes. Neural network models showed high predictive accuracy, with waiting time, metro satisfaction, and weather being significant factors in Lagos Light Rail Blue Line Metro. In contrast, arrival patterns, weather, and time of day were more influential in Wuhan Metro Line 5. Results suggest that improving metro satisfaction and reducing waiting times could increase arrival volumes in Lagos Metro while adjusting schedules for weather and peak times could optimize flow in Wuhan Metro. These insights are valuable for transportation planning, passenger arrival volume management, and enhancing user experiences, potentially benefiting urban transportation sustainability and development goals.
Lung cancer is one of the leading cancers for both genders worldwide, and its occurrence has increased steadily since the early 19th century. In this manuscript, we discuss various data mining techniques that have been employed for cancer diagnosis. Exposure to air pollution has been related to various adverse health effects. This work analyzes various air pollutants and the associated health hazards and aims to evaluate the impact of air pollution on lung cancer. We apply data mining to lung cancer and air pollution, and our approach includes preprocessing, data mining, testing and evaluation, and knowledge discovery. First, we remove noise and irrelevant data; we then merge the multiple information sources into a common source. From that source, we select and retrieve the information relevant to our investigation. We then convert the selected data into a form suitable for the mining process. The patterns are extracted using a relational suggestion rule mining process. The information these patterns reveal is categorized with an Auto Associative Neural Network (AANN) classification method. The proposed method is compared with the existing method on various factors. In conclusion, the proposed auto associative neural network and relational suggestion rule mining methods achieve high accuracy.
Generating maximum frequent patterns from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modify Grover’s search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We present a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check the count against a minimum support threshold. The derived algorithm increases the rate of correct solutions, since the search is restricted to a subspace. Furthermore, our algorithm significantly reduces the number of qubits required by the design, which directly benefits performance. The proposed design can accommodate more transactions and items and still perform well with a small number of qubits.
Recent advancements in science and technology, coupled with the proliferation of data, have urged laboratory medicine to integrate with the era of artificial intelligence (AI) and machine learning (ML). In current evidence-based medicine, laboratory tests that analyse disease patterns through association rule mining (ARM) have emerged as a modern tool for risk assessment and disease stratification, with the potential to reduce cardiovascular disease (CVD) mortality. CVDs are well recognised as the leading global cause of mortality, with higher fatality rates in the Indian population due to associated factors such as hypertension, diabetes, and lifestyle choices. AI-driven algorithms have offered deep insights in this field while addressing challenges such as healthcare systems grappling with physician shortages. Personalized medicine, driven by big data, requires the integration of ML techniques and high-quality electronic health records to produce meaningful outcomes. These technological advancements enhance computational analyses for both research and clinical practice. ARM plays a pivotal role by uncovering meaningful relationships within databases, aiding patient survival prediction and risk factor identification. The potential of AI in laboratory medicine is vast, but it must be integrated cautiously, with attention to ethical, legal, and privacy concerns. An AI ethics framework is therefore essential to guide its responsible use. Aligning AI algorithms with existing laboratory practices, promoting education among healthcare professionals, and fostering careful integration into clinical settings are imperative for harnessing the benefits of this transformative technology.
In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected, and that if the indices selected are more significant, the characteristics of clustering results are also more significant, and the analysis or diagnosis is more credible. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
Data mining has proven to be a reliable technique for analyzing road accidents and producing useful results. Most road accident data analyses use data mining techniques that focus on identifying factors affecting the severity of an accident. However, any damage resulting from road accidents is unacceptable in terms of health, property damage, and other economic factors. Road accidents are sometimes found to occur more frequently at certain specific locations. Analyzing these locations can help identify the road accident features that make accidents occur frequently there. Association rule mining is a popular data mining technique that identifies correlations among the various attributes of road accidents. In this paper, we first applied the k-means algorithm to group the accident locations into three categories: high-frequency, moderate-frequency, and low-frequency accident locations. The k-means algorithm takes the accident frequency count as a parameter to cluster the locations. We then used association rule mining to characterize these locations. The rules revealed different factors associated with road accidents at locations with different accident frequencies. The association rules for high-frequency accident locations disclosed that intersections on highways are more dangerous for every type of accident. High-frequency accident locations mostly involved two-wheeler accidents in hilly regions. In moderate-frequency accident locations, colonies near local roads and intersections on highway roads were found to be dangerous for pedestrian-hit accidents. Low-frequency accident locations are scattered throughout the district, and most of the accidents at these locations were not critical. Although the data set was limited to some selected attributes, our approach extracted useful hidden information from the data that can be used to plan preventive efforts at these locations.
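The first step described above, grouping locations into three frequency categories with k-means on the accident count, can be sketched as follows. This is a minimal one-dimensional Lloyd iteration in plain Python, not the authors' implementation: the function name `kmeans_1d`, the deterministic initialisation (centroids spread over the sorted values), and the counts in `counts` are all assumptions made for illustration.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Cluster scalar values into k groups with plain Lloyd iterations (k >= 2)."""
    ordered = sorted(values)
    # Assumed deterministic start: spread initial centroids over the sorted values.
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a group ends up empty.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
    return centroids, labels


# Hypothetical accident counts per location, invented for illustration only.
counts = [1, 2, 2, 3, 10, 12, 11, 30, 28, 33]
centroids, labels = kmeans_1d(counts)

# Name the clusters by the rank of their centroid, lowest first.
names = ["low-frequency", "moderate-frequency", "high-frequency"]
order = sorted(range(len(centroids)), key=lambda j: centroids[j])
rank = {cluster: r for r, cluster in enumerate(order)}
categories = [names[rank[label]] for label in labels]
print(list(zip(counts, categories)))
```

Each category's locations could then be mined separately for association rules, which is the second step the abstract describes.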
The increasing use of the internet requires a robust system for effective communication. To provide effective communication for internet users, the shortest routing path is usually preferred for data forwarding, depending on the nature of their queries. But when a large amount of data takes the same path, a traffic bottleneck occurs, which leads to data loss or delivers irrelevant data to users. In this paper, a Rule Based System using Improved Apriori (RBS-IA) rule mining framework is proposed to monitor traffic over the network effectively and to control network traffic. The RBS-IA framework integrates traffic control with a decision-making system to improve internet usage. First, the network traffic data are analyzed, and the incoming and outgoing data information is processed using the Apriori rule mining algorithm. After the set of rules is generated, the network traffic condition is analyzed. Based on the traffic conditions, a decision rule framework is introduced that derives the set of suitable rules and assigns them to the appropriate states of the network. The decision rule framework improves the effectiveness of network traffic control by updating the traffic condition states to identify the relevant route path for packet data transmission. An experimental evaluation is conducted on the Dodgers loop sensor data set from the UCI repository to assess the effectiveness of the proposed RBS-IA rule mining framework. The performance evaluation shows that the proposed framework provides a significant improvement in managing the network traffic control scheme. The RBS-IA rule mining framework is evaluated on factors such as the accuracy of the decisions obtained, an interestingness measure, and execution time.
Data visualization blends art and science to convey stories from data via graphical representations. Considering different problems, applications, requirements, and design goals, it is challenging to combine these two components at their full force. While the art component involves creating visually appealing and easily interpreted graphics for users, the science component requires accurate representations of a large amount of input data. Without the science component, visualization cannot serve its role of creating correct representations of the actual data, which leads to wrong perception, interpretation, and decisions. It might be even worse if incorrect visual representations were intentionally produced to deceive the viewers. To address common pitfalls in graphical representations, this paper focuses on identifying and understanding the root causes of misinformation in graphical representations. We reviewed misleading data visualization examples in scientific publications collected from indexing databases and then projected them onto the fundamental units of visual communication, such as color, shape, size, and spatial orientation. Moreover, a text mining technique was applied to extract practical insights from common visualization pitfalls. Cochran’s Q test and McNemar’s test were conducted to examine whether there is any difference in the proportions of common errors among color, shape, size, and spatial orientation. The findings showed that the pie chart is the most misused graphical representation and that size is the most critical issue. It was also observed that there were statistically significant differences in the proportion of errors among color, shape, size, and spatial orientation.
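Cochran's Q, named in the abstract above, compares the proportion of "errors" across several related binary conditions. The sketch below computes the standard statistic, Q = (k − 1)(k·ΣC_j² − N²) / (k·N − ΣR_i²), where C_j are column totals, R_i are row totals, N is the grand total, and k is the number of conditions. The function name `cochran_q` and the small 0/1 matrix are hypothetical; they are not the paper's data, and no p-value is computed here.

```python
def cochran_q(rows):
    """Cochran's Q statistic for a list of equal-length 0/1 rows.

    Each row is one subject (here: one reviewed chart); each column is one
    condition (here: an error in color, shape, size, spatial orientation).
    Under the null hypothesis, Q is approximately chi-square with k - 1
    degrees of freedom.
    """
    k = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    grand_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    return numerator / denominator


# Hypothetical error flags per chart: [color, shape, size, orientation].
charts = [
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]

q = cochran_q(charts)
print(round(q, 3))
```

When Q indicates an overall difference, pairwise McNemar tests between two conditions are a common follow-up, which matches the pairing of the two tests in the abstract.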
Indirect association is a high-level relationship between items and frequent item sets in data. Indirect associations have many potential applications, such as database marketing, intelligent data analysis, web-log analysis, and recommender systems. Existing indirect association mining algorithms are mostly based on post-processing the discovered frequent item sets: all frequent item sets must be generated first, and they are then filtered and joined to form indirect associations. We previously presented an indirect association mining algorithm (NIA) based on the anti-monotonicity of indirect associations, in which k candidate indirect associations can be generated directly from k-1 candidate indirect associations without generating all frequent item sets. We also use the frequent itempair support matrix to reduce the time and memory space needed by the algorithm. In this paper, a novel algorithm (NIA2) is introduced that generates indirect association patterns between itempairs through one-item mediator sets taken from the frequent itempair support matrix. A notion of a mediator set support threshold is also presented. NIA2 mines indirect association patterns directly from the dataset, without generating all frequent item sets. The frequent itempair support matrix and the use of tm as the support threshold for mediator sets can significantly reduce the cost of join operations and of the search process compared with existing algorithms. Experiments on a real-world web log dataset show that NIA2 is one order of magnitude faster than existing algorithms.
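The idea of finding indirectly associated itempairs through one-item mediators can be sketched with an itempair support table. This is a simplified illustration, not NIA2 itself: it omits the dependence measure usually required between each item and its mediator, and the names `pair_support`, `indirect_pairs`, and the threshold `ts` (maximum joint support for a pair to count as "not directly associated") are assumptions. Only `tm`, the mediator support threshold, is named in the abstract.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Count how many transactions contain each unordered item pair."""
    counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(t), 2):
            counts[(a, b)] += 1
    return counts


def sup(counts, a, b):
    """Look up the support count of the unordered pair {a, b}."""
    return counts[tuple(sorted((a, b)))]


def indirect_pairs(transactions, ts, tm):
    """Return (a, b, mediators) where a and b rarely co-occur (< ts)
    but each co-occurs at least tm times with a shared one-item mediator."""
    counts = pair_support(transactions)
    items = sorted({i for t in transactions for i in t})
    found = []
    for a, b in combinations(items, 2):
        if sup(counts, a, b) >= ts:
            continue  # the pair is directly associated; skip it
        mediators = [
            m for m in items
            if m not in (a, b)
            and sup(counts, a, m) >= tm
            and sup(counts, b, m) >= tm
        ]
        if mediators:
            found.append((a, b, mediators))
    return found


# Hypothetical web-log sessions (sets of visited pages), for illustration only.
sessions = [
    {"home", "news"},
    {"home", "news"},
    {"home", "sports"},
    {"home", "sports"},
    {"news", "weather"},
]

result = indirect_pairs(sessions, ts=1, tm=2)
print(result)
```

Because every lookup reads the precomputed pair table, no frequent item sets larger than pairs are generated, which is the cost saving the abstract attributes to the support matrix.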
The problem of association rule mining has gained considerable prominence in the data mining community as an important tool for knowledge discovery from large-scale databases, and there has been a spurt of research activity around it. However, traditional association rule mining often derives many rules that are of no interest to users. This paper reports a generalization of association rule mining called φ association rule mining. It allows users to assign different levels of interest to different itemsets, as real applications require. It also helps to derive interesting rules and substantially reduces the number of rules. An algorithm based on the FP-tree for mining φ frequent itemsets is presented. Experiments show that the proposed method is efficient and scalable over large databases.
Association rule mining is an important issue in data mining. The paper proposes a binary-based method to generate candidate frequent itemsets and their support counts efficiently, which needs only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM yields the improved algorithm BFDM. Theoretical analysis and experiments show that BFDM is effective and efficient.
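To make the binary idea concrete, the sketch below encodes each transaction as an integer bitmask and counts support with a bitwise "and". It is an assumed illustration of the general technique, not the BFDM algorithm: the names `encode` and `support_count`, the item-to-bit mapping, and the toy transactions are hypothetical.

```python
def encode(itemset, index):
    """Encode an itemset as an integer with one bit per item."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]  # "or" sets the bit for this item
    return mask


def support_count(tx_masks, candidate_mask):
    """Count transactions that contain every item of the candidate.

    A transaction contains the candidate exactly when "and"-ing the two
    masks leaves the candidate mask unchanged.
    """
    return sum(1 for t in tx_masks if (t & candidate_mask) == candidate_mask)


# Hypothetical items and transactions, invented for illustration only.
index = {"a": 0, "b": 1, "c": 2}
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
tx_masks = [encode(t, index) for t in transactions]

ab = encode({"a", "b"}, index)
print(bin(ab), support_count(tx_masks, ab))
```

Joining two candidates is then a single "or" of their masks, and "xor" exposes the items in which two candidates differ, so candidate generation and counting both reduce to integer operations.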
Most international accreditation bodies in engineering education (e.g., ABET) and outcome-based educational systems base their assessments on learning outcomes and program educational objectives. However, mapping program educational objectives (PEOs) to student outcomes (SOs) is a challenging and time-consuming task, especially for a new program applying for ABET-EAC (Accreditation Board for Engineering and Technology, Engineering Accreditation Commission) accreditation. In addition, ABET needs to ensure automatically that the mapping (classification) is reasonable and correct. The classification also plays a vital role in the assessment of students’ learning. Because the PEOs are expressed as short text, they do not carry enough semantic meaning and information, and they consequently suffer from high sparseness, multidimensionality, and the curse of dimensionality. In this work, a novel associative short text classification technique is proposed to map PEOs to SOs. The datasets are extracted from 152 self-study reports (SSRs) that were produced in operational settings in an engineering program accredited by ABET-EAC. The datasets are processed and transformed into a representational form appropriate for association rule mining. The extracted rules are used as delegate classifiers to map PEOs to SOs. The proposed associative classification of the mapping of PEOs to SOs has shown promising results; it can simplify the classification of short text and avoid many problems caused by enriching short text with external resources that are not related or relevant to the dataset.
The severity of traffic accidents is a serious global concern, particularly in developing nations. Knowing the main causes and contributing circumstances may reduce the severity of traffic accidents. Many machine learning models and decision support systems predict road accidents using datasets from social media platforms such as Twitter, blogs, and Facebook. Although such approaches are popular, they suffer from data management issues and low prediction accuracy. This article presents a deep learning-based sentiment analytic model, known as Extra-large Network Bi-directional Long Short-Term Memory (XLNet-Bi-LSTM), to predict traffic collisions based on data collected from social media. First, a Tweet dataset was formed using an exhaustive keyword-based search strategy. In the next phase, two types of features, individual tokens and pair tokens, were obtained using POS tagging and association rule mining. The output of this phase was forwarded to a three-layer deep learning model for final prediction. Numerous experiments were performed to test the efficiency of the proposed XLNet-Bi-LSTM model. The proposed model achieved 94.2% prediction accuracy.
Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process uses a stochastic gradient descent with multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models on the reduced feature set. We evaluate the performance of our approach on various medical datasets using diverse machine learning classifiers. Experimental results demonstrate that the proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision in medical data classification tasks. The study demonstrates the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. For further validation, we conduct detailed experiments on a benchmark medical dataset, which reveal the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. This research therefore contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in healthcare data mining and machine learning.
This paper discusses the detection of outliers by hybridizing the Rough_Outlier algorithm with Negative Association Rules. An optimization algorithm, Binary Particle Swarm Optimization (Binary PSO), is used to improve the computation of Non_Reduct in order to detect outliers. With the Binary PSO algorithm, the rules generated by the Rough_Outlier algorithm are optimized, so that significant outlier objects are detected. The outlier detection process is then enhanced by hybridizing it with Negative Association Rules. Frequent and infrequent item sets are generated from the outlier rules. Results show that the hybrid Rough_Negative algorithm is able to uncover meaningful knowledge about outliers from the frequent and infrequent item sets. This knowledge can then be used by experts in their domains for better decision making.
This paper focuses on improving decision tree induction algorithms when a tie appears during the rule generation procedure for specific training datasets. The tie occurs when the target class outcomes in a leaf node's records appear in equal proportions, so that majority voting cannot be applied. To resolve this exception, we propose to base the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
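The tie situation described above can be sketched with a leaf that falls back to a naive Bayes estimate when majority voting fails. This is a minimal illustration under stated assumptions, not the paper's procedure: it covers only the NB fallback (not k-NN or ARM), uses categorical features with add-one smoothing, and the names `majority_or_none`, `naive_bayes_predict`, `predict_leaf`, and the toy records are hypothetical.

```python
from collections import Counter


def majority_or_none(labels):
    """Return the majority class of a non-empty leaf, or None on a tie."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def naive_bayes_predict(rows, labels, query):
    """Categorical naive Bayes with add-one smoothing over the query features."""
    n = len(labels)
    class_counts = Counter(labels)
    best, best_score = None, -1.0
    for cls in sorted(class_counts):
        score = class_counts[cls] / n  # class prior
        for feat, value in query.items():
            match = sum(
                1 for r, y in zip(rows, labels) if y == cls and r.get(feat) == value
            )
            n_values = len({r.get(feat) for r in rows})
            score *= (match + 1) / (class_counts[cls] + n_values)
        if score > best_score:
            best, best_score = cls, score
    return best


def predict_leaf(leaf_labels, train_rows, train_labels, query):
    """Use majority voting in the leaf; fall back to naive Bayes on a tie."""
    vote = majority_or_none(leaf_labels)
    if vote is not None:
        return vote
    return naive_bayes_predict(train_rows, train_labels, query)


# Hypothetical training records, invented for illustration only.
train_rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
]
train_labels = ["play", "play", "stay", "play"]
query = {"outlook": "rain", "windy": "yes"}

tied = predict_leaf(["play", "stay"], train_rows, train_labels, query)
clear = predict_leaf(["stay", "stay", "play"], train_rows, train_labels, query)
print(tied, clear)
```

The fallback looks at all of the query's features, including those already used to split parent nodes, which reflects the abstract's remark that those features are also taken into consideration.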
Funding: The National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Science and Technology Fund of China University of Mining and Technology (No. 2007B016).
Abstract: An association rules mining method based on semantic relativity is proposed to address the large number of candidate item sets and the high time complexity of traditional association rules mining. The method uses the semantic relativity of ontology concepts to describe complicated domain relationships. Candidate item sets with low semantic relativity are filtered out, which reduces the number of candidate item sets in association rules mining. In the semantic relativity computation, the ontology hierarchy is regarded as a directed acyclic graph rather than a hierarchy tree. Direct hierarchy relationships, non-direct hierarchy relationships, and other typical semantic relationships are all taken into account. Experimental results show that the proposed method effectively reduces the number of candidate item sets and improves the efficiency of association rules mining.
Funding: Under the auspices of the Special Fund of the Ministry of Land and Resources of China in Public Interest (No. 201511001).
Abstract: Association rule mining methods, as a set of important data mining tools, can be used to mine spatial association rules from spatial data. However, their application is limited because the mining results contain a large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining (GFARM) is proposed to effectively eliminate the redundant rules. GFARM is applied in a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China, from 1995 to 2015. Ten sets of regular sampling grids of different sizes are used to detect the influence of scale on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM is also successful in discovering the spatial association pattern between building land distribution and driving factors.
Funding: Supported by the National Key R&D Program of China (2021YFC3001500).
Abstract: Professional drivers are more frequently exposed to long driving distances and travel times, which raises the safety risk from distraction and fatigue. The widespread use of commercial driver monitoring systems (DMS) offers a potential source of data, increasing the amount of data characterizing driver behavior that can be used for further safety research. This study used DMS warning-based data and applied an association rule mining approach to explore risk factors contributing to hazardous materials (HAZMAT) truck driver inattention. A total of 499 HAZMAT truck driver inattentive warning events were used to find rules that predict the occurrence of driver fatigue and distraction. First, Fisher’s exact tests were performed to examine the association between the frequency of driver inattentive behavior warnings and risk factors. Second, support, confidence, and lift values were used to quantify the relative strength of the association rules generated by the Apriori algorithm. Results show that speed between 40 and 49 km/h, relatively long travel time (3-6 h), freeway, tangent section, off-peak hour, and clear weather are highly associated with fatigue driving, while nighttime from 18:00 to 23:59, speed between 70 and 80 km/h, travel time between 1 and 3 h, freeways, acceleration less than 0.5 m/s^2, visibility greater than 1000 m, and tangent roadway section are highly associated with distracted driving. By focusing on these specific feature groups, the association rules can help in developing countermeasures and enforcement approaches that mitigate distracted and fatigued driving.
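The three rule-strength measures named in the abstract above can be illustrated with a minimal Python sketch. The event records, attribute names, and the helper functions `support` and `rule_metrics` are invented for illustration and are not taken from the study; only the standard definitions of support, confidence, and lift are assumed.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a if s_a else 0.0   # P(consequent | antecedent)
    lift = confidence / s_c if s_c else 0.0   # > 1 suggests positive association
    return s_ac, confidence, lift


# Hypothetical warning events (made-up data, not from the paper).
events = [
    frozenset({"freeway", "clear", "fatigue"}),
    frozenset({"freeway", "clear", "fatigue"}),
    frozenset({"freeway", "rain", "distraction"}),
    frozenset({"urban", "clear", "distraction"}),
]

s, c, l = rule_metrics(events, {"freeway", "clear"}, {"fatigue"})
print(s, c, l)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if independent, which is why lift is commonly reported alongside support and confidence.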
Abstract: BACKGROUND: It is increasingly common to find patients affected by a combination of type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD), and studies have correlated the two conditions using the available biological and clinical evidence. The aim of the current study was to apply association rule mining (ARM) to discover whether there are consistent patterns of clinical features relevant to these diseases. ARM leverages clinical and laboratory data to find meaningful patterns for diabetic CAD, using data-driven algorithms to optimise decision-making in patient care. AIM: To reinforce the evidence of the T2DM-CAD interplay and demonstrate the ability of ARM to provide new insights into multivariate pattern discovery. METHODS: This cross-sectional study was conducted at the Department of Biochemistry in a specialized tertiary care centre in Delhi. It involved a total of 300 consenting subjects categorized into three groups of 100 subjects each: CAD with diabetes, CAD without diabetes, and healthy controls. The participants were enrolled from the Cardiology IPD & OPD for sample collection. The study employed the ARM technique to extract meaningful patterns and relationships from the clinical data with its original value. RESULTS: The clinical dataset comprised 35 attributes from the enrolled subjects. The analysis produced rules with a maximum branching factor of 4 and a rule length of 5, requiring a 1% probability increase for enhancement. Prominent patterns emerged, highlighting strong links between health indicators and diabetes likelihood, particularly elevated HbA1C and random blood sugar levels. The ARM technique identified that individuals with a random blood sugar level > 175 and HbA1C > 6.6 are likely to be in the “CAD-with-diabetes” group, offering valuable insights into health indicators and the factors influencing disease outcomes. CONCLUSION: This method holds promise for giving healthcare practitioners valuable insights for enhancing patient treatment targeted at specific subtypes of CAD with diabetes. By applying artificial intelligence techniques to medical data, we have shown the potential for personalized healthcare and for the development of user-friendly applications aimed at improving cardiovascular health outcomes for this high-risk population and optimising decision-making in patient care.
Abstract: This study explores the factors influencing metro passengers’ arrival volume in Wuhan, China, and Lagos, Nigeria, by examining weather, time of day, waiting time, travel behavior, arrival patterns, and metro satisfaction. It addresses a significant research gap in understanding metro passengers’ dynamics across cultural and geographical contexts. It employs questionnaires, field observations, and advanced data analysis techniques such as association rule mining and neural network modeling. Key findings include a correlation between rainy weather, shorter waiting times, and higher arrival volumes. Neural network models showed high predictive accuracy, with waiting time, metro satisfaction, and weather being significant factors for the Lagos Light Rail Blue Line Metro. In contrast, arrival patterns, weather, and time of day were more influential for Wuhan Metro Line 5. Results suggest that improving metro satisfaction and reducing waiting times could increase arrival volumes in the Lagos Metro, while adjusting schedules for weather and peak times could optimize flow in the Wuhan Metro. These insights are valuable for transportation planning, passenger arrival volume management, and enhancing user experiences, potentially benefiting urban transportation sustainability and development goals.
Funding: Support from Taif University Researchers Supporting Project Number (TURSP-2020/215), Taif University, Taif, Saudi Arabia.
Abstract: Lung cancer is one of the leading cancers for both genders worldwide. The occurrence of lung cancer has risen steadily since the early 19th century. In this manuscript, we discuss various data mining techniques that have been employed for cancer diagnosis. Exposure to air pollution has been related to various adverse health effects. This work analyzes various air pollutants and their associated health hazards and aims to evaluate the impact of air pollution on lung cancer. We apply data mining to lung cancer and air pollution, and our approach includes preprocessing, data mining, testing and evaluation, and knowledge discovery. First, we remove noise and irrelevant data and then join the multiple data sources into a common source. From that source, we select the information relevant to our investigation for retrieval. We then convert the selected data into a form suitable for mining. Patterns are extracted using a relational suggestion rule mining process. The information revealed by these patterns is categorized with an Auto Associative Neural Network (AANN) classification method. The proposed method is compared with the existing method on various factors. In conclusion, the proposed Auto Associative Neural Network and relational suggestion rule mining methods achieve high accuracy.
Abstract: Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modified Grover’s search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We present a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check against a minimum support threshold. The proposed algorithm increases the rate of correct solutions, since the search covers only a subspace. Furthermore, our algorithm significantly reduces the number of qubits required by the design, which has a direct positive effect on performance. Our proposed design can accommodate more transactions and items and still perform well with a small number of qubits.
Abstract: Recent advancements in science and technology, coupled with the proliferation of data, have urged laboratory medicine to integrate with the era of artificial intelligence (AI) and machine learning (ML). In current evidence-based medicine, laboratory tests that analyse disease patterns through association rule mining (ARM) have emerged as a modern tool for risk assessment and disease stratification, with the potential to reduce cardiovascular disease (CVD) mortality. CVDs are the well-recognised leading global cause of mortality, with higher fatality rates in the Indian population due to associated factors such as hypertension, diabetes, and lifestyle choices. AI-driven algorithms have offered deep insights in this field while addressing challenges such as healthcare systems grappling with physician shortages. Personalized medicine, driven by big data, requires the integration of ML techniques and high-quality electronic health records to produce meaningful outcomes. These technological advancements enhance computational analyses for both research and clinical practice. ARM plays a pivotal role by uncovering meaningful relationships within databases, aiding patient survival prediction and risk factor identification. The potential of AI in laboratory medicine is vast, but it must be integrated cautiously, with attention to potential ethical, legal, and privacy concerns. Thus, an AI ethics framework is essential to guide its responsible use. Aligning AI algorithms with existing lab practices, promoting education among healthcare professionals, and fostering careful integration into clinical settings are imperative for harnessing the benefits of this transformative technology.
Funding: Supported by the Key Program of the National Natural Science Foundation of China (Grant No. 50539010) and the Special Fund for Public Welfare Industry of the Ministry of Water Resources of China (Grant No. 200801019).
Abstract: In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected, and that if the indices selected are more significant, the characteristics of clustering results are also more significant, and the analysis or diagnosis is more credible. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
Abstract: Data mining has proven to be a reliable technique for analyzing road accidents and providing productive results. Most road accident data analyses use data mining techniques and focus on identifying factors that affect the severity of an accident. However, any damage resulting from road accidents is unacceptable in terms of health, property damage, and other economic factors. Road accidents are sometimes found to occur more frequently at certain specific locations. Analyzing these locations can help identify the features that make road accidents occur frequently there. Association rule mining is a popular data mining technique that identifies correlations among the various attributes of road accidents. In this paper, we first applied the k-means algorithm to group the accident locations into three categories: high-frequency, moderate-frequency, and low-frequency accident locations. The k-means algorithm takes the accident frequency count as the parameter for clustering the locations. We then used association rule mining to characterize these locations. The rules revealed different factors associated with road accidents at locations with different accident frequencies. The association rules for high-frequency accident locations disclosed that intersections on highways are more dangerous for every type of accident. High-frequency accident locations mostly involved two-wheeler accidents in hilly regions. In moderate-frequency accident locations, colonies near local roads and intersections on highway roads were found to be dangerous for pedestrian-hit accidents. Low-frequency accident locations are scattered throughout the district, and most of the accidents at these locations were not critical. Although the data set was limited to some selected attributes, our approach extracted useful hidden information from the data, which can be used to take preventive measures at these locations.
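The first step described above, grouping locations into three categories by accident frequency count, can be sketched as a one-dimensional k-means in plain Python. This is a minimal illustration under stated assumptions: the counts are made up, the function name `kmeans_1d` is hypothetical, and the deterministic initialization (evenly spaced order statistics) is a choice made here for reproducibility, not something the paper specifies.

```python
def kmeans_1d(values, k=3, iters=100):
    """Plain Lloyd's k-means on scalar values (requires k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced order statistics.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as the cluster mean; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
    return centroids, labels


# Hypothetical accident counts per location (made-up data).
counts = [1, 2, 2, 3, 10, 11, 12, 30, 32, 35]
centroids, labels = kmeans_1d(counts, k=3)

# Name clusters by ascending centroid: low, moderate, high frequency.
order = sorted(range(3), key=lambda j: centroids[j])
names = {order[0]: "low", order[1]: "moderate", order[2]: "high"}
categories = [names[lab] for lab in labels]
print(list(zip(counts, categories)))
```

Each category's records could then be passed separately to an association rule miner, which matches the two-stage structure the abstract describes.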
Abstract: The increasing usage of the internet requires a capable system for effective communication. To provide effective communication for internet users, the shortest routing path, chosen according to the nature of their queries, is usually preferred for data forwarding. However, when a large amount of data takes the same path, a bottleneck occurs in the traffic, which leads to data loss or delivers irrelevant data to the users. In this paper, a Rule Based System using Improved Apriori (RBS-IA) rule mining framework is proposed to monitor traffic occurrence over the network effectively and to control the network traffic. The RBS-IA framework integrates the traffic control and decision-making systems to improve internet usage. First, the network traffic data are analyzed, and the incoming and outgoing data information is processed using the Apriori rule mining algorithm. After the set of rules is generated, the network traffic condition is analyzed. Based on the traffic conditions, a decision rule framework is introduced that derives the set of suitable rules and assigns them to the appropriate states of the network. The decision rule framework improves the effectiveness of network traffic control by updating the traffic condition states to identify the relevant route path for packet data transmission. Experimental evaluation is conducted on the Dodgers loop sensor data set from the UCI repository to assess the effectiveness of the proposed RBS-IA rule mining framework. Performance evaluation shows that the proposed RBS-IA rule mining framework provides a significant improvement in managing the network traffic control scheme. The framework is evaluated on factors such as the accuracy of the decisions obtained, the interestingness measure, and execution time.
Abstract: Data visualization blends art and science to convey stories from data via graphical representations. Considering different problems, applications, requirements, and design goals, it is challenging to combine these two components at their full force. While the art component involves creating visually appealing and easily interpreted graphics for users, the science component requires accurate representations of a large amount of input data. Without the science component, visualization cannot serve its role of creating correct representations of the actual data, which leads to wrong perception, interpretation, and decisions. It might be even worse if incorrect visual representations were intentionally produced to deceive the viewers. To address common pitfalls in graphical representations, this paper focuses on identifying and understanding the root causes of misinformation in graphical representations. We reviewed misleading data visualization examples in scientific publications collected from indexing databases and then projected them onto the fundamental units of visual communication, such as color, shape, size, and spatial orientation. Moreover, a text mining technique was applied to extract practical insights from common visualization pitfalls. Cochran’s Q test and McNemar’s test were conducted to examine whether there is any difference in the proportions of common errors among color, shape, size, and spatial orientation. The findings showed that the pie chart is the most misused graphical representation, and size is the most critical issue. It was also observed that there were statistically significant differences in the proportion of errors among color, shape, size, and spatial orientation.
Abstract: Indirect association is a high-level relationship between items and frequent item sets in data. There are many potential applications for indirect associations, such as database marketing, intelligent data analysis, web-log analysis, and recommender systems. Existing indirect association mining algorithms are mostly based on post-processing the discovered frequent item sets. In the mining process, all frequent item sets need to be generated first, and then they are filtered and joined to form indirect associations. We have previously presented an indirect association mining algorithm (NIA) based on the anti-monotonicity of indirect associations, whereby k candidate indirect associations can be generated directly from k - 1 candidate indirect associations without generating all frequent item sets. We also use the frequent itempair support matrix to reduce the time and memory space needed by the algorithm. In this paper, a novel algorithm (NIA2) is introduced, based on generating indirect association patterns between itempairs through one-item mediator sets from the frequent itempair support matrix. A notion of mediator set support threshold is also presented. NIA2 mines indirect association patterns directly from the dataset, without generating all frequent item sets. The frequent itempair support matrix and the use of tm as the support threshold for mediator sets can significantly reduce the cost of join operations and the search process compared with existing algorithms. Results of experiments on a real-world web log dataset show that NIA2 is one order of magnitude faster than existing algorithms.
Abstract: The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool for knowledge discovery from large-scale databases, and there has been a spurt of research activity around this problem. However, traditional association rule mining may often derive many rules in which people are uninterested. This paper reports a generalization of association rule mining called φ association rule mining. It allows people to have different interests in different itemsets, as real applications require. It can also help to derive interesting rules and substantially reduce the number of rules. An algorithm based on the FP tree for mining φ frequent itemsets is presented. Experiments show that the proposed method is efficient and scalable over large databases.
Funding: Supported by the National Natural Science Foundation of China (70371015).
Abstract: Association rule mining is an important issue in data mining. The paper proposes a binary-system-based method to generate candidate frequent itemsets and their corresponding support counts efficiently, which needs only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM yields the improved algorithm BFDM. Theoretical analysis and experiments show that BFDM is effective and efficient.
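The abstract above does not give the encoding used by BFDM, so the following Python sketch shows only the general idea of counting support with bitwise operations. It assumes one common scheme (a bitmap per item with one bit per transaction); the function names `item_bitmaps` and `support_count` and the toy data are hypothetical, and this should not be read as the paper's algorithm.

```python
def item_bitmaps(transactions):
    """Map each item to an int whose bit `tid` is set if transaction `tid` contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(bitmaps, itemset):
    """Support count of a non-empty itemset: popcount of the AND of its item bitmaps."""
    items = list(itemset)
    mask = bitmaps.get(items[0], 0)
    for item in items[1:]:
        mask &= bitmaps.get(item, 0)   # keep only transactions containing every item so far
    return bin(mask).count("1")


# Made-up transactions for illustration.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bitmaps = item_bitmaps(transactions)
print(support_count(bitmaps, {"a", "b"}))
print(support_count(bitmaps, {"a", "b", "c"}))
```

Because each support count reduces to a few integer AND operations and a bit count, candidate itemsets can be evaluated without rescanning the transaction list.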
Abstract: Most of the international accreditation bodies in engineering education (e.g., ABET) and outcome-based educational systems base their assessments on learning outcomes and program educational objectives. However, mapping program educational objectives (PEOs) to student outcomes (SOs) is a challenging and time-consuming task, especially for a new program that is applying for ABET-EAC (ABET Engineering Accreditation Commission) accreditation. In addition, ABET needs to ensure automatically that the mapping (classification) is reasonable and correct. The classification also plays a vital role in the assessment of students’ learning. Since the PEOs are expressed as short text, they do not contain enough semantic meaning and information, and consequently they suffer from high sparseness, multidimensionality, and the curse of dimensionality. In this work, a novel associative short text classification technique is proposed to map PEOs to SOs. The datasets are extracted from 152 self-study reports (SSRs) that were produced in operational settings in an engineering program accredited by ABET-EAC. The datasets are processed and transformed into a representational form appropriate for association rule mining. The extracted rules are used as delegate classifiers to map PEOs to SOs. The proposed associative classification for mapping PEOs to SOs has shown promising results; it can simplify the classification of short text and avoid many problems caused by enriching short text with external resources that are not related or relevant to the dataset.
Abstract: The severity of traffic accidents is a serious global concern, particularly in developing nations. Knowing the main causes and contributing circumstances may reduce the severity of traffic accidents. Many machine learning models and decision support systems predict road accidents using datasets from social media platforms such as Twitter, blogs, and Facebook. Although such approaches are popular, they suffer from data management issues and low prediction accuracy. This article presents a deep learning-based sentiment analytic model known as Extra-large Network Bi-directional long short term memory (XLNet-Bi-LSTM) to predict traffic collisions based on data collected from social media. Initially, a Tweet dataset was formed using an exhaustive keyword-based searching strategy. In the next phase, two types of features, individual tokens and pair tokens, were obtained using POS tagging and association rule mining. The output of this phase was forwarded to a three-layer deep learning model for the final prediction. Numerous experiments were performed to test the efficiency of the proposed XLNet-Bi-LSTM model. The proposed model achieved 94.2% prediction accuracy.
Funding: Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through Project Number RI-44-0444.
Abstract: Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set. We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision rates in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
Abstract: This paper discusses the detection of outliers by hybridizing the Rough_Outlier Algorithm with Negative Association Rules. An optimization algorithm named Binary Particle Swarm Optimization (Binary PSO) is used to improve the computation of Non_Reduct in order to detect outliers. With the Binary PSO algorithm, the rules generated from the Rough_Outlier algorithm are optimized, so that significant outlier objects are detected. The outlier detection process is then enhanced by hybridizing it with Negative Association Rules. Frequent and infrequent item sets are generated from the outlier rules. Results show that the hybrid Rough_Negative algorithm is able to uncover meaningful knowledge of outliers from the frequent and infrequent item sets. This knowledge can then be used by domain experts for better decision making.
Abstract: This paper focuses on improving decision tree induction algorithms when a kind of tie appears during the rule generation procedure for specific training datasets. The tie occurs when the target class outcomes appear in equal proportions among a leaf node's records, so that majority voting cannot be applied. To resolve this exception, we propose to base the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.