Abstract
Data mining, and association rule mining in particular, is an important technology for handling the large amount of data accumulated in the medical field and for extracting value from it. The traditional Apriori algorithm spends too long scanning the database and generates too many candidate itemsets. To address this problem in light of the characteristics of large-scale medical data, a method of digital mapping and sorting of itemsets is proposed. A base model and a generation model are used to generate supersets, which improves the efficiency of superset generation and pruning. The improved algorithm is ported to the open-source Hadoop platform and combined with the MapReduce framework, introducing a parallel improvement based on database partitioning. Experimental results show that the approach reduces redundancy on large-scale data sets and gives the Apriori algorithm good parallel scalability. Finally, an example is given to demonstrate the feasibility of the improved algorithm.
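The mapping, sorting and pruning ideas summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation. The abstract does not say how items are mapped to numbers, so the frequency-descending integer mapping below is an assumption. The function names (`encode`, `join_and_prune`, `apriori`) are hypothetical, and the Hadoop/MapReduce partitioning step is not shown.

```python
from collections import Counter
from itertools import combinations


def encode(transactions):
    """Map each item to an integer and sort every transaction.

    Assumption: items are numbered by descending support (ties broken by
    name); the abstract does not specify the actual mapping.
    """
    freq = Counter(item for t in transactions for item in set(t))
    ordered = sorted(freq, key=lambda x: (-freq[x], x))
    mapping = {item: i for i, item in enumerate(ordered)}
    encoded = [tuple(sorted({mapping[i] for i in t})) for t in transactions]
    return mapping, encoded


def join_and_prune(frequent_k):
    """Build (k+1)-candidates from sorted frequent k-itemsets.

    Two itemsets are joined only if they share the same (k-1)-prefix, and a
    candidate is kept only if all of its k-subsets are frequent.
    """
    frequent = set(frequent_k)
    ordered = sorted(frequent)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares this prefix
            cand = a + (b[-1],)
            if all(sub in frequent for sub in combinations(cand, len(a))):
                candidates.append(cand)
    return candidates


def apriori(encoded, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    counts = Counter((i,) for t in encoded for i in t)
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)
    while level:
        candidates = join_and_prune(level)
        counts = Counter()
        for t in encoded:
            items = set(t)
            for cand in candidates:
                if items.issuperset(cand):
                    counts[cand] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        result.update(level)
    return result
```

Because every itemset is a sorted tuple of integers, the prefix comparison in `join_and_prune` is a cheap tuple comparison, and the inner loop can stop as soon as the prefix changes. In a partitioned setting, each partition could run the counting step locally before the counts are merged, but that part is outside this sketch.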
Source
International Computer Frontier Conference Proceedings (《国际计算机前沿大会会议论文集》)
2020, Issue 2, pp. 506-520 (15 pages)
International Conference of Pioneering Computer Scientists, Engineers and Educators(ICPCSEE)
Funding
National Natural Science Foundation of China ([2018] 61741124)
Science Planning Project of Guizhou Province (Guizhou Science and Technology Cooperation Platform Talent [2018] No. 5781)
We also sincerely thank the anonymous reviewers for their valuable feedback.