Abstract
Running the Apriori algorithm on conventional platforms to mine medication-combination rules from traditional Chinese medicine (TCM) case records has three problems: it consumes a large amount of memory, it computes slowly, and it cannot process petabyte-scale data. This paper therefore proposes a Hadoop-based method for association analysis of medication combinations used to treat asthma in TCM. The MapReduce distributed computing framework and the HBase distributed database are used to improve the performance of Apriori in two ways. First, MapReduce processes the data in parallel, and HBase's high-speed reads and writes accelerate the generation of frequent itemsets. Second, the method drops the self-join that the traditional algorithm uses to generate candidate itemsets. Instead, it generates candidates from the data on each node by combining loops with recursion, which makes candidate generation more efficient. Experimental results show that mining TCM medication-combination rules with the Hadoop-based method is more efficient and can guide clinical practice more effectively.
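The abstract does not spell out how the loop-and-recursion candidate generation works or how it fits into MapReduce. The following is a minimal Python sketch of one plausible reading, offered as an illustration rather than the paper's method. Each "node" (a partition of the prescriptions) recursively enumerates the k-item subsets of its own transactions and emits (itemset, 1) pairs, keeping only subsets whose (k-1)-item subsets were all frequent at the previous level (the Apriori property). A reduce step then sums the counts and applies the minimum support. Hadoop's map/shuffle/reduce and HBase storage are simulated with in-memory lists and a `Counter`. The function names (`k_subsets`, `map_partition`, `reduce_counts`, `mapreduce_apriori`) and the toy prescriptions are hypothetical.

```python
from collections import Counter
from itertools import chain

# Hypothetical sketch: Hadoop's map/shuffle/reduce and HBase storage are
# simulated in memory. This is NOT the paper's implementation.


def k_subsets(items, k, start=0, prefix=()):
    """Recursively enumerate k-item subsets of a sorted tuple (loop + recursion)."""
    if len(prefix) == k:
        yield prefix
        return
    # Loop over the positions that still leave enough items to reach size k,
    # then recurse to extend the prefix by one more item.
    for i in range(start, len(items) - (k - len(prefix)) + 1):
        yield from k_subsets(items, k, i + 1, prefix + (items[i],))


def map_partition(transactions, k, prev_frequent):
    """Map step for one simulated node: emit (candidate, 1) for local transactions."""
    for t in transactions:
        items = tuple(sorted(set(t)))
        for cand in k_subsets(items, k):
            # Apriori pruning: every (k-1)-subset must have been frequent.
            if k == 1 or all(sub in prev_frequent for sub in k_subsets(cand, k - 1)):
                yield cand, 1


def reduce_counts(pairs, min_support):
    """Reduce step: sum the emitted counts and keep itemsets meeting min_support."""
    counts = Counter()
    for cand, one in pairs:
        counts[cand] += one
    return {c: n for c, n in counts.items() if n >= min_support}


def mapreduce_apriori(partitions, min_support):
    """Level-wise driver: one simulated MapReduce job per itemset size k."""
    frequent_all = {}
    prev = set()
    k = 1
    while True:
        # "Shuffle": concatenate the map output of every partition.
        pairs = chain.from_iterable(map_partition(p, k, prev) for p in partitions)
        level = reduce_counts(pairs, min_support)
        if not level:
            break
        frequent_all.update(level)
        prev = set(level)
        k += 1
    return frequent_all


# Made-up toy prescriptions (herb names in pinyin), split across two "nodes".
TOY_PARTITIONS = [
    [["mahuang", "xingren", "gancao"], ["mahuang", "xingren", "shigao", "gancao"]],
    [["mahuang", "gancao"], ["xingren", "gancao"], ["mahuang", "xingren", "gancao"]],
]

if __name__ == "__main__":
    result = mapreduce_apriori(TOY_PARTITIONS, min_support=3)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print(itemset, count)
```

On this toy data with a minimum support of 3, the triple (gancao, mahuang, xingren) appears in 3 of the 5 prescriptions and is reported as frequent. "shigao" appears only once, so it is dropped at level 1, and every candidate containing it is pruned at later levels without ever being counted. Generating candidates per transaction in this way avoids the global self-join. The trade-off is that each node may enumerate many subsets for long prescriptions.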
Source
Computer Engineering and Applications
CSCD
Peking University Core Journal
2017, No. 13, pp. 95-98, 124 (5 pages)
Funding
Key Research and Development Program of Shandong Province (No. 2015GSF119016)