Abstract
With the implementation of archiving and registration (建档立卡) for targeted poverty alleviation, the targeted poverty alleviation system has accumulated a large amount of data, and using an efficient association rule algorithm to mine the useful information hidden in these data is of great significance for supporting targeted poverty alleviation. Because the archived registration data of poor households have a high data repetition rate and diverse attributes, this paper proposes an improved Apriori algorithm that constructs candidate itemsets using a matrix data structure and related properties of sets. This avoids repeatedly scanning the database and performing layer-by-layer join and pruning operations, which improves mining efficiency. Experiments on actual archived data of poor households show that, when the minimum support threshold is low, the proposed algorithm mines more efficiently than the traditional Apriori algorithm.
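The abstract does not give the details of the improved algorithm's matrix construction, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It swaps in a well-known technique: a single pass turns the records into per-item column bitmaps (a Boolean record-by-item matrix packed into Python ints), and itemsets are then grown depth-first by intersecting bitmaps, in the style of Eclat, instead of Apriori's level-wise join and prune with a database scan per level. The function names `build_item_bitmaps`, `frequent_itemsets` and `association_rules`, the `min_count` and `min_confidence` parameters, and the attribute=value records in `SAMPLE_RECORDS` are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical attribute=value records standing in for archived poor-household data.
SAMPLE_RECORDS: List[Set[str]] = [
    {"cause:illness", "labor:none", "housing:unsafe"},
    {"cause:illness", "labor:none"},
    {"cause:disability", "labor:none"},
    {"cause:illness", "housing:unsafe"},
    {"cause:illness", "labor:none", "housing:unsafe"},
]


def build_item_bitmaps(records: List[Set[str]]) -> Dict[str, int]:
    """One pass over the data: bit t of bitmaps[item] is 1 iff record t contains item.

    Each bitmap is one column of a Boolean record-by-item matrix, packed into an int.
    """
    bitmaps: Dict[str, int] = {}
    for row, record in enumerate(records):
        for item in record:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << row)
    return bitmaps


def popcount(bits: int) -> int:
    # Number of records covered by a bitmap (int.bit_count() also works on Python 3.10+).
    return bin(bits).count("1")


def frequent_itemsets(records: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose absolute support is at least min_count."""
    bitmaps = build_item_bitmaps(records)
    # Only frequent single items can appear in a frequent itemset (downward closure).
    items = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_count)
    found: Dict[FrozenSet[str], int] = {}

    def extend(prefix: List[str], prefix_bits: int, start: int) -> None:
        # Grow the prefix only with items after it in sorted order, so each itemset is visited once.
        for k in range(start, len(items)):
            bits = prefix_bits & bitmaps[items[k]]  # records containing prefix + items[k]
            support = popcount(bits)
            if support >= min_count:
                itemset = prefix + [items[k]]
                found[frozenset(itemset)] = support
                extend(itemset, bits, k + 1)
            # An infrequent itemset has no frequent supersets, so that branch stops here.

    all_rows = (1 << len(records)) - 1
    extend([], all_rows, 0)
    return found


def association_rules(
    freq: Dict[FrozenSet[str], int], min_confidence: float
) -> List[Tuple[FrozenSet[str], FrozenSet[str], int, float]]:
    """Emit (antecedent, consequent, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, support in freq.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                lhs = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so freq[lhs] exists.
                confidence = support / freq[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


if __name__ == "__main__":
    freq = frequent_itemsets(SAMPLE_RECORDS, min_count=3)
    for itemset, support in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
    for lhs, rhs, support, conf in association_rules(freq, min_confidence=0.8):
        print(sorted(lhs), "->", sorted(rhs), support, round(conf, 2))
```

On the five hypothetical records with `min_count=3`, this finds five frequent itemsets, and at `min_confidence=0.8` it keeps the single rule `housing:unsafe -> cause:illness` with confidence 1.0. The design choice being illustrated is that, once the data sit in a bit matrix, the support of any candidate is one bitwise AND plus a bit count, so lowering the support threshold adds candidates but no extra passes over the original records. Whether this matches the paper's own matrix scheme or its reported speedups is not implied.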
Authors
HE Qing (何庆)
LIU Liang (刘亮)
Institute of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
Source
Journal of Guizhou University: Natural Sciences (《贵州大学学报(自然科学版)》)
2019, Issue 6, pp. 46-52 (7 pages)
Funding
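Major Special Projects of the Guizhou Provincial Science and Technology Program (Qiankehe Major Special Project [2016]3022 and [2018]3002)
Open Fund of the Guizhou Provincial Key Laboratory of Public Big Data (2017BDKFJJ004, 2017BDKFJJ034)
Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (Qiankehe KY [2016]124)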
Keywords
association rules
frequent itemsets
candidate itemsets
targeted poverty alleviation
Apriori algorithm