Abstract
Traditional information classification methods, when applied to big data, generally suffer from long classification times and high average misclassification rates, and cannot effectively distinguish between different information types. To address this, a rule-engine-driven information classification method for big data is proposed. The big data information under the rule engine is analyzed to build a big data information set model and to obtain the features of the big data information. These features are input into a support vector machine (SVM) classifier. With the classification accuracy of the big data information as the fitness function, particle swarm optimization (PSO) is used to select the SVM parameters, and a preliminary classification of the information is obtained from the selected parameters. Based on the preliminary classification results, data generalization theory is then applied to achieve the optimal classification of information attributes. Experimental results show that the proposed method has a low average misclassification rate and a short classification time, can effectively separate big data information types, and yields highly reliable classification results.
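The parameter-selection step described in the abstract (PSO searching SVM parameters, with classification accuracy as the fitness function) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `pso_maximize` and `placeholder_accuracy`, the search ranges, and the PSO coefficients are all assumptions chosen for illustration, and the fitness below is a synthetic stand-in rather than a real SVM accuracy.

```python
import random


def pso_maximize(fitness, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness(position) inside box bounds with basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the updated position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def placeholder_accuracy(params):
    """ASSUMPTION: synthetic stand-in for SVM classification accuracy.

    Peaks at log10(C) = 1, log10(gamma) = -2. A real fitness would train
    an SVM with these parameters and return its validation accuracy.
    """
    log_c, log_gamma = params
    return 1.0 / (1.0 + (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


# Search log10(C) in [-3, 3] and log10(gamma) in [-5, 1] (assumed ranges).
best_params, best_fit = pso_maximize(
    placeholder_accuracy, bounds=[(-3.0, 3.0), (-5.0, 1.0)]
)
```

To apply the idea to real data, `placeholder_accuracy` would be swapped for a function that trains an SVM with the candidate parameters and returns its accuracy on held-out data. Searching in log scale is a common choice for SVM parameters, but the abstract does not say whether the paper does so.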
Authors
倪海
邵英俭
NI Hai; SHAO Ying-jian (Big Data and Smart Campus Management Center of Beihua University, Jilin, Jilin 132013, China; Science Park Administrative Committee of Beihua University, Jilin, Jilin 132013, China)
Source
《计算机仿真》
Peking University Core Journal
2021, No. 5, pp. 371-374, 444 (5 pages)
Computer Simulation
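Funding
Beihua University industry-sponsored research project "润石云 Big Data Integrated Service Management Platform Software R&D Project" (201901020).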
Keywords
Big data
Rule engine
Support vector machine
Information classification
Data generalization