Association rule mining is a major data mining field that discovers associations and correlations among items in today's big data environment. Conventional association rule mining focuses mainly on positive frequent itemsets (PFIS), which are generated from frequently occurring itemsets. However, a significant body of work has focused on infrequent itemsets, using negative association rules to mine interesting negative frequent itemsets (NFIS) from transactions. In this work, we propose an efficient backward-calculating negative frequent itemset algorithm, EBC-NFIS, which computes backward supports and extracts both positive and negative frequent itemsets simultaneously from a dataset. EBC-NFIS is based on the popular e-NFIS algorithm, which computes the supports of negative itemsets from the supports of positive itemsets. The proposed algorithm reuses previously computed supports held in memory to minimize computation time. In addition, positive and negative association rules (PNARs) are generated from the frequent itemsets discovered by EBC-NFIS. The efficiency of the proposed algorithm is verified through several experiments and by comparison with e-NFIS. The experimental results confirm that the proposed algorithm successfully discovers NFIS and PNARs and runs significantly faster than the conventional e-NFIS algorithm.
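The abstract's central idea, deriving the support of a negative itemset from the supports of positive itemsets and reusing supports already held in memory, can be illustrated with the standard inclusion-exclusion identity. The sketch below is not the EBC-NFIS or e-NFIS algorithm itself, whose details the abstract does not give; the toy transactions and the function names `positive_support`, `negative_support`, and `direct_support` are assumptions made for illustration only.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical toy dataset; not taken from the paper.
TRANSACTIONS = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
    frozenset({"c"}),
]


@lru_cache(maxsize=None)
def positive_support(itemset: frozenset) -> float:
    """Fraction of transactions containing every item in `itemset`.

    The cache stands in for "previously computed supports from memory":
    each positive itemset is counted against the data at most once.
    """
    hits = sum(1 for t in TRANSACTIONS if itemset <= t)
    return hits / len(TRANSACTIONS)


def negative_support(present: frozenset, absent: frozenset) -> float:
    """Support of "all of `present` and none of `absent`".

    Uses inclusion-exclusion over subsets Z of `absent`:
        s(present, not absent) = sum over Z of (-1)^|Z| * s(present | Z)
    so only positive supports are ever looked up.
    """
    total = 0.0
    for k in range(len(absent) + 1):
        for subset in combinations(sorted(absent), k):
            total += (-1) ** k * positive_support(present | frozenset(subset))
    return total


def direct_support(present: frozenset, absent: frozenset) -> float:
    """Reference value obtained by scanning the transactions directly."""
    hits = sum(1 for t in TRANSACTIONS if present <= t and not (absent & t))
    return hits / len(TRANSACTIONS)


if __name__ == "__main__":
    a, b = frozenset({"a"}), frozenset({"b"})
    print(negative_support(a, b), direct_support(a, b))
    print(negative_support(frozenset(), a | b), direct_support(frozenset(), a | b))
```

For two items this reduces to the familiar special cases s(A, not B) = s(A) - s(A and B) and s(not A, not B) = 1 - s(A) - s(B) + s(A and B). Caching the positive supports means a negative itemset never triggers a second scan for a positive itemset that has already been counted.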
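Objective: To establish a method based on near-infrared spectroscopy for discriminating the geographical origin of sea cucumbers and rapidly determining their collagen content. Methods: A total of 43 sea cucumber samples were collected from four regions: Dalian, Fujian, Lianyungang, and Shandong. Near-infrared spectra of the samples were first acquired and preprocessed with the standard normal variate (SNV) transformation, and different qualitative discriminant models were then used to distinguish the origin of the sea cucumbers. Collagen content was measured by spectrophotometry, and prediction models for collagen content were built using partial least squares (PLS), interval partial least squares (iPLS), backward interval partial least squares (BiPLS), and synergy interval partial least squares (SiPLS). Results: Among the origin discrimination models, the least-squares support vector machine (LS-SVM) achieved the highest recognition rate: 100% on the calibration set and 95.35% on the prediction set. Among the collagen prediction models, BiPLS gave comparatively good predictions, with a calibration-set correlation coefficient Rc of 0.9002 and a prediction-set correlation coefficient Rp of 0.8517. Conclusion: Near-infrared spectroscopy can be used to discriminate the origin of sea cucumbers and to rapidly determine their collagen content.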
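The SNV preprocessing step named in the methods can be sketched as follows. SNV centers each spectrum on its own mean and scales it by its own standard deviation. This is a generic illustration rather than the authors' code: the function name `snv`, the use of the sample standard deviation, and the example absorbance values are all assumptions.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate transform of a single spectrum.

    Each value has the spectrum's own mean subtracted and is divided by the
    spectrum's own standard deviation. Assumption: the sample standard
    deviation (n - 1 denominator) is used; some implementations use n.
    """
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - m) / s for x in spectrum]


if __name__ == "__main__":
    # Hypothetical absorbance readings at five wavelengths, not real data.
    raw = [0.42, 0.45, 0.51, 0.48, 0.44]
    print(snv(raw))
```

Because the correction is applied spectrum by spectrum, it needs no reference spectrum and can be applied to calibration and prediction samples independently.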