A Method to Set Decay Factor Based on Gaussian Function (Chinese title: 基于高斯函数的衰减因子设置方法研究)

Times cited: 4

Abstract: A data stream is a continuous sequence of data elements that changes rapidly over time, so the knowledge it contains also changes over time. In many data stream applications the most recent data are regarded as the most valuable, so a time decay model (TDM) is used to mine frequent patterns over data streams. Existing ways of setting the decay factor are largely arbitrary, which makes the result set unstable. Other approaches aim only at high recall (up to 100%) or only at high precision (up to 100%) and neglect the complementary measure. To balance high recall with high precision while keeping the result set stable, an average decay factor is designed. To further increase the weights of the latest transactions and reduce the weights of historical ones, a second decay factor based on the Gaussian function is proposed. To compare the strengths and weaknesses of the different decay-factor settings, four time decay models are studied and designed, and algorithms built on these four models are used to mine closed frequent patterns over data streams. Experiments on high-density and low-density data streams lead to two findings. The average decay factor balances high recall and high precision. The Gaussian decay factor yields better algorithm performance than the other settings.
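This record does not reproduce the paper's formulas, so the following is only a minimal Python sketch of the general idea behind a time decay model, under stated assumptions. Each transaction's contribution to a pattern's support is weighted by its age, using one of two weights:

- the geometric weight `decay ** age`, the form commonly used in time decay models;
- a Gaussian-shaped curve `exp(-age**2 / (2 * sigma**2))`.

The function names, the `sigma` parameter, and the normalisation by total weight are illustrative assumptions. They are not the paper's average or Gaussian decay factor. A helper also computes recall and precision of a mined pattern set against the exact result set, the two measures the abstract trades off.

```python
import math
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Transaction = FrozenSet[str]


def exponential_weight(age: int, decay: float) -> float:
    """Geometric time decay: a transaction `age` steps old weighs decay**age (0 < decay <= 1)."""
    return decay ** age


def gaussian_weight(age: int, sigma: float) -> float:
    """Hypothetical Gaussian-shaped weight exp(-age^2 / (2*sigma^2)).

    Recent transactions keep a weight close to 1, while old ones eventually
    fall off faster than any geometric decay. Illustrative only, NOT the
    paper's formula.
    """
    return math.exp(-(age ** 2) / (2.0 * sigma ** 2))


def decayed_support(stream: List[Transaction], pattern: Iterable[str],
                    weight: Callable[[int], float]) -> float:
    """Weighted support of `pattern` over `stream` (oldest transaction first).

    The newest transaction has age 0. The result is normalised by the total
    weight so it can be compared with a relative minimum-support threshold.
    """
    items = frozenset(pattern)
    newest = len(stream) - 1
    total = matched = 0.0
    for position, transaction in enumerate(stream):
        w = weight(newest - position)
        total += w
        if items <= transaction:  # pattern is contained in this transaction
            matched += w
    return matched / total if total > 0 else 0.0


def recall_precision(mined: Set[Transaction], exact: Set[Transaction]) -> Tuple[float, float]:
    """Recall and precision of a mined pattern set against the exact result set."""
    hits = len(mined & exact)
    recall = hits / len(exact) if exact else 1.0
    precision = hits / len(mined) if mined else 1.0
    return recall, precision


if __name__ == "__main__":
    stream = [frozenset(t) for t in (["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"])]
    print(f"{decayed_support(stream, ['b', 'c'], lambda k: 1.0):.3f}")
    print(f"{decayed_support(stream, ['b', 'c'], lambda k: exponential_weight(k, 0.5)):.3f}")
    print(f"{decayed_support(stream, ['b', 'c'], lambda k: gaussian_weight(k, 2.0)):.3f}")
    mined = {frozenset(["b", "c"]), frozenset(["a"])}
    exact = {frozenset(["b", "c"]), frozenset(["b"])}
    print(recall_precision(mined, exact))
```

In the toy stream, the pattern {b, c} appears only in the two newest transactions. With a decay of 0.5 it reaches a decayed support of 0.8, against 0.5 when every transaction weighs 1. This shows how the model favours recent data. The recall/precision call on the toy sets returns (0.5, 0.5). The Gaussian weight is only one plausible shape for "raising recent weights and lowering historical ones"; the paper's own definition may differ.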

Authors: Han Meng, Wang Zhihai, Yuan Jidong

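Affiliations: School of Computer and Information Technology, Beijing Jiaotong University; School of Computer Science and Engineering, Beifang University of Nationalities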

Source: Journal of Computer Research and Development (计算机研究与发展), 2015, No. 12, pp. 2834-2843 (10 pages). Indexed in: EI, CSCD, Peking University Core Journals

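Funding: National Natural Science Foundation of China (61563001); Scientific Research Fund of the State Ethnic Affairs Commission (14BFZ008); Beijing Natural Science Foundation (4142042); Scientific Research Fund of Beifang University of Nationalities (2013QZP02)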

Keywords: decay factor; time decay model; Gaussian function; recall; precision; frequent pattern mining; data stream mining

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
