期刊文献+

连续时间区间内的频繁词序列挖掘算法

Frequent Word Sequence Mining Algorithm in Continuous Time Interval
下载PDF
导出
摘要 查询文本中频繁出现的短语可快速掌握文本内容,然而传统频繁词序列挖掘算法面向挖掘任务时的时间复杂度较高,无法满足频繁更换查询条件及快速获得反馈的查询需求。利用基于频率树的快速频繁词序列挖掘算法(TS;ining),在保持后缀树线性构造时间的情况下实现文本集合中频繁词序列的查询,并采用树型索引结构避免多次扫描文本集合,降低算法时间复杂度。针对连续时间区间内的频繁词序列查询问题,提出改进的剪枝挖掘算法(TS;runing),通过减少频率树的扫描范围进一步提高挖掘效率。实验结果表明,TS;ining与TS;runing算法的运行时间相比经典Apriori挖掘算法约减少了2个数量级,具有更高的频繁词序列挖掘效率。 Mining frequent phrases in a text assists in the quick understanding of the content,but traditional algorithms for frequent word sequence mining usually have high time complexity in mining tasks,and fail to deal with frequently changing query criteria or respond to queries in real time.This paper gives an algorithm for rapidly mining frequent word sequences based on frequency tree,called TS_Mining.The algorithm can implement the query of the frequent word sequence in a text set while maintaining the linear construction time of the suffix tree.Through the tree index structure,repeated scanning of the text set can be avoided to reduce the algorithm complexity.In addition,in order to increase the efficiency of frequent word sequence mining in a continuous time interval,an improved pruning mining algorithm,called TS_Pruning is designed,which reduces the scanning range of the frequency tree.The experimental results show that compared with the Apriori mining algorithm,the TS_Mining algorithm and TS_Pruning algorithm reduces the running time by about 2 orders of magnitude,and displays a higher efficiency in frequent word sequence mining.
作者 王璐 刘晓清 何震瀛 WANG Lu;LIU Xiaoqing;HE Zhenying(School of Software,Fudan University,Shanghai 200441,China;School of Computer Science,Fudan University,Shanghai 200433,China)
出处 《计算机工程》 CAS CSCD 北大核心 2022年第2期79-85,91,共8页 Computer Engineering
基金 国家自然科学基金(61732004)。
关键词 频繁词序列 后缀树 数据挖掘 频繁项集 热点话题检测 frequent word sequence suffix tree data mining frequent itemset hot topic detection
  • 相关文献

参考文献7

二级参考文献42

共引文献50

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部