Abstract
Large-scale text data mining is an important branch of big data analysis and has been a research hotspot in recent years. This paper studies an algorithm for mining periodic patterns in textual data with multi-granularity time. First, the concepts of time granularity conversion and multi-granularity time interval are introduced. Then, a periodic model of textual data is established, and an algorithm for mining periodic patterns in multi-granularity temporal text is given. Finally, experiments on a large collection of virus-related text literature show that the proposed algorithm can mine a number of effective periodic patterns, and the influence of the periodic tolerance on support and confidence is discussed. The study provides a new method for the analysis of large-scale text data.
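The abstract names the ingredients of the method (granularity conversion, time intervals, a periodic tolerance, support and confidence) without giving their definitions. The following is a minimal Python sketch of how these pieces could fit together, not the paper's algorithm. The conversion factors in `DAYS_PER_UNIT`, the functions `to_days` and `periodic_stats`, the definitions of support and confidence used here, and the sample dates are all assumptions made for illustration.

```python
from datetime import date

# Assumed conversion factors to days; month and year are rough approximations.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def to_days(value, unit):
    """Convert a duration at a given granularity to days (illustrative factors)."""
    return value * DAYS_PER_UNIT[unit]


def periodic_stats(dates, period_days, tolerance_days):
    """Return (support, confidence) for one candidate period.

    support:    number of gaps between consecutive occurrences that lie
                within tolerance_days of period_days.
    confidence: that count divided by the total number of gaps.
    Both definitions are assumptions, not taken from the paper.
    """
    ordered = sorted(dates)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return 0, 0.0
    hits = sum(1 for g in gaps if abs(g - period_days) <= tolerance_days)
    return hits, hits / len(gaps)


# Hypothetical dates on which a term appears in a document stream.
occurrences = [date(2012, 1, 1), date(2012, 1, 8), date(2012, 1, 16), date(2012, 2, 1)]

# Candidate period of one week, with a tolerance of one day.
support, confidence = periodic_stats(occurrences, to_days(1, "week"), tolerance_days=1)
```

In this sketch, widening `tolerance_days` can only keep or raise both values, which is one simple way to see why the tolerance affects support and confidence.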
Source
《计算机科学》
CSCD
Peking University Core Journal
2013, Issue 11A, pp. 251-254, 262 (5 pages)
Computer Science
Keywords
Multi-granularity time
Textual data
Data mining
Periodic pattern