Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem, since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we present a systematic introduction to the pattern-growth methodology and study its principles and extensions. We first introduce two pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan, which mines structured patterns using the same methodology. Their relative performance on large databases is presented and analyzed. Several extensions of these methods are also discussed, including mining multi-level and multi-dimensional patterns and mining constraint-based patterns.
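To make the pattern-growth idea concrete, here is a minimal sketch in the spirit of PrefixSpan, not the authors' implementation. It assumes a simplified setting where each sequence element is a single item (the full algorithm handles itemsets per element), and the function name `prefixspan` and the toy database are hypothetical. The key contrast with candidate generation-and-test (GSP-style) is that each frequent prefix is extended only with items that actually occur in its projected database, so no unsupported candidates are ever generated.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sequence = List[str]


def prefixspan(db: List[Sequence], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Hypothetical sketch of prefix-projected pattern growth.

    Simplifying assumption: every sequence element is a single item,
    not an itemset as in the full PrefixSpan algorithm.
    Returns a map from each frequent subsequence to its support count.
    """
    patterns: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], projected: List[Sequence]) -> None:
        # Support = number of projected sequences containing the item
        # (each sequence is counted at most once per item).
        counts: Counter = Counter()
        for seq in projected:
            counts.update(set(seq))
        for item, support in counts.items():
            if support < min_support:
                continue  # prune: no extension of an infrequent prefix can be frequent
            new_prefix = prefix + (item,)
            patterns[new_prefix] = support
            # Build the projected database for the new prefix: keep the
            # suffix after the first occurrence of `item` in each sequence.
            new_projected: List[Sequence] = []
            for seq in projected:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:  # empty suffixes cannot support any extension
                        new_projected.append(suffix)
            grow(new_prefix, new_projected)

    grow((), db)
    return patterns


if __name__ == "__main__":
    toy_db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "b"], ["b", "c"]]
    for pattern, support in sorted(prefixspan(toy_db, min_support=2).items()):
        print(pattern, support)
```

Projecting on the first occurrence of an item keeps the longest possible suffix, which is why the support counted in the projected database equals the number of original sequences containing the grown pattern.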
The study of database technologies, or more generally, of data and information management technologies, is an important and active research field. Recently, many exciting results have been reported. In this fast-growing field, Chinese researchers are playing increasingly active roles. Research papers from Chinese scholars, both in China and abroad, appear in prestigious academic forums. In this paper, we, nine young Chinese researchers working in the United States, present concise surveys and report our recent progress in the selected fields we are working on. Although the paper covers only a small number of topics and the selection of topics is far from balanced, we hope that this effort will attract more researchers, especially those in China, to the frontiers of database research and promote collaboration. For obvious reasons, the authors are listed alphabetically, while the sections are arranged in the order of the author list.
More often than not, a newcomer to computer science research, such as an undergraduate or graduate student, naturally asks for introductory reading on the culture and philosophy of computer science. The book 'Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists' is a good choice for such readers.