Abstract
This paper studies the discovery of genre patterns in Chinese web pages using association rule mining. A linked-list structure is used to convert the document set into a transaction database suitable for association rule mining, ensuring that the term items in each transaction are ordered as they appear in the text. The Apriori association rule algorithm is then implemented on this database. Experimental results show that the approach works fairly well for discovering genre patterns in some categories.
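The record does not include the paper's implementation, so the following is only a minimal Python sketch of the pipeline the abstract describes. It assumes the documents are already segmented into terms, uses an insertion-ordered `dict` as a stand-in for the linked list, and takes an absolute support count as its threshold. The names `to_transaction`, `apriori`, and `docs`, and the sample terms, are hypothetical.

```python
from itertools import combinations


def to_transaction(tokens):
    """Drop repeated terms while keeping first-occurrence order.

    Assumption: an insertion-ordered dict (Python 3.7+) stands in for the
    linked list mentioned in the abstract.
    """
    return list(dict.fromkeys(tokens))


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    sets = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for s in sets:
        for item in s:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {k: v for k, v in counts.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: keep candidates that meet the support threshold.
        current = {}
        for c in candidates:
            n = sum(1 for s in sets if c <= s)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical, already-segmented documents (illustrative only).
docs = [
    ["notice", "meeting", "date", "notice"],
    ["notice", "date", "place"],
    ["news", "report", "date"],
]
transactions = [to_transaction(d) for d in docs]
frequent_itemsets = apriori(transactions, min_support=2)
```

In this sketch the term order is kept only in the transaction lists. The itemset counting itself is order-insensitive, as in standard Apriori, so how the paper exploits the ordering remains an open detail here.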
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
2008, No. 12, pp. 134-136, 141 (4 pages in total)
Funding
Fujian Province Key Project (2008I0021)
Keywords
text classification
genre pattern
association rule