Abstract
We propose a text mining algorithm that uses part of speech (POS) as its reference value and can effectively mine association rules related to seed words. Building on the idea of the Bootstrapping algorithm, it reduces the preprocessing stage's dependence on stemming and can handle Chinese words that appear in logs. It also improves the understanding of log text context, which raises the validity of the mined association rules. Applied to IDS log mining, it effectively improves mining efficiency and supplies association rules to the rule base.
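The abstract does not give the algorithm's steps, so the following is only a minimal sketch of what a bootstrapping-style, POS-filtered expansion from seed words could look like. It is not the paper's method. The sample log lines, the POS tags (`n`, `v`), the function name `bootstrap`, and the parameters `keep_pos`, `min_support`, and `max_rounds` are all assumptions made for illustration. Tokens are matched as-is, with no stemming, so Chinese and English words are treated the same way.

```python
from collections import Counter

# Hypothetical pre-tagged log lines: each token is (word, pos_tag).
# The words and tags are illustrative, not taken from the paper.
LOGS = [
    [("扫描", "v"), ("端口", "n"), ("失败", "v")],
    [("scan", "v"), ("port", "n"), ("denied", "v")],
    [("扫描", "v"), ("主机", "n"), ("失败", "v")],
    [("login", "v"), ("root", "n"), ("denied", "v")],
]


def bootstrap(logs, seeds, keep_pos=("n", "v"), min_support=2, max_rounds=3):
    """Grow a term set from seed words and count co-occurrence rules.

    Each round counts (known term, co-occurring word) pairs among tokens
    whose POS tag is in keep_pos, then promotes words that co-occur with a
    known term at least min_support times. Stops when nothing new is found.
    """
    terms = set(seeds)
    counts = Counter()
    for _ in range(max_rounds):
        counts = Counter()
        for line in logs:
            # POS filter: keep only tokens with the selected tags.
            words = [w for w, pos in line if pos in keep_pos]
            for t in (w for w in words if w in terms):
                for w in words:
                    if w != t:
                        counts[(t, w)] += 1
        new_terms = {w for (_, w), c in counts.items() if c >= min_support} - terms
        if not new_terms:
            break
        terms |= new_terms
    # Keep only pairs that meet the support threshold as candidate rules.
    rules = {pair: c for pair, c in counts.items() if c >= min_support}
    return terms, rules


if __name__ == "__main__":
    terms, rules = bootstrap(LOGS, {"扫描"})
    print(sorted(terms))
    print(rules)
```

Calling `bootstrap(LOGS, {"扫描"})` returns the expanded term set and the word pairs that met the support threshold. A real system would need a POS tagger and a rule-scoring step, neither of which the abstract specifies.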
Source
Computer & Digital Engineering (《计算机与数字工程》)
2010, No. 2, pp. 90-93 (4 pages)