Abstract
Taking web pages as an example, this paper proposes using association rule mining to improve how a keyword list is built. The pages are first converted to a standard format through a series of preprocessing steps: unifying the character encoding, filtering out HTML tags, segmenting the Chinese text into words, and removing stop words. The Apriori algorithm is then used to mine words that co-occur with the existing keywords at or above a given minimum support and confidence. These associated words are added to the keyword list, which improves the way the list is constructed.
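The preprocessing and mining steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not give. All names (`preprocess`, `apriori`, `expand_keywords`, `STOP_WORDS`) and the thresholds are hypothetical. Three assumptions apply. "Unified encoding" is modelled as decoding each page to Unicode with a caller-supplied charset and then applying NFKC normalization. A plain whitespace split stands in for Chinese word segmentation, so real Chinese text would need a dedicated segmenter. Only rules of the form {keyword} → X are kept.

```python
import re
import unicodedata
from itertools import combinations

# Hypothetical stop-word list; a real system would load a curated one.
STOP_WORDS = {"the", "a", "of", "and", "is", "to"}
TAG_RE = re.compile(r"<[^>]+>")


def preprocess(raw: bytes, encoding: str) -> set[str]:
    """Turn one raw page into a transaction (a set of words)."""
    text = raw.decode(encoding, errors="replace")  # unify encoding (assumed approach)
    text = unicodedata.normalize("NFKC", text)
    text = TAG_RE.sub(" ", text)                   # filter HTML tags
    words = text.lower().split()                   # stand-in for Chinese word segmentation
    return {w for w in words if w not in STOP_WORDS}  # remove stop words


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset itemset: support} for frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([w]) for t in transactions for w in t}
    current = {s: support(s) for s in singletons}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def expand_keywords(keywords, transactions, min_support, min_confidence):
    """Add words X where the rule {kw} -> X meets both thresholds."""
    frequent = apriori(transactions, min_support)
    expanded = set(keywords)
    for kw in keywords:
        base = frequent.get(frozenset([kw]))
        if base is None:
            continue  # an infrequent keyword cannot appear in any frequent itemset
        for itemset, supp in frequent.items():
            if kw in itemset and len(itemset) > 1 and supp / base >= min_confidence:
                expanded |= itemset - {kw}
    return expanded


pages = [
    b"<p>virus trojan leak</p>",
    b"<div>virus trojan</div>",
    b"<b>virus leak</b>",
    b"<i>the weather report</i>",
]
transactions = [preprocess(p, "utf-8") for p in pages]
print(sorted(expand_keywords({"virus"}, transactions, 0.5, 0.6)))
# prints ['leak', 'trojan', 'virus']
```

With a minimum support of 0.5, {trojan, leak} is pruned because it appears in only one of the four pages. Both {virus, trojan} and {virus, leak} reach a confidence of 0.5 / 0.75 ≈ 0.67 ≥ 0.6, so "trojan" and "leak" are added to the list.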
Source
Network & Computer Security (《计算机安全》)
2011, Issue 4, pp. 69-71 (3 pages)
Keywords
Keyword Searching
Content Auditing System
Data Mining
Association Rules