Abstract
Traditional text-filtering algorithms match only at the structural level and therefore cannot capture the meaning of a text. This paper presents a system that filters the text of Web pages semantically: it segments the text into words, builds a semantic framework for the page, computes the similarity between semantic frameworks, and filters out pages whose similarity exceeds a threshold. Experiments show that semantic filtering distinguishes differing viewpoints in text well, and that its accuracy is markedly higher than that of keyword-only filtering.
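The pipeline named in the abstract (segmentation, framework construction, similarity, threshold test) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the frame representation as a set of (slot, word) pairs, the toy LEXICON, the whitespace segmenter standing in for Chinese word segmentation, the Jaccard similarity, and the 0.6 threshold are all hypothetical choices made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical frame representation: a set of (slot, word) pairs.
Frame = FrozenSet[Tuple[str, str]]

# Hypothetical toy lexicon mapping words to frame slots (not from the paper).
LEXICON: Dict[str, str] = {
    "government": "agent",
    "policy": "topic",
    "supports": "stance",
    "opposes": "stance",
}


def segment(text: str) -> List[str]:
    """Stand-in for Chinese word segmentation: lowercase, split on whitespace."""
    return text.lower().split()


def build_frame(words: List[str]) -> Frame:
    """Keep only words the lexicon knows, each tagged with its slot."""
    return frozenset((LEXICON[w], w) for w in words if w in LEXICON)


def similarity(a: Frame, b: Frame) -> float:
    """Jaccard overlap of two frames (an assumed measure, not the paper's)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def should_filter(page_text: str, target: Frame, threshold: float = 0.6) -> bool:
    """Filter the page when its frame is more similar to the target than the threshold."""
    return similarity(build_frame(segment(page_text)), target) > threshold


# Frame describing the viewpoint to be filtered (hypothetical example).
target = build_frame(segment("government opposes policy"))
page_a = "the government opposes the new policy"
page_b = "the government supports the new policy"
print(should_filter(page_a, target), should_filter(page_b, target))
```

Both example pages contain the keywords "government" and "policy", so a keyword-only filter would treat them alike; in this sketch only the page whose stance slot matches the target frame crosses the threshold.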
Source
《周口师范学院学报》
CAS
2007, No. 2, pp. 103-106 (4 pages)
Journal of Zhoukou Normal University
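Funding
Natural Science Research Program of the Education Department of Henan Province (No. 2006520022)
Zhoukou Normal University Youth Fund Project (No. ZKNUQN200615)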
Keywords
semantics
filtering
Chinese word segmentation
semantic framework