Abstract
Building on statistical keyword-weighting algorithms, this paper takes a semantic approach: a keyword's weight is computed from how strongly the keyword represents a given topic, together with other factors such as the keyword's position in the document. To this end, it proposes a semantics-based matrix dictionary and weighting strategy, which make filtering more efficient and the weight values more reasonable. Experiments show that the approach also filters objectionable webpages with higher accuracy than the traditional approach.
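The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, under stated assumptions: a "matrix dictionary" is modeled as a keyword-by-topic table of strengths, and a keyword's contribution is taken to be frequency times topic strength times a position factor. The names `MATRIX_DICT`, `POSITION_FACTOR`, `topic_scores`, and `is_filtered`, along with every numeric value and the threshold, are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical matrix dictionary: rows are keywords, columns are topics.
# Each entry is an assumed strength (0..1) with which the keyword signals the topic.
MATRIX_DICT = {
    "casino": {"gambling": 0.9, "violence": 0.0},
    "bet": {"gambling": 0.6, "violence": 0.0},
    "weapon": {"gambling": 0.0, "violence": 0.8},
}

# Hypothetical position factors: a title occurrence counts more than a body occurrence.
POSITION_FACTOR = {"title": 3.0, "body": 1.0}


def topic_scores(title_tokens, body_tokens):
    """Return per-topic scores: sum of frequency * strength * position factor."""
    scores = Counter()
    for position, tokens in (("title", title_tokens), ("body", body_tokens)):
        for word, freq in Counter(tokens).items():
            # Words missing from the dictionary contribute nothing.
            for topic, strength in MATRIX_DICT.get(word, {}).items():
                scores[topic] += freq * strength * POSITION_FACTOR[position]
    return dict(scores)


def is_filtered(title_tokens, body_tokens, threshold=2.0):
    """Flag the page if any topic score reaches the (assumed) threshold."""
    scores = topic_scores(title_tokens, body_tokens)
    return any(score >= threshold for score in scores.values())
```

In this sketch, a page titled "casino" whose body contains "bet" twice accumulates its gambling score from both positions, with the title occurrence weighted more heavily. A real system would need tokenization suited to the language of the pages and strengths derived from data rather than chosen by hand.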
Source
《微计算机信息》
Peking University Core Journal (北大核心)
2007, No. 27, pp. 261-262, 109 (3 pages in total)
Control & Automation
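Funding
Shanghai Higher Education Young Scientists Fund project "Research on Network Security Management Technology Based on Data Mining" (03SQ05)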
Keywords
Vector space model
Webpage filtering
Weighting strategy
Matrix dictionary