Abstract
This paper presents the design and implementation of a Semantic Analysis Focused Web Crawler (SAFWC) and proposes a link-value prediction algorithm, SPageRank. Working at the semantic level and drawing on HowNet, the algorithm judges the topic relevance of extended metadata in order to select and predict URLs relevant to the topic. Experimental results show that SAFWC achieves relatively high crawling efficiency and precision for Web pages relevant to a predefined set of topics.
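The link-selection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's SPageRank algorithm, whose details the abstract does not give. It assumes that a link's "extended metadata" is available as a list of terms (for example anchor text and nearby text), and it swaps in a simple term-overlap score where the paper uses HowNet-based semantic relevance. The names `relevance`, `Frontier`, and `threshold` are hypothetical.

```python
import heapq


def relevance(metadata_terms, topic_terms):
    """Toy stand-in for semantic relevance (NOT HowNet-based):
    the fraction of topic terms that appear in the link's metadata."""
    topic = set(topic_terms)
    if not topic:
        return 0.0
    return len(set(metadata_terms) & topic) / len(topic)


class Frontier:
    """Best-first URL frontier: the highest-scoring link is crawled first."""

    def __init__(self, topic_terms, threshold=0.0):
        self.topic_terms = set(topic_terms)
        self.threshold = threshold  # assumed cut-off for discarding links
        self._heap = []             # entries are (-score, insertion order, url)
        self._seen = set()          # URLs already offered to the frontier
        self._order = 0

    def add(self, url, metadata_terms):
        """Queue a URL if it is new and its predicted relevance is high enough."""
        if url in self._seen:
            return False
        self._seen.add(url)
        score = relevance(metadata_terms, self.topic_terms)
        if score < self.threshold:
            return False
        heapq.heappush(self._heap, (-score, self._order, url))
        self._order += 1
        return True

    def pop(self):
        """Return (url, score) for the most relevant queued link."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier({"crawler", "semantic", "topic"}, threshold=0.3)
    frontier.add("http://example.com/a", ["semantic", "crawler", "news"])
    frontier.add("http://example.com/b", ["sports"])  # below threshold, dropped
    frontier.add("http://example.com/c", ["topic", "crawler", "semantic"])
    while len(frontier):
        # /c is popped before /a because it matches more topic terms
        print(frontier.pop())
```

A priority queue keeps the crawl focused: links predicted to be off-topic are never fetched, which is the general mechanism by which a focused crawler trades coverage for precision.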
Source
《计算机应用》
CSCD
PKU Core (Peking University core journal list)
2007, Issue 2, pp. 406-408 (3 pages)
Journal of Computer Applications
Keywords
focused Web crawler
HowNet
extended metadata
crawling strategy