Abstract
Every Web page contains many hyperlinks, and much of a page's useful information is carried by them, so extracting these hyperlinks effectively is an important step in Web mining. This paper proposes using the open-source tool HTMLParser to parse Web pages and extract their hyperlinks, thereby obtaining useful information in preparation for the subsequent development of a search engine.
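The link-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the article's code: the article uses the open-source Java library HTMLParser, whereas this sketch substitutes Python's standard-library `html.parser` module (whose base class happens to share the name `HTMLParser`) so that it runs without extra dependencies. The names `LinkExtractor` and `extract_links` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from <a href=...> tags.

    Hypothetical helper: a stand-in for the Java HTMLParser library
    named in the abstract, built on Python's standard library instead.
    """

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []     # finished (url, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href, e.g. <a name="top">
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open hyperlink.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url=""):
    """Return every hyperlink found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links(page_source, page_url)` returns a list of URL and anchor-text pairs, which a crawler could queue for later fetching or indexing.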
Source
《电脑编程技巧与维护》
2010, Issue 2, pp. 74-75 (2 pages)
Computer Programming Skills & Maintenance