Abstract
Traditional search engines have difficulty extracting information that is generated by client-side scripts. To address this problem, and as part of developing a job-seeking search engine, this paper uses HtmlUnit to interpret JavaScript-driven dynamic Web pages and Selenium IDE to obtain the XPath of dynamic elements, so that dynamically generated page content can be extracted. Experimental results show that the technique is effective.
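The XPath extraction step can be illustrated with a minimal sketch. It assumes the page's scripts have already run and that the resulting DOM is available as well-formed XHTML. It uses the JDK's built-in `javax.xml.xpath` API as a stand-in for HtmlUnit, so it is not the paper's implementation. The markup, the class name `JobXPathDemo`, and the XPath expression are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JobXPathDemo {
    // Hypothetical markup standing in for a page AFTER its scripts have run
    // (assumption: a script-aware client has already produced this DOM).
    static final String RENDERED =
        "<html><body><div id=\"jobs\"><ul>"
        + "<li><a href=\"/job/1\">Java Developer</a></li>"
        + "<li><a href=\"/job/2\">Test Engineer</a></li>"
        + "</ul></div></body></html>";

    // Parse the rendered markup and return the trimmed text of every node
    // matched by the given XPath expression.
    static List<String> extractText(String xhtml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical XPath, of the kind one might record for a dynamic element.
        for (String title : extractText(RENDERED, "//div[@id='jobs']/ul/li/a")) {
            System.out.println(title);
        }
    }
}
```

In a real crawler the DOM would come from a client that executes JavaScript rather than from a fixed string, and the XPath would be the one recorded for the dynamic element of interest.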
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2009, Issue 24, pp. 265-267 (3 pages)
Keywords
dynamic Web page
information extraction
job seeking
search