
Extraction and Implementation of Text Based on Specific Areas of the Webpage
Abstract: This article proposes a method for extracting text from a narrow, well-defined set of web pages. Working from an analysis of the HTML files of stock commentary pages on Eastmoney (www.eastmoney.com), it designs a text extraction algorithm based on the structural features of web pages in a specific domain. The algorithm has two main steps: preprocessing the HTML document, then deciding which content to keep and which to discard. Compared with general-purpose web page extraction algorithms, the design is simpler, more targeted, and more efficient at extraction, and it provides a foundation for recognizing and processing web information about the stock market.
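The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm, whose details are not given here: the cleanup rules, the helper names `preprocess`, `ContainerText`, and `extract_text`, and above all the container `id="ContentBody"` are hypothetical stand-ins for whatever structural feature the target pages actually share.

```python
import re
from html.parser import HTMLParser


def preprocess(html: str) -> str:
    """Step 1 (preprocessing): drop script/style blocks and comments."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    return re.sub(r"(?s)<!--.*?-->", "", html)


class ContainerText(HTMLParser):
    """Step 2 (keep or discard): keep text only inside one container."""

    def __init__(self, tag: str, attr: str, value: str):
        super().__init__(convert_charrefs=True)
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0    # nesting depth of `tag` within the target container
        self.chunks = []  # accepted text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested same-name tag inside the container
        elif tag == self.tag and dict(attrs).get(self.attr) == self.value:
            self.depth = 1       # entered the target container

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.chunks.append(text)  # accept; text outside is rejected


def extract_text(html, tag="div", attr="id", value="ContentBody"):
    # "ContentBody" is an assumed container id, not taken from the paper.
    parser = ContainerText(tag, attr, value)
    parser.feed(preprocess(html))
    parser.close()
    return "\n".join(parser.chunks)
```

Calling `extract_text(page_html)` on a page that wraps its article body in the assumed container returns the body text, one fragment per line, and drops navigation, footer, and script content. Because the pages of a single site section tend to share one template, a fixed structural rule like this can replace the heuristics that a general-purpose extractor needs.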
Source: Journal of Minzu University of China (Natural Sciences Edition), 2013, No. 3, pp. 92-96 (5 pages)
Keywords: stock; information extraction; web information
