
Web权威信息自动提取技术的研究及应用 (cited by: 3)

Study and Application of Automation Extraction Technology from Web Authoritative Information
Abstract: The WWW offers a vast amount of information to every field, but accurately extracting the authoritative information for a given domain remains an active research problem. This paper proposes a multi-factor comprehensive assessment model for evaluating Web sites, which is used to compute each site's authority value. It then presents a syntax tree (phrase tree) model built on table data and uses it to extract table data automatically. A worked example shows that the method extracts authoritative information accurately and automatically.
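Source: Computer Engineering (《计算机工程》), 2008, No. 13, pp. 54-55, 66 (3 pages). Indexed in CAS, CSCD, and the PKU Core Journals list (北大核心).
Funding: Shanghai Universities Special Research Fund for Outstanding Young Teachers (上海高校优秀青年教师科研专项基金)
Keywords: data extraction; Web data mining; syntax tree (phrase tree); multiple factors assessment; table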

