
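Design and Implementation of a Large-Scale Web Information Extraction Platform Based on a Rule Engine (times cited: 1)

Published English title: Design and Implementation of Web Information Extraction Platform Based on Rule Engine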
Abstract: Information extraction is an important method for data mining and knowledge discovery, and extracting accurate, valid data from the Internet automatically or semi-automatically on the basis of rules is key to knowledge discovery. This paper builds a general-purpose text information extraction platform that applies several information-matching techniques to extract data and information from network data sources, and uses rule-based processing to extract web page information intelligently. The platform is developed on Eclipse RCP, so its functionality can be extended through plug-ins, and its business logic is implemented with a rule engine. The platform is user-friendly, easy to extend, and easy to use, and it can automatically obtain valid data and information from large-scale web pages.
Authors: 任宪臻 (Ren Xianzhen), 朱义 (Zhu Yi)
Source: Journal of Beijing City University (《北京城市学院学报》), 2010, No. 5, pp. 67-70 (4 pages)
Keywords: information extraction; rule engine; rich client platform (RCP); incremental crawling
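The abstract says the platform's business logic is expressed as rules run by a rule engine, with pattern matching used to pull fields out of web pages, but it does not name the engine or show any rules. The following is a minimal sketch under stated assumptions: `Rule`, `run_rules`, the rule names, and the sample HTML are all hypothetical, matching is done with regular expressions, and rule evaluation uses a naive forward-chaining loop rather than the Rete algorithm used by production engines. It shows how one rule can depend on facts produced by other rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Working memory: facts known about one page (hypothetical layout).
Facts = Dict[str, object]


@dataclass
class Rule:
    """Hypothetical extraction rule: if `condition` holds, `action` adds facts."""
    name: str
    condition: Callable[[Facts], bool]
    action: Callable[[Facts], Facts]


def run_rules(rules: List[Rule], facts: Facts, max_cycles: int = 10) -> Facts:
    """Naive forward chaining (not Rete): each rule fires at most once;
    passes repeat until a pass fires nothing or max_cycles is reached."""
    fired = set()
    for _ in range(max_cycles):
        progress = False
        for rule in rules:
            if rule.name not in fired and rule.condition(facts):
                facts.update(rule.action(facts))
                fired.add(rule.name)
                progress = True
        if not progress:
            break
    return facts


# Regex-based matchers standing in for the paper's unspecified matching techniques.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S | re.I)
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

rules = [
    # Listed first on purpose: it can only fire on a later pass,
    # after the two extraction rules below have added their facts.
    Rule("mark_news_item",
         lambda f: "title" in f and "date" in f,
         lambda f: {"type": "news_item"}),
    Rule("extract_title",
         lambda f: "html" in f and TITLE_RE.search(f["html"]) is not None,
         lambda f: {"title": TITLE_RE.search(f["html"]).group(1).strip()}),
    Rule("extract_date",
         lambda f: "html" in f and DATE_RE.search(f["html"]) is not None,
         lambda f: {"date": DATE_RE.search(f["html"]).group(0)}),
]

if __name__ == "__main__":
    page = "<html><head><title> Sample Page </title></head><body>Posted 2010-05-01</body></html>"
    result = run_rules(rules, {"html": page})
    print(result["title"], result["date"], result["type"])  # Sample Page 2010-05-01 news_item
```

Keeping rules as data separate from the loop that evaluates them reflects the motivation the abstract gives for a rule engine: extraction logic for a new site can be changed by editing rules rather than engine code.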