
Research and Realization of a Web Information Extraction and Knowledge Presentation System

Times cited: 2
Abstract: This paper studies how to automatically extract structured data from data-intensive Web pages and organize it into a knowledge representation system. Using a knowledge database, the system fetches dynamic Web pages, preprocesses them, and converts them into XML documents. A PAT-array-based pattern discovery algorithm then automatically finds repeated patterns in these documents. Combined with an ontology-based keyword library, these patterns let the system automatically recognize each page's data display structure model. Finally, the extracted data are stored in the knowledge database using XML object-relational mapping, which completes the automatic extraction of Web data. In addition, the system uses the knowledge already in the database to extract new knowledge from the Internet, so that the knowledge database expands itself. Experiments on a system that automatically extracts traffic information and generates and presents mixed-mode travel plans show that the system achieves high extraction precision and adapts well to Web pages from different domains with different structures.
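The abstract names PAT-array-based pattern discovery as the step that finds repeated structures in the converted pages, but does not give the algorithm itself. The Python sketch below shows one common way such a step can work, assuming a PAT array is treated as a suffix array over a sequence of tag tokens: sort all suffixes, compute the longest common prefix (LCP) of neighbouring suffixes, and read off token patterns shared by several suffixes. Everything here is an illustrative assumption rather than the authors' method: the token encoding (text nodes collapsed to TEXT), the naive sorting construction, the tag-balance filter, the thresholds, and the function names tokenize, suffix_array, lcp_array, is_balanced, and repeated_patterns. The sketch also assumes the input is already tidied, well-formed XHTML.

```python
import re

# Hypothetical token encoding: every tag becomes "<name>", "</name>" or "<name/>",
# and every non-blank text node becomes the placeholder "TEXT".
TOKEN_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>|([^<]+)")


def tokenize(xhtml):
    """Turn an (assumed already tidied) XHTML string into tag/TEXT tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(xhtml):
        closing, name, self_closing, text = m.groups()
        if name:
            name = name.lower()
            if closing:
                tokens.append(f"</{name}>")
            elif self_closing:
                tokens.append(f"<{name}/>")
            else:
                tokens.append(f"<{name}>")
        elif text and text.strip():
            tokens.append("TEXT")
    return tokens


def suffix_array(tokens):
    """PAT array: start positions of all suffixes in lexicographic order.
    Naive O(n^2 log n) construction, adequate only for a sketch."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp_array(tokens, sa):
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b, n = sa[k - 1], sa[k], 0
        while a + n < len(tokens) and b + n < len(tokens) and tokens[a + n] == tokens[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def is_balanced(pattern):
    """Hypothetical filter: keep only patterns whose tags open and close properly."""
    stack = []
    for tok in pattern:
        if tok.startswith("</"):
            if not stack or stack.pop() != tok[2:-1]:
                return False
        elif tok.startswith("<") and not tok.endswith("/>"):
            stack.append(tok[1:-1])
    return not stack


def repeated_patterns(tokens, min_len=2, min_count=2):
    """Map each repeated, tag-balanced pattern to its number of occurrences."""
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    found = {}
    for k in range(1, len(sa)):
        length = lcp[k]
        if length < min_len:
            continue
        # Widen to the run of adjacent suffixes sa[lo-1..hi] that all
        # share the first `length` tokens.
        lo = k
        while lo > 1 and lcp[lo - 1] >= length:
            lo -= 1
        hi = k
        while hi + 1 < len(sa) and lcp[hi + 1] >= length:
            hi += 1
        count = hi - lo + 2
        pattern = tuple(tokens[sa[k]:sa[k] + length])
        if count >= min_count and is_balanced(pattern):
            found[pattern] = max(found.get(pattern, 0), count)
    return found


if __name__ == "__main__":
    page = ("<table><tr><td>Bus 1</td><td>08:00</td></tr>"
            "<tr><td>Bus 2</td><td>08:30</td></tr>"
            "<tr><td>Bus 3</td><td>09:00</td></tr></table>")
    patterns = repeated_patterns(tokenize(page), min_count=3)
    for pattern, count in sorted(patterns.items(), key=lambda kv: -len(kv[0])):
        print(count, "".join(pattern))
```

On the small three-row timetable in the demo, the sketch reports the whole-row pattern <tr><td>TEXT</td><td>TEXT</td></tr> with 3 occurrences and the cell pattern <td>TEXT</td> with 6. The sketch does not attempt to decide which repeat is the actual data record. The abstract assigns that task to the ontology-based keyword library.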
Source: Computer Systems & Applications (《计算机系统应用》), 2010, No. 9, pp. 1-4 and 9 (5 pages in total)
Funding: Natural Science Foundation of the Education Department of Anhui Province (2005KJ004ZD)
Keywords: Web information extraction; knowledge representation; data-intensive Web pages; ontology-based keyword library