
Text Oriented Information Extraction and Classification Technology for Agricultural Webs (基于文本内容的农业网页信息抽取和分类研究)

Cited by: 3
Abstract: Based on a study of the HTML structure and features of agricultural web pages, this paper describes an experimental study of text-content-based information extraction and classification for such pages. In the experiment, the document object model (DOM) structure is used to extract and preprocess information from agricultural web pages; text category attributes are computed automatically from the text content to obtain feature words; and new documents are classified automatically using features summarized from sample documents. The experimental results show that the information extraction has relatively low time complexity and high precision, and that it improves classification accuracy.
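Source: Information Science (《情报科学》), indexed in CSSCI and the Peking University Core Journals list, 2012, No. 7, pp. 1012-1015 (4 pages)
Funding: 2008 National Social Science Fund of China, Key Project (08ATQ003)
Keywords: text; agricultural web; information extraction; classification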
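
The pipeline summarized in the abstract (extract text from the page structure, derive feature words, classify new documents against sample documents) can be illustrated with a minimal sketch. This is not the paper's implementation: the record gives no algorithmic details, so everything here is an assumption. The sketch uses Python's event-based `html.parser` as a stand-in for a DOM traversal, whitespace tokenization (Chinese text would need a word segmenter), raw term counts as "feature words", and cosine similarity against per-class term profiles. The names `extract_text`, `train`, and `classify` and the sample pages are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text while skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    # Assumption: whitespace tokenization, alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]


def train(samples):
    """Build a term-count profile per label from (label, html) pairs."""
    profiles = {}
    for label, html in samples:
        profiles.setdefault(label, Counter()).update(tokenize(extract_text(html)))
    return profiles


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(profiles, html):
    """Return the label whose profile is most similar to the new page."""
    doc = Counter(tokenize(extract_text(html)))
    return max(profiles, key=lambda label: cosine(profiles[label], doc))


# Hypothetical sample pages, invented for illustration only.
samples = [
    ("crops", "<html><body><h1>Wheat planting</h1>"
              "<p>wheat seed soil irrigation</p>"
              "<script>var x = 1;</script></body></html>"),
    ("livestock", "<html><body><h1>Cattle feed</h1>"
                  "<p>cattle pig feed vaccine</p></body></html>"),
]
profiles = train(samples)
print(classify(profiles, "<p>soil and irrigation for wheat</p>"))
```

A real system would likely weight terms (for example with TF-IDF) and select feature words per category rather than keeping every token; the sketch keeps raw counts only to stay short.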

