
Deep Web Query Interface Identification Based on Decision Tree and Link Similarity
Abstract: Existing methods for identifying Deep Web query interfaces produce many misclassifications and cannot effectively distinguish search-engine interfaces. To address these shortcomings, this paper proposes an identification method based on a decision tree and link similarity. The method uses the information gain ratio to select important attributes and builds a decision tree to pre-classify interface forms, identifying the interfaces whose features are relatively distinct. A link-similarity-based method then performs a second pass over the interfaces that remain unidentified, recognizing genuine query interfaces and excluding search-engine interfaces. The results show that the method effectively distinguishes search-engine interfaces and improves classification precision and recall.
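The attribute-selection step named in the abstract can be illustrated with a minimal sketch of the standard C4.5-style information gain ratio over discrete attributes. This is not the paper's implementation: the attribute names (`has_select`, `method`), the class labels, and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio: information gain divided by split information."""
    total = len(rows)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: each dict describes one HTML form.
forms = [
    {"has_select": True, "method": "get"},
    {"has_select": True, "method": "post"},
    {"has_select": True, "method": "get"},
    {"has_select": True, "method": "post"},
    {"has_select": False, "method": "get"},
    {"has_select": False, "method": "post"},
    {"has_select": False, "method": "get"},
    {"has_select": False, "method": "post"},
]
labels = ["query", "query", "query", "query",
          "other", "other", "other", "other"]

# Rank attributes by gain ratio, most informative first.
ranked = sorted(forms[0], key=lambda a: gain_ratio(forms, labels, a),
                reverse=True)
print(ranked)
```

In this toy set `has_select` separates the two classes perfectly while `method` carries no information, so a tree builder that picks the highest gain ratio would split on `has_select` first.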
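Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2011, No. 11, pp. 4086-4088, 4099 (4 pages)
Funding: Major Natural Science Foundation Project of Jiangsu Higher Education Institutions (08KJA520001); National Natural Science Foundation of China (70971067)
Keywords: Deep Web; query interface; decision tree; link similarity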