A Categorization Approach Based on Adapted Decision Tree Algorithm for Web Databases Query Results (基于改进决策树算法的Web数据库查询结果自动分类方法)
Cited by: 7
Abstract: A user query against a Web database often returns too many results. To address this problem, this paper proposes an approach, based on an adapted decision tree algorithm, for automatically categorizing Web database query results. In the offline phase, the query history of all users in the system is analyzed and semantically similar queries are merged into clusters. The original data is then partitioned into a set of tuple clusters according to these query clusters, with each tuple cluster corresponding to one type of user preference. When a query arrives, the adapted decision tree algorithm uses the tuple clusters generated offline to build a labeled, hierarchical categorization tree over the query result set, so that users can quickly select and locate the information they need by inspecting the labels. Experimental results show that the proposed approach achieves a low navigational cost and good categorization effectiveness, and that it effectively meets the personalized query needs of different types of users.
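The online step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how the decision tree algorithm is adapted, so the code below falls back on the plain C4.5 gain-ratio criterion (C4.5 appears among the paper's keywords) and should not be read as the authors' method. The sample tuples, the attribute names `city` and `type`, the `cluster` field standing in for the offline tuple-cluster id, and the `max_leaf` threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of cluster labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr, label_key):
    """C4.5 gain ratio of splitting `rows` on the categorical attribute `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[label_key])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    if split_info == 0:  # the attribute has a single value, so it cannot split
        return 0.0
    return (entropy([row[label_key] for row in rows]) - remainder) / split_info


def build_tree(rows, attrs, label_key="cluster", max_leaf=2):
    """Recursively build a labeled categorization tree over the query results."""
    if len(rows) <= max_leaf or not attrs:
        return {"tuples": rows}
    best = max(attrs, key=lambda a: gain_ratio(rows, a, label_key))
    if gain_ratio(rows, best, label_key) <= 0:  # no attribute separates the clusters
        return {"tuples": rows}
    groups = defaultdict(list)
    for row in rows:
        groups[row[best]].append(row)
    rest = [a for a in attrs if a != best]
    return {
        "split_on": best,
        # Each child is keyed by a human-readable label the user can inspect.
        "children": {
            f"{best} = {value}": build_tree(group, rest, label_key, max_leaf)
            for value, group in groups.items()
        },
    }


# Hypothetical query results; "cluster" is the tuple-cluster id assigned offline.
results = [
    {"city": "Seattle", "type": "condo", "cluster": 0},
    {"city": "Seattle", "type": "house", "cluster": 0},
    {"city": "Seattle", "type": "condo", "cluster": 0},
    {"city": "Bellevue", "type": "house", "cluster": 1},
    {"city": "Bellevue", "type": "condo", "cluster": 1},
    {"city": "Bellevue", "type": "house", "cluster": 1},
]

tree = build_tree(results, ["city", "type"])
```

In this toy data, `city` separates the two preference clusters perfectly while `type` barely does, so the root level of the tree is labeled by `city`. Each subtree then holds tuples from a single cluster and is kept as a leaf.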
Source: Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, No. 12, pp. 2656-2670 (15 pages)
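Funding: National Science Foundation for Young Scientists of China (61003162); National Natural Science Foundation of China, General Program (61073139); China National Coal Association Science and Technology Research Guidance Program (MTKJ2009-242, MTKJ2010-337, MTKJ2011-335); Liaoning Provincial Department of Science and Technology Program (201104090)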
Keywords: Web database; user preference; tuple clustering; C4.5 algorithm; query results categorization