
A New Cut-Plane Based Algorithm for Topic Distillation (基于切平面的主题提取算法)

Cited by: 1
Abstract: An analysis of the Hypertext Induced Topic Search (HITS) algorithm from the perspective of semantic relevance shows that its topic drift arises because pages are projected onto the wrong latent semantic basis. To address this, the concept of a Local Density Factor (LDF) is introduced. To handle the overlapping nature of Web content, a new Cut-Plane based Topic Distillation Algorithm (CPTDA) is proposed. CPTDA finds not only the set of topic pages that interest the user most, but also other sets of pages related to the query. Experimental results on 10 queries show that, compared with HITS, CPTDA reduces the topic drift rate by 30% to 52% and discovers multiple related topics for queries that have several meanings.
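Authors: Li Fang (李芳), Ke Xizheng (柯熙政)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 25, pp. 172-174, 191 (4 pages)
Funding: National ministry-level pre-research demonstration and validation project.
Keywords: local density factor; cut-plane; hypertext induced topic search; topic distillation; topic drift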
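
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the standard HITS hub/authority iteration, not of CPTDA: this record does not define the LDF or the cut-plane construction, so neither is attempted here. The page names and the toy link graph are hypothetical. The example assumes one common reading of topic drift, in which a densely interlinked off-topic community captures the top authority scores.

```python
import math


def _normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Standard HITS power iteration on a directed link graph.

    edges: iterable of (source_page, target_page) pairs.
    Returns (hub, authority) dicts, each L2-normalized.
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score: sum of updated authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: a small on-topic community (h1, h2 -> a1)
# and a denser off-topic community (g1, g2, g3 -> b1, b2).
edges = [
    ("h1", "a1"), ("h2", "a1"),
    ("g1", "b1"), ("g1", "b2"),
    ("g2", "b1"), ("g2", "b2"),
    ("g3", "b1"), ("g3", "b2"),
]
hub, auth = hits(edges)
ranked = sorted(auth, key=auth.get, reverse=True)
print(ranked[:3])
```

In this toy graph the denser community ends up with nearly all of the authority weight, so the on-topic page `a1` ranks below `b1` and `b2`. The iteration converges toward the principal eigenvector of the co-citation matrix, which matches the abstract's description of pages being projected onto a single semantic basis.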