
Blogger Profiling Based on Choosing Representative Entries
(Original Chinese title: 代表性博文选择的博客兴趣建模; cited by: 2)
Abstract: With the exponential growth of blog information sources, tasks such as information retrieval and knowledge discovery in the blogosphere face major challenges. The format peculiar to blogs also complicates data mining tasks that use blogs as their data source. This paper proposes modeling a blogger's interests with an entry set made up of the m most representative entries of the blog. The selection criteria ensure both the importance and the topical diversity of the entries in the set. An entry evaluation function is constructed from these two measures, and entry selection is formulated as an instance selection optimization problem. Experiments that use blog classification as the target task show that blogs preprocessed with the proposed method yield lower time complexity and higher classification accuracy.
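Authors: Lu Lu (卢露), Zhu Fuxi (朱福喜)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 10, pp. 2012-2015 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60773011)
Keywords: blogger profiling; entry selection; directed bipartite graph model; blog classification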
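
The abstract describes choosing m entries that are both important and topically diverse, but it does not give the evaluation function itself. The following is a minimal sketch of that general idea only, not the authors' method: it assumes importance scores are already available for each entry, measures topical overlap with bag-of-words cosine similarity, and uses a greedy rule with a hypothetical trade-off weight `lam`. The names `cosine`, `select_entries`, and `lam` are illustrative, and the directed bipartite graph model named in the keywords is not modeled here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_entries(entries, importance, m, lam=0.5):
    """Greedily pick up to m entry indices, trading importance against redundancy.

    entries:    list of entry texts
    importance: list of floats, one per entry (assumed to be given)
    lam:        hypothetical weight; 1.0 ranks by importance only
    """
    vectors = [Counter(text.lower().split()) for text in entries]
    chosen = []
    remaining = set(range(len(entries)))
    while remaining and len(chosen) < m:
        def gain(i):
            # Redundancy is the highest similarity to any entry already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in chosen), default=0.0
            )
            return lam * importance[i] - (1 - lam) * redundancy

        # Iterate in index order so ties resolve deterministically.
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    posts = ["python code tips", "python code tips again", "travel in japan"]
    print(select_entries(posts, [0.9, 0.8, 0.5], m=2))
```

With these three toy posts the sketch selects indices 0 and 2: the second post scores higher on importance than the third, but it is nearly a duplicate of the first, so the diversity term pushes it out. Setting `lam=1.0` removes the diversity term and the selection falls back to the two most important posts.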