
Intelligent Topic Crawler Based on Users Community (cited by 3)
Abstract: This paper proposes CITC (Community-Specific Intelligent Topic Crawler), an intelligent topic crawler system based on a user community. CITC first mines the Web logs of the user community to build a corresponding knowledge base. Guided by this knowledge base, it applies a multi-layer selection strategy to crawl Web pages selectively. Experimental results show that the system can efficiently fetch target pages according to the interests of the user community.
Source: Journal of Guangxi Normal University: Natural Science Edition (indexed in CAS; Peking University Core Journal), 2007, No. 2, pp. 230-233 (4 pages).
Funding: Natural Science Foundation of Gansu Province (3ZS051-A25-035)
Keywords: users community; page dual filter; knowledge base; topic crawler; relevancy
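The abstract describes the pipeline only at a high level (mine community logs into a knowledge base, then crawl selectively under its guidance), so the following is a minimal sketch under stated assumptions, not the authors' CITC implementation. It assumes the knowledge base is a weighted list of the most frequent terms on pages in the community's access log, and it reads "page dual filter" as a two-stage filter: a link's anchor text is scored before fetching, and the fetched page's content is scored afterwards. All function names, thresholds, and the in-memory `web` dictionary (standing in for real HTTP fetching) are hypothetical.

```python
import heapq
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mine_knowledge_base(log_urls, page_text, top_k=10):
    """Hypothetical knowledge base: the top_k most frequent terms across
    pages in the community's access log, with weights summing to 1."""
    counts = Counter()
    for url in log_urls:
        counts.update(tokenize(page_text.get(url, "")))
    top = counts.most_common(top_k)
    total = sum(c for _, c in top) or 1
    return {term: c / total for term, c in top}


def relevance(text, kb):
    """Share of knowledge-base weight covered by terms present in text (0..1)."""
    terms = set(tokenize(text))
    return sum(w for t, w in kb.items() if t in terms)


def crawl(seeds, web, kb, link_threshold=0.1, page_threshold=0.2, max_pages=50):
    """Best-first crawl with two filters (assumed interpretation):
    1) anchor text is scored before a link enters the frontier;
    2) page content is scored after fetching, before the page is kept.
    `web` maps url -> (page_text, [(anchor_text, target_url), ...])."""
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    seen, kept = set(seeds), []
    while frontier and len(kept) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:  # unreachable page in this toy model
            continue
        text, links = web[url]
        if relevance(text, kb) < page_threshold:  # second filter: page content
            continue
        kept.append(url)
        for anchor, target in links:
            score = relevance(anchor, kb)
            # First filter: only enqueue unseen links whose anchor looks on-topic.
            if target not in seen and score >= link_threshold:
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return kept


# Toy example: pages "a" and "b" appear in the community log.
web = {
    "a": ("topic crawler knowledge base", [("crawler design", "b"), ("cooking recipes", "c")]),
    "b": ("focused crawler relevance knowledge", [("knowledge base mining", "d")]),
    "c": ("cooking recipes pasta", []),
    "d": ("knowledge base crawler", []),
}
kb = mine_knowledge_base(["a", "b"], {u: t for u, (t, _) in web.items()}, top_k=5)
print(crawl(["a"], web, kb))  # → ['a', 'b', 'd']
```

The priority queue orders the frontier by anchor relevance so the most promising links are fetched first; the off-topic page "c" is rejected by the first filter and never fetched. A real crawler would replace the dictionary lookup with HTTP requests and respect robots.txt.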