A UNIFIED EXTENDING METHOD FOR CONTENT-IGNORANT WEB PAGE CLUSTERING

Abstract: Content-ignorant clustering methods have advantages in time complexity and space complexity over content-based methods. In this paper, the authors introduce a unified expanding method for content-ignorant web page clustering that mines the "click-through" log and tries to solve the problem that the "click-through" log is sparse. The relationship between two expanded nodes is also defined and optimized. Analysis and experiments show that the new method performs better than the standard content-ignorant method. The new method can also work without iterative clustering.

Authors: Shi Lin, Chen Chen

Affiliations: School of Aerospace Science and Engineering; School of Electronics Engineering and Computer Science

Source: Journal of Electronics (China) (电子科学学刊（英文版）), 2010, Issue 1, pp. 105-112 (8 pages)

Keywords: Web data mining; Clustering; Content-ignorant clustering

Classification code: TP391.3 [Automation and Computer Technology: Computer Application Technology]

