Abstract
Web logical domain mining is an active research topic in Web mining. Rather than pure text clustering or hyperlink ranking, it takes the site designer's point of view and seeks the pages within a site that are logically related, grouping them into a logical domain. The definition of a site's logical domain varies with the application. After a comprehensive analysis of several representative site logical domain models and their mining methods, this paper proposes a model and an algorithm for mining Web site logical domains based on page-block clustering. Experimental results show that the algorithm is stable and adaptable: its precision is largely unaffected by factors such as site scale, language, and mirroring, while its recall increases with the number of pages retrieved.
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University core journal list)
2007, No. 4, pp. 52-54, 57 (4 pages)
Computer Engineering