Funding: Supported by the Natural Science Fund of Renmin University of China (30207108)
Abstract: In this paper, we improve on trawling and point out some communities that trawling misses. We use a DBG (Dense Bipartite Graph) instead of a CBG (Complete Bipartite Graph) to identify the structure of a potential community. Based on the DBG, we propose a new edge-removal method to extract cores from a web graph. Moreover, we improve the crawler so that it saves only pages that are potential fans of a core, which saves a large amount of disk storage space. To evaluate whether a set of cores belongs to a community, term-frequency statistics are used. The experimental dataset was crawled under the ".cn" domain. The results show that our algorithm works properly and that our method can find some new cores.
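The abstract does not spell out the edge-removal procedure, so the following is only a minimal sketch of one common way to prune a fan-to-center link graph down to a dense bipartite core. It assumes a DBG is defined by two thresholds that do not appear in the abstract: every remaining fan links to at least `p` centers, and every remaining center is linked from at least `q` fans. The function name `prune_to_dense_core` and both parameters are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def prune_to_dense_core(edges, p, q):
    """Repeatedly drop edges of under-connected pages (assumed DBG thresholds).

    edges: iterable of (fan, center) link pairs.
    p: assumed minimum number of out-links a fan must keep.
    q: assumed minimum number of in-links a center must keep.
    Returns the set of (fan, center) edges that survive pruning.
    """
    out_links = defaultdict(set)  # fan -> centers it links to
    in_links = defaultdict(set)   # center -> fans linking to it
    for fan, center in edges:
        out_links[fan].add(center)
        in_links[center].add(fan)

    changed = True
    while changed:
        changed = False
        # Remove every fan that has too few out-links, with all its edges.
        for fan in list(out_links):
            if len(out_links[fan]) < p:
                for center in out_links.pop(fan):
                    in_links[center].discard(fan)
                changed = True
        # Remove every center that has too few in-links, with all its edges.
        for center in list(in_links):
            if len(in_links[center]) < q:
                for fan in in_links.pop(center):
                    if fan in out_links:
                        out_links[fan].discard(center)
                changed = True

    return {(fan, c) for fan, centers in out_links.items() for c in centers}


# A dense but not complete bipartite graph: no fan links to all three centers.
example_edges = [
    ("f1", "c1"), ("f1", "c2"),
    ("f2", "c2"), ("f2", "c3"),
    ("f3", "c1"), ("f3", "c3"),
    ("f4", "c4"),  # a stray link that pruning should discard
]
core = prune_to_dense_core(example_edges, p=2, q=2)
```

In this example the six edges among `f1` to `f3` and `c1` to `c3` survive even though they do not form a complete bipartite graph, which illustrates the kind of structure a CBG-based search would miss. The stray `f4` link is removed.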