Funding: Supported by the Basic Research Fund of Beijing Institute of Technology (20120642008).
Abstract: Recent research on network sampling has misunderstood some sampling concepts and has not taken the variance of subnets into account. We propose correct definitions of the sample and the sampling rate in network sampling, together with a formula for calculating the variance of subnets. Three commonly used sampling strategies are then applied to databases of the connecting nearest-neighbor (CNN) model, a random network, and a small-world network to explore the variance in network sampling. The results show that snowball sampling yields the largest variance among subnets but captures the network structure well. The variances of networks sampled by the hub and random strategies are much smaller. The hub strategy performs well in reflecting the properties of the whole network, while random sampling gives more accurate estimates of the clustering coefficient.
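The comparison described in the abstract can be illustrated with a minimal sketch. The record does not reproduce the paper's definitions of the sample, the sampling rate, or its variance formula, so everything below is an assumption: a sample is taken to be a node-induced subnet of m nodes, the "variance of subnets" is stood in for by the population variance of the average clustering coefficient across repeated samples, and the hub strategy is taken to mean "keep the highest-degree nodes". All function names are hypothetical, and the small-world generator is only a Watts-Strogatz-style approximation.

```python
import random
from statistics import pvariance

def small_world(n, k, p, rng):
    """Watts-Strogatz-style graph as {node: set(neighbors)} (illustrative only)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):                      # ring lattice: k/2 neighbors per side
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    for v in range(n):                      # rewire each lattice edge with probability p
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                free = [w for w in range(n) if w != v and w not in adj[v]]
                if free:
                    w = rng.choice(free)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj

def random_sample(adj, m, rng):
    """Random strategy: m nodes chosen uniformly without replacement."""
    return set(rng.sample(sorted(adj), m))

def hub_sample(adj, m, rng):
    """Hub strategy (assumed): the m highest-degree nodes, ties broken at random."""
    nodes = sorted(adj)
    rng.shuffle(nodes)
    nodes.sort(key=lambda v: len(adj[v]), reverse=True)  # stable sort keeps shuffled tie order
    return set(nodes[:m])

def snowball_sample(adj, m, rng):
    """Snowball strategy: breadth-first expansion from one random seed node.

    May return fewer than m nodes if the seed's component is smaller than m.
    """
    seed = rng.choice(sorted(adj))
    seen, frontier = {seed}, [seed]
    while frontier and len(seen) < m:
        nxt = []
        for v in frontier:
            nbrs = sorted(adj[v] - seen)
            rng.shuffle(nbrs)
            for u in nbrs:
                if len(seen) >= m:
                    break
                seen.add(u)
                nxt.append(u)
        frontier = nxt
    return seen

def induced(adj, nodes):
    """Subnet induced by the sampled node set."""
    return {v: adj[v] & nodes for v in nodes}

def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj) if adj else 0.0

def subnet_variance(adj, sampler, m, runs, rng):
    """Assumed stand-in for the paper's formula: variance of one property over repeated samples."""
    values = [avg_clustering(induced(adj, sampler(adj, m, rng))) for _ in range(runs)]
    return pvariance(values)

rng = random.Random(0)
g = small_world(200, 6, 0.1, rng)
for name, sampler in [("random", random_sample), ("hub", hub_sample), ("snowball", snowball_sample)]:
    print(name, subnet_variance(g, sampler, 40, 30, rng))
```

The sketch fixes the sample size at 40 of 200 nodes and draws 30 samples per strategy; both values are arbitrary. Swapping `avg_clustering` for another property (for example, mean degree) gives the spread of that property under each strategy.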