Liran Katzir (Yahoo! Israel Labs)
Wednesday, 15.6.2011, 11:30
Online social networks have become very popular in recent years and their number of users is already measured in many hundreds of millions. For various commercial and sociological purposes, an independent estimate of their sizes is important. In this work, algorithms for estimating the number of users in such networks are considered. The proposed schemes are also applicable for estimating the sizes of networks' sub-populations.
The suggested algorithms interact with the social networks via their public APIs only, and rely on no other external information. Due to obvious traffic and privacy concerns, the number of such interactions is severely limited. We therefore focus on minimizing the number of API interactions needed for producing good size estimates. We adopt the abstraction of social networks as undirected graphs and use random node sampling. By counting the number of collisions or non-unique nodes in the sample, we produce a size estimate. Then, we show analytically that the estimate error vanishes with high probability for smaller number of samples than those required by prior-art algorithms. Moreover, although our algorithms are provably correct for any graph, they excel when applied to social network-like graphs. The proposed algorithms were evaluated on synthetic as well real social networks such as Facebook, IMDB, and DBLP. Our experiments corroborated the theoretical results, and demonstrated the effectiveness of the algorithms.