Technical Report PHD-2007-18

TR#:PHD-2007-18
Class:PHD
Title: Internet Topology: From the Discovery Process to the Real Picture
Authors: Rami Cohen
Supervisors: Danny Raz
PDFPHD-2007-18.pdf
Abstract: The topological structure of the Internet infrastructure is an important and interesting subject that attracted significant research attention in the last few years. Apart from the pure intellectual challenge of understanding a very big, complex, and ever evolving system, knowing the structure of the Internet topology is very important for developing and studying new protocols and algorithms. Starting with the fundamental work of Falostous et al., a considerable amount of research was done recently in this field, improving our knowledge and understanding of the Internet structure. The work in this area composed of collecting information regarding the current (and possibly past) state of the Internet, inferring from the collected data the actual topological and the hierarchy structure, and analyzing this structure in order to understand the inherently important characteristic and evolution of the system. In this research we focus on the Autonomous System (AS) level. First, we question the basic problem: how big is the Internet. In the AS level this means: how many peering relations exist between ASes. Finding this number is hard since there is no direct way to retrieve information from all nodes regarding their direct neighbors, and all our knowledge is based on sampling processes. Thus, it is very difficult to characterize the Internet since it may well be the case that this characterization is a result of the sampling process, and it does not hold for the "real" Internet. In the first part of this thesis we present strong evidence to the fact that a considerable amount (at least 35%) of the links in the AS level are still to be unveiled. Our findings indicate that almost all these missing links are of type peer-peer. We also examine the vertex degree distribution of the AS connectivity map, and show that while the customer-provider subgraph follow the power law, the peer-peer subgraph behave ifferently. Next, we study another interesting problem, the Type of Relationship (ToR) problem. In this problem, given an AS connectivity graph and a set of paths, the goal is to assign relationship of customer-provider or peer-peer to AS links such that pre-defined policy restrictions are not violated. It turns out that the original ToR problem does not reflect the full nature of the commercial relationship between connected ASes. The main problem with the current definition is that hierarchical cycles may be formed, while this is impossible in the real internet. To address this point, we define a new problem called Acyclic Type of Relationship, AToR. We present an efficient algorithm determining whether an acyclic solution without invalid paths exists, and we show that the general decision version of the problem is NP-hard. We also present 2/3-approximation algorithm for a maximum version of the problem, and consider practical aspects of inferring the actual type of relationships between ASes. This includes heuristics to infer also peer-peer and sibling-sibling relationships. We support our approach by experimental and simulation results showing that our algorithms classify the type of relationship between ASes much better than all previous algorithms. Finally, in the last part of the thesis, we study the path inflation phenomenon in which, due to BGP routing policy, routing between ASes is not necessarily done along shortest paths. To overcome this problem and to route packets over shortest paths after all, we consider an intermediate routing scheme. In this routing scheme routing between clients can be done via a set of relay servers located in intermediate nodes. We study this problem as an optimization problem where the objective target is to enable routing through shortest paths with a minimum number of relay servers. We show that this problem is NP-hard, present a O(log(n)) approximation algorithm for it, and show that this is the best approximation one can get. We examine the practical aspects of the scheme by evaluating the gain one can get over real up-to-date data reflecting the current BGP routing policy in the Internet. We show that a relative small number of relay servers are sufficient to enable routing over the shortest paths between almost all considered nodes, which reduces the average path length of these inflated paths by 40%.
CopyrightThe above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information

Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2007/PHD/PHD-2007-18), rather than to the URL of the PDF files directly. The latter URLs may change without notice.

To the list of the PHD technical reports of 2007
To the main CS technical reports page

Computer science department, Technion
admin