Probabilistic methods and algorithms

Given by Eldar Fischer. Office hours: Wednesday, 14:30-16:30, Room 625.

06.10.2002: Due to Miluim I will only be able to begin the course on November 10. From then the classes will last three hours (Sundays 14:30-17:30) instead of two, until we catch up with lost time. I am sorry for the inconvenience, and hope that in the mean time this page will allow you a glimpse into the course. From the beginning of the semester until November 6 I will be reachable only through email to answer your questions.

11.11.2002: I intend to put up my talk excerpts as this course moves along. These will probably be in one postscript file that will grow through the course (and usually updated about a week after each lecture). Note however that these are excerpts, it is still recommended that you take your own notes during the lectures.

14.11.2002: With regards to room change (requested by some of you), unfortunately there are no alternate rooms for this course in these hours. For the time being we will have to make do. On the good news front, I am putting my lecture notes on-line. Scroll below to see them.

24.11.2002: Note that there was an error in exercise 1, question 4. The number of edges in the hypergraph there should be 2k-1, not 2k. The version linked to below is the corrected one. Other announcements: (a) The links to the postscript files now work also when the course homepage is accessed from the course list. (b) The next two classes will be in the original 2-hour format (and not the 3-hour one). I may give one or two (but not more) 3-hour classes after that, depending on our progress.

03.12.2002: The submission deadline for Exercise 1 has been extended by 24 hours. This exercise is now due on Monday, December 9, 18:00 (you can bring it to me personally or put it in my mailbox). Also, on December 15 we will have again a three-hour class. The December 22 one may be three-hour as well, and all the rest will be two-hour classes.

15.12.2002: Next class (December 22) will indeed be the last three-hour class for this semester. As for Exercise 2 (due January 5), I have put it on site. The solutions for Exercise 1 and the lecture notes about the martingales will follow in about a week.

29.12.2002: The deadline for Exercise 2 has been extended to Wednesday, January 8, 18:00. I have also sent an email containing this and other announcements.

12.02.2003: The course material on this webpage is now complete, with the placement of the solutions to the last exercise. I hope you have enjoyed the course.

Introduction

Combinatorics and computer science have more to do with probabilistic theory then meets the eye.

In the realm of combinatorics, probabilistic methods were first introduced as a result of Erdös's pioneering work in the field, starting with his lower bounds on the function resulting from Ramsey's theorem. In computer science theory, apart from the numerous applications that combinatorial theory has in this field, probabilistic theory plays also an explicit role. There are many instances in which probabilistic algorithms are considered, and other instances in which the algorithms are analyzed for a random input rather than for the worst case. It is worthy of note that there are still major open questions with regards to the strengths probabilistic algorithms may have over deterministic ones.

The primary goal of this course is to instruct in some of of the probabilistic methods, used in the analysis of algorithms and in combinatorial proofs. This includes the presentation of the methods themselves (in most cases with an accompanying proof of their correctness), and some of their applications. Basic knowledge of probabilistic theory is assumed, as well as fair knowledge of algorithms, and some knowledge in combinatorial theory and mainly graph theory.

This course is mathematical as much as it is a course in computer science. This is not unexpected, as the boarders between combinatorial theory and computer science are sometimes blurred, and the subject of this course is one of these instances.

Homework and grading

During the course I will hand assignment sheets, each with about four or five questions; considering the late schedule there will probably be three assignments, and you should complete all of them. For every assignment you will have three weeks to complete as much of it as you can; there is a good chance that the last assignment will be given near the end of the course, in which case these three weeks will extend after the last lecture. Each of you have to submit your own answers and understand them, also in the case that you have consulted others about the harder questions.

I reserve the right to call some of you up and ask you about your answers, although I expect to do this rarely if at all. The scenarios in which I may do this are the following:

There is no test; the grade will be based solely on the performance in answering the assignment questions (all questions and assignments count, not just the submitted ones). I will be keeping a record of the performance for every individual question, so the final formula for the grade will not necessarily be the average grade of the assignments (in fact, it is probable that the grades will be somewhat higher).

Course material

Lecture excerpts

I am putting my lecture notes for this course on-line. Here they are (Final update: 09.02.2002). Please note the following:

Exercises and solutions

Some highlights of the course

The following are not all of the subjects of the course, just a list of some of its topics that I believe gives an idea as to what the course is about.

The basic method

To prove that a certain structure exists, one can prove instead that a random structure will with some probability be the required one. In its simplest form the probabilistic method is just a shorthand for counting. If it can be proven that the probability that a random structure is the required one is not only positive, but also bounded from below, this may result in a probabilistic algorithm for obtaining the structure as well.

The linearity of expectation

For every instance of 3SAT with n clauses, there exists an assignment to its variables for which at least 7n/8 of the clauses are satisfied. Why? Because the expected number of satisfied clauses when choosing the assignment uniformly at random is 7n/8. Since the expectation is linear (the expectation of a sum of variables is the sum of their expectations), its calculation is easy. The expectation is probably the most used parameter in probabilistic proofs in computer science.

The second moment

This is a fancy name to proving bounds using the mean deviation parameter of a random variable. It is useful when one needs to prove that a random variable gets with high probability a value that is close to its expectation, using the Chebyschev inequality.

Large deviation inequalities

In some settings, such as when summing many independent coin-flips, the probability that a random variable deviates substantially from its expectation is exponentially low. Sometimes such bounds are what is required for completing a proof; this relatively simple method is also one of the most used ones.

The Local Lemma

This lemma is used more in mathematical proofs than in computer science. Still, this unorthodox and beautiful use of probabilistic theory merits the attention of the course.

Yao's method

This is method for proving lower bounds on probabilistic algorithms, that works by shifting the randomness from the algorithm to its input. In many cases deterministic algorithms with random inputs are easier to analyze than probabilistic algorithms.

Random walks (time permitting)

Given a graph, start with one of its nodes, and at each step move from the currently selected node to one of its neighbors, chosen uniformly at random. How long should it take to reach each of the other nodes? This and other related questions could form the basis of a whole course by themselves. I hope to present an introduction here; additional results will be presented in an advanced topics course on algebraic methods and algorithms, which Yuval Rabani and I are working on and is intended to be given in the future.

Derandomization techniques

Deterministic algorithms are preferable to probabilistic ones, and this is where derandomization techniques may help. I will cover the method of conditional expectations, and if there is time, I will also outline the method of small sample spaces; the complete construction of small sample spaces is also intended to be part of an advanced topics course on algebraic methods.