
Lecture 8 - The Probabilistic Method


  • Speaker: Prof. Alexander Scott

It can be very difficult to construct mathematical objects without embedding some sort of regular structure.

Random or typical objects often have desirable properties that are difficult to construct explicitly.

Example 1: Tournaments

A tournament is an orientation of a complete graph: every edge between $x$ and $y$ is assigned a direction (towards $x$ or towards $y$).

A tournament has Property $P_k$ if for every set of $k$ players, there is someone who beats all of them. For example, the cyclic tournament on three vertices has Property $P_1$. For large $k$, tournaments with Property $P_k$ are difficult to construct!

Theorem

If $n \geq k^{2} 2^{k + 1}$, then some tournament on $n$ vertices has Property $P_k$.

Idea:

  • Consider a random tournament.
  • Show that with positive probability it has Property $P_k$.
  • Deduce that some tournament must have this property.

Proof

Let $T$ be a random tournament on $n$ vertices, where each edge between $x$ and $y$ is directed independently with equal probability towards $x$ or $y$. For each set $S$ of $k$ vertices, let $B_S$ be the event that no vertex beats every vertex in $S$ (we say that $S$ is then a bad set).

Then

$$P(B_{S}) = \left(1 - \frac{1}{2^{k}}\right)^{n - k} \leq e^{-\frac{n - k}{2^{k}}}$$

Here we have used the inequality $1 + x \leq e^{x}$, which holds for all real numbers $x$ (and is extremely useful).

We deduce that the expected number of bad $k$-sets $B$ is at most

$$\binom{n}{k} e^{-\frac{n - k}{2^{k}}}$$

Suppose this expectation is strictly less than $1$. If on average there is fewer than one bad set, then there must be some tournament with fewer than one bad set (the minimum is at most the average); since the number of bad sets is an integer, that tournament has no bad sets at all, and hence has Property $P_k$.

Thus, it is sufficient to show that

$$\binom{n}{k} e^{-\frac{n - k}{2^{k}}} < 1$$

Recall that $n \geq k^{2} 2^{k + 1}$; then

$$\binom{n}{k} e^{-\frac{n - k}{2^{k}}} \leq \frac{n^{k}}{k!} e^{-\frac{n}{2^{k + 1}}}$$

It is straightforward to see that this is less than $1$. $\square$

Let's note:

  • We did not directly show that $T$ exists: we showed that a random tournament had a strictly positive probability of having the required properties.
  • We needed to perform some calculations!
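To see the bound in action, here is a minimal Monte Carlo sketch in Python (the helper names are ours, not from the lecture): it draws a random tournament and verifies Property $P_k$ by brute force. With $n \geq k^{2} 2^{k+1}$, the first draw usually succeeds.

```python
import itertools
import random

def random_tournament(n):
    """Orient each edge of K_n uniformly at random;
    beats[x][y] == True means x beats y."""
    beats = [[False] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            if random.random() < 0.5:
                beats[x][y] = True
            else:
                beats[y][x] = True
    return beats

def has_property_pk(beats, k):
    """Check that for every k-set S, some vertex outside S beats all of S."""
    n = len(beats)
    return all(
        any(all(beats[v][s] for s in S) for v in range(n) if v not in S)
        for S in itertools.combinations(range(n), k)
    )

random.seed(0)
k = 2
n = k * k * 2 ** (k + 1)  # n = 32 satisfies n >= k^2 * 2^(k+1)
print(has_property_pk(random_tournament(n), k))
```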

We can consider random structures even when our original problem does not mention randomness. All the tools of Probability Theory are now at our disposal!

Example 2: Coding Theory

Coding Theory addresses the problem of transmitting information through a binary channel: in other words, we aim to send information as a sequence of $0$s and $1$s.

A code is a collection of binary strings, one for each type of information we aim to transmit (for example, one string for each letter of our alphabet).

Sometimes the channel is noisy, in which case we require our strings to be highly distinct (we need an error-correcting code). Even without noise, important questions arise. For example, how quickly can we transmit information through a channel?

A prefix of a binary string is an initial segment. For example, $010$ is a prefix of $0101$ but not of $1010$.

A set $F$ of binary strings is prefix-free if no string in $F$ is a prefix of another. A fundamental theorem, the Kraft-McMillan Inequality, applies to prefix-free codes:

Theorem (Kraft-McMillan Inequality)

Let $F$ be a prefix-free set of binary strings, and suppose that $F$ contains $N_{i}$ strings of length $i$ for each $i$. Then

$$\sum_{i \geq 0} \frac{N_{i}}{2^{i}} \leq 1$$

Proof

Consider a random (infinite) sequence $B = b_{1} b_{2} \dots$ of independent uniform random bits. For a string $C$ of length $k$,

$$P(C\ \text{is a prefix of}\ B) = 2^{-k}$$

Thus, the expected number of strings from $F$ that occur as a prefix of $B$ is

$$\sum_{C \in F} 2^{-|C|} = \sum_{i \geq 0} \frac{N_{i}}{2^{i}}$$

On the other hand, we can never have more than one string from $F$ as a prefix of $B$ simultaneously: if two were, the shorter would be a prefix of the longer, contradicting prefix-freeness. We deduce that

$$\sum_{i \geq 0} \frac{N_{i}}{2^{i}} \leq 1$$

(the average is at most the maximum). $\square$
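As a quick sanity check, here is a small Python sketch (helper names of our own choosing) that verifies prefix-freeness and evaluates the Kraft-McMillan sum for a small code:

```python
from itertools import combinations

def is_prefix_free(F):
    """No string in F is a prefix of another."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(F, 2))

def kraft_sum(F):
    """The left-hand side of the Kraft-McMillan Inequality."""
    return sum(2 ** -len(c) for c in F)

F = ["0", "10", "110", "111"]  # a prefix-free code
print(is_prefix_free(F), kraft_sum(F))  # True 1.0
```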

Example 3: Max Cut

The key tool in the first example was linearity of expectation: if $X = \sum X_{i}$, then

$$\mathrm{E}(X) = \sum \mathrm{E}(X_{i})$$

This holds with no independence assumptions; the analogous identity for the variance requires the random variables to be independent.

In the Max Cut problem, we are given a graph $G$ and aim to divide its vertices into two classes $V_{1}, V_{2}$ so that as many edges as possible have one end in each class.

The Max Cut problem is known to be a challenging algorithmic problem. (It is NP-hard; in fact, it is NP-hard even to find a good approximate solution!)

A theorem provides a simple bound for the Max Cut problem:

Theorem

For every graph $G$, there exists a partition $V(G) = V_{1} \cup V_{2}$ such that at least half the edges of $G$ have one end in each class.

Proof

Consider a random partition, placing each vertex independently in $V_{1}$ or $V_{2}$ with equal probability. For each edge $e = (x, y)$, we define a random variable $X_{e}$ by setting $X_{e} = 1$ if $e$ has one end in each set and $X_{e} = 0$ otherwise. Let

$$X = \sum_{e} X_{e}$$

We aim to show that there exists some partition in which $X \geq \frac{e(G)}{2}$.

Observe that

$$\mathrm{E}(X_{e}) = \frac{1}{2}$$

Thus, by linearity of expectation

$$\mathrm{E}(X) = \frac{e(G)}{2}$$

It follows that there exists some partition for which at least half the edges have one end in each class. $\square$

It is possible to derandomise this argument to obtain a very fast algorithm.
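One way to do this is the method of conditional expectations; here is a minimal Python sketch (function names ours): place the vertices one at a time, each on the side opposite the majority of its already-placed neighbours. Each placement cuts at least half of the edges to earlier vertices, so at least half of all edges end up cut.

```python
def greedy_max_cut(n, edges):
    """Derandomised max cut: each vertex joins the side opposite the
    majority of its already-placed neighbours, so the conditional
    expected cut size never drops below e(G)/2."""
    neighbours = [[] for _ in range(n)]
    for x, y in edges:
        neighbours[x].append(y)
        neighbours[y].append(x)
    side = {}
    for v in range(n):
        placed = [side[u] for u in neighbours[v] if u in side]
        side[v] = 0 if placed.count(0) <= placed.count(1) else 1
    return side

# The 4-cycle: the greedy partition cuts all four edges (at least e(G)/2 = 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
side = greedy_max_cut(4, edges)
print(sum(side[x] != side[y] for x, y in edges))  # 4
```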

Example 4: Independent Sets

In the alteration method, we generate a random structure and then modify it to achieve the desired properties.

For example: an independent set in a graph is a set of vertices, no two of which are joined by an edge.

Theorem

Let $G$ have $n$ vertices and average degree $d \geq 1$. Then $G$ contains an independent set of size at least $\frac{n}{2d}$.

Proof

We generate a set in two steps. First, let $S$ be a random subset of $V(G)$ obtained by including each vertex independently with probability $p$. Then, let $T$ be obtained from $S$ by deleting one endpoint of each edge of $G$ with both ends in $S$; the resulting set $T$ is independent.

The expected size of $S$ is

$$\mathrm{E}|S| = pn$$

The expected number of edges contained in $S$ is

$$\mathrm{E}(e(S)) = p^{2}e(G) = \frac{p^{2}nd}{2}$$

Thus, on average, the number of vertices remaining is

$$\mathrm{E}|T| \geq pn - \frac{p^{2}nd}{2}$$

Setting $p = \frac{1}{d}$ yields

$$\mathrm{E}|T| \geq \frac{n}{2d}$$

Hence some outcome of the construction is an independent set of size at least $\frac{n}{2d}$.

$\square$
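Here is a minimal Python sketch of the two-step construction (names ours). Note that a single sample achieves $\frac{n}{2d}$ only in expectation, so in practice one may repeat the experiment and keep the best outcome.

```python
import random

def independent_set_by_alteration(n, edges, d):
    """Sample each vertex with probability p = 1/d, then delete one
    endpoint of every edge with both ends in S; the rest is independent."""
    p = 1.0 / d
    S = {v for v in range(n) if random.random() < p}
    for x, y in edges:
        if x in S and y in S:
            S.discard(x)
    return S

# A 4-regular example: each i is joined to i+1 and i+2 (mod n), so d = 4.
random.seed(1)
n = 100
edges = [(i, (i + j) % n) for i in range(n) for j in (1, 2)]
T = independent_set_by_alteration(n, edges, d=4)
assert all(x not in T or y not in T for x, y in edges)
print(len(T))  # at least n / (2d) = 12.5 in expectation
```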

Broader Horizons: Random Graphs

A highly important and fascinating example is provided by random graphs. The theory of random graphs was initially developed in the 1960s, but it has since grown into a significant and influential area of research, with connections to numerous fields.

In the $G(n, p)$ model, we consider an $n$-vertex graph in which each edge is present independently with probability $p$. Thus, $G(n, 0)$ has no edges, $G(n, 1)$ is the complete graph, and $G(n, \frac{1}{2})$ has (on average) half of all possible edges.

Random graphs are used to model numerous real-world processes, from social networks to epidemics. A deep theory exists regarding the various changes that a typical random graph undergoes as pp increases from 00 to 11.

What can we say about the typical structure of a graph in $G(n, p)$, where $p = p(n)$ depends on $n$?

Let us consider triangles. How large must $p$ be for triangles to appear in $G(n, p)$? The expected number of triangles is given by

$$\binom{n}{3} p^{3} \sim \frac{n^{3}p^{3}}{6}$$

Thus, if $p \ll \frac{1}{n}$, then the expected number of triangles tends to $0$. It follows (for example, by Markov's Inequality) that the probability that $G$ contains a triangle tends to $0$.

On the other hand, if $p \gg \frac{1}{n}$, then the expected number of triangles tends to infinity. Does this imply that the probability of obtaining a triangle tends to $1$?

To address this, we need to consider the variance.

Idea:

  • Consider a random graph $G(n, p)$, and let $X$ denote the number of triangles
  • Calculate the mean and variance of $X$
  • Use Chebyshev's Inequality to show that it is highly unlikely that $X = 0$

For each triple of vertices $A = \{x, y, z\}$, we define a random variable $X_{A}$ as follows:

$$X_{A} = \begin{cases} 1 & \text{if}\ xyz\ \text{is a triangle in}\ G \\ 0 & \text{otherwise} \end{cases}$$

We know that $\mathrm{E}(X) \sim \frac{n^{3}p^{3}}{6}$. Let us calculate its variance $\sigma^{2}$. We have

$$\sigma^{2} = \mathrm{E}[(X - \mathrm{E}(X))^{2}] = \sum_{A, B} \mathrm{cov}(X_{A}, X_{B})$$

where the sum is taken over all ordered pairs of triples $A$ and $B$.

If $A$ and $B$ are disjoint, then $X_{A}$ and $X_{B}$ are independent, so $\mathrm{cov}(X_{A}, X_{B}) = 0$. In fact, if $|A \cap B| = 1$, then the two triples share no edge, so $X_{A}$ and $X_{B}$ remain independent.

Thus

$$\sigma^{2} = \sum_{|A \cap B| = 2} \mathrm{cov}(X_{A}, X_{B}) + \sum_{|A \cap B| = 3} \mathrm{cov}(X_{A}, X_{B})$$

If $|A \cap B| = 2$, then the two triangles share an edge and together span five edges, so

$$\mathrm{cov}(X_{A}, X_{B}) = \mathrm{E}(X_{A} X_{B}) - \mathrm{E}(X_{A}) \mathrm{E}(X_{B}) = p^{5} - p^{6} \leq p^{5}$$

and if $|A \cap B| = 3$, then $A = B$ and

$$\mathrm{cov}(X_{A}, X_{A}) = p^{3} - p^{6} \leq p^{3}$$

Thus

$$\sigma^{2} \leq \binom{n}{3} p^{3} + \binom{n}{2} (n - 2)(n - 3) p^{5} \leq 2 n^{4} p^{4}$$

(the final bound uses $p \leq 1$ and $np \geq 1$, which is the regime of interest)

We now apply Chebyshev's Inequality, writing $\mu = \mathrm{E}(X)$:

$$P(|X - \mu| \geq t) \leq \frac{\sigma^{2}}{t^{2}} \leq \frac{2 n^{4} p^{4}}{t^{2}}$$

Setting $t = \mu \sim \frac{n^{3}p^{3}}{6}$ gives $P(X = 0) \leq P(|X - \mu| \geq \mu) \leq \frac{2n^{4}p^{4}}{\mu^{2}} \sim \frac{72}{(np)^{2}}$, which tends to $0$ when $np \to \infty$.

We have demonstrated:

  • If $np \rightarrow 0$, then $P(G\ \text{contains a triangle}) \rightarrow 0$
  • If $np \rightarrow \infty$, then $P(G\ \text{contains a triangle}) \rightarrow 1$

We say that $p(n) = \frac{1}{n}$ is a threshold function for the presence of triangles in $G(n, p)$.
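A small simulation sketch (parameters chosen by us) makes the threshold visible: sample $G(n, p)$ at $p = \frac{c}{n}$ for $c$ below, at, and above $1$. At finite $n$ the transition is smoothed but clear.

```python
import random
from itertools import combinations

def has_triangle(n, p, rng):
    """Sample G(n, p) and test whether it contains a triangle."""
    adj = [[False] * n for _ in range(n)]
    for x, y in combinations(range(n), 2):
        adj[x][y] = adj[y][x] = rng.random() < p
    return any(adj[x][y] and adj[y][z] and adj[x][z]
               for x, y, z in combinations(range(n), 3))

# Estimate P(triangle) at p = c / n on either side of the threshold c = 1.
rng = random.Random(0)
n, trials = 60, 200
for c in (0.2, 1.0, 5.0):
    hits = sum(has_triangle(n, c / n, rng) for _ in range(trials))
    print(f"c = {c}: P(triangle) ~ {hits / trials:.2f}")
```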

We observe the same pattern for other graphs: there exists a threshold $p_{H}$ such that if $p \ll p_{H}$, then it is highly unlikely that $G(n, p)$ contains a copy of $H$; but if $p \gg p_{H}$, then $G(n, p)$ will almost certainly contain copies. This is an example of a phase transition.

For many graphs $H$, the threshold for $G$ to contain a copy of $H$ occurs around the point where the expected number of copies of $H$ becomes large.

Finally, let us briefly discuss martingale methods. A martingale is (informally) a sequence of random variables $X_{0}, X_{1}, \dots$ where $\mathrm{E}(X_{i + 1} \mid X_{0}, \dots, X_{i}) = X_{i}$: at each step, the expected value remains unchanged. (This can be likened to making a sequence of fair bets in a casino.)

Martingales are highly useful in analysing various types of random graph processes. For example, let $\chi(G)$ denote the chromatic number of $G$. Estimating the average value of $\chi(G)$ for a random graph $G \in G(n, \frac{1}{2})$ is challenging, but it is known to be approximately

$$\chi(G) \sim \frac{n}{2\log_{2}n}$$

The key idea is as follows:

  • Reveal the graph one vertex at a time: let $X_{i}$ denote the expected chromatic number of $G$ given the information provided by the first $i$ vertices.
  • The sequence $(X_{i})_{i = 0}^{n}$ forms a martingale!
  • Now apply a martingale inequality (for example, the Azuma-Hoeffding Inequality, a relative of the Chernoff Inequality for binomial random variables) to show that $\chi(G) = X_{n}$ is likely to be close to its mean $X_{0}$; one standard form is stated below.
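For reference, a standard form of this inequality for bounded-difference martingales: if $(X_{i})_{i = 0}^{n}$ is a martingale with $|X_{i + 1} - X_{i}| \leq 1$ for all $i$, as holds for the vertex-exposure martingale above, then for every $t > 0$

$$P(|X_{n} - X_{0}| \geq t) \leq 2 e^{-\frac{t^{2}}{2n}}$$

Taking $t$ just slightly larger than $\sqrt{n}$ already shows that $\chi(G)$ is very likely within $t$ of its mean.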

The result follows elegantly!