===============================================================
Lect 19 - December 2, 2008 - ECS 20 - Fall 2008 - Phil Rogaway
===============================================================

Today:
  o Probability, cont.

Announcements
  - Last topic: graphs (Schaum's chapter 8)

----------------------------------------------------------------------------
1. Basic definitions / theory
----------------------------------------------------------------------------
** marks what we didn't give before.                     Schaum's, chapt 7.

DEF: A [*finite*] *probability space* is a finite set S ("the sample space")
     together with a function P: S -> [0,1] (the *probability measure*)
     such that

         \sum_{x \in S} P(x) = 1

     More often: omega, mu, Omega
     for         x,     P,  S

In general, whenever you hear "probability", make sure that you are clear
WHAT is the probability space and WHAT is the event in question.

DEF: Let (S, P) be a probability space. An *event* is a subset of S.
     An *outcome* is a point in S.

DEF: Let A be an event of probability space (S, P). Then

         P(A) = \sum_{a \in A} P(a)        "The probability of event A"

     (I'm used to writing Pr, so I will probably slip.)

DEF: The *uniform* distribution is the one where P(a) = 1/|S| for every
     a in S -- all points equiprobable.

DEF: Events A and B are *independent* if P(A \cap B) = P(A) P(B).

** DEF: A *random variable* is a function X: S -> \R

** DEF: E[X] = \sum_{s \in S} P(s) X(s)    // expected value of X ("average value")

DEF: P(A|B) = P(A \cap B)/P(B)             // defined when P(B) > 0

Propositions:
   - P(\emptyset) = 0                      // by definition
   - P(S) = 1
   - P(A) + P(S \ A) = 1
   - If A and B are disjoint events (that is, disjoint sets) then
     P(A \cup B) = P(A) + P(B)
   - ("sum bound") P(A \cup B) <= P(A) + P(B)
   - In general, P(A \cup B) = P(A) + P(B) - P(A \cap B)
                                           // inclusion-exclusion principle
** - P(A) = P(A|B1)P(B1) + P(A|B2)P(B2)
     if B1, B2 are disjoint sets whose union is S
   - More generally, P(A) = P(A|B1)P(B1) + ... + P(A|Bn)P(Bn)
     if B1 ... Bn partition S.
** - E[X+Y] = E[X] + E[Y]                  // expectation is linear.
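
The definitions and propositions above can be checked mechanically on a
small space. Below is a minimal sketch, not part of the lecture. It
assumes, purely for illustration, the uniform space of ordered rolls of
two fair dice; the helper names prob, cond and expect and the events
A and B are hypothetical choices made for this sketch.

```python
from fractions import Fraction
from itertools import product

# Assumed example space: ordered rolls of two fair dice, uniform measure.
S = set(product(range(1, 7), repeat=2))
P = {s: Fraction(1, len(S)) for s in S}

def prob(A):
    """P(A) = sum of P(a) over the outcomes a in event A."""
    return sum((P[a] for a in A), Fraction(0))

def cond(A, B):
    """P(A|B) = P(A cap B) / P(B); assumes P(B) > 0."""
    return prob(A & B) / prob(B)

def expect(X):
    """E[X] = sum over s in S of P(s) X(s), for a random variable X: S -> R."""
    return sum((P[s] * X(s) for s in S), Fraction(0))

A = {s for s in S if s[0] % 2 == 0}       # first die is even
B = {s for s in S if s[0] + s[1] == 7}    # the dice sum to 7

def first(s):
    return s[0]

def second(s):
    return s[1]

# Each line below checks one definition or proposition; all print True.
print(prob(S) == 1)
print(prob(A & B) == prob(A) * prob(B))                  # A, B independent
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))    # inclusion-exclusion
print(prob(B) == cond(B, A) * prob(A) + cond(B, S - A) * prob(S - A))
print(expect(lambda s: first(s) + second(s)) == expect(first) + expect(second))
```

Exact fractions are used instead of floats so that the equalities hold
exactly rather than up to rounding.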
----------------------------------------------------------------------
Eg 1: Dice. The singular, the students assure me, is Die.
      Lice Mice and Mie. Or something like that.
----------------------------------------------------------------------

o Pair of dice, what's the chance of rolling an "8"?

     Event E = {(2,6),(3,5),(4,4),(5,3),(6,2)}
     P(E) = 5/36 \approx 14%

  Be careful: P(E) = |E|/|S| *if* we are assuming the *uniform* distribution.

o What's the chance of rolling an 8 if I tell you "both dice were even"?

     Probability (Roll an 8 | both dice even)

  Method 1: Imagine the new probability space:

        (2,2)   (2,4)   (2,6)*
        (4,2)   (4,4)*  (4,6)
        (6,2)*  (6,4)   (6,6)          (* = rolls an 8)

     So probability is 3/9 = 1/3 \approx 33%

  Method 2: A little more "mechanically"

        A = "rolled an 8"
        B = "both dice even"

        P(A | B) = P(A \cap B)/P(B) = (3/36) / (9/36) = 3/9 = 1/3
                     ^
                     look back at the five points -- 3 of the five had both even

----------------------------------------------------------------------
Eg 2: An urn contains 30 white balls and 30 black balls.
      You pull out 5 balls (no replacement).
      a. What's the chance they all have the same color?
      b. What's the chance if I tell you that the first one was white?
      c. What's the chance if I tell you that the first two were white?
----------------------------------------------------------------------

a) P(monochromatic) = P(allwhite) + P(allblack)
                    = 2 * C(30,5) / C(60,5)
                    = 285,012 / 5,461,512
                    \approx 0.0522      (5.2%)

b) First one white -- unchanged, no information. Symbolically,

   P(monochromatic | firstwhite)

        P(monochromatic and firstwhite)      C(30,5) / C(60,5)
      = -------------------------------  =  -----------------  \approx 0.0522
                 P(firstwhite)                      1/2

c) P(monochromatic | firsttwowhite)

        P(monochromatic and firsttwowhite)   C(30,5) / C(60,5)
      = ----------------------------------  = -----------------  \approx 0.1062
                P(firsttwowhite)                (1/2)(29/59)       (10.6%)

-----------------------------------------------
Eg 4: Monty Hall Problem (keep-or-switch game)
-----------------------------------------------

     ========     ========     ========
     |      |     |      |     |      |
     | bad  |     | bad  |     | good |
     |      |     |      |     |      |
     ========     ========     ========
         1            2            3

You choose a random door. One of the other doors is then opened to reveal
a bad prize. Should you switch?

If you switch, you win exactly when your first guess was wrong.

     (loc of good prize, my guess)
     S = {1, 2, 3} x {1, 2, 3}           WIN = get good prize (by switching)

     (1,1)  (1,2)  (1,3)  (2,1)  (2,2)  (2,3)  (3,1)  (3,2)  (3,3)
     Lose   Win    Win    Win    Lose   Win    Win    Win    Lose

     Win: 6/9 = 2/3

Or just choose door 1; then the sample space is the location of the good prize:

     S = {1, 2, 3}

      1      2      3
     lose   win    win                   Win: 2/3

-----------------------------------------
Eg 5: Same parity game
-----------------------------------------
Alice randomly, uniformly chooses two distinct numbers between 1 and 10.
What is the probability they have the same parity? Is it exactly 1/2?

Should actually be less than 1/2, because the numbers are distinct.

     S = {(a,b) \in {1..10}^2: a \ne b}                   |S| = 90
     E = {(a,b) \in S: a mod 2 = b mod 2}

     |E| = |evenAevenB| + |oddAoddB|
         =    5 * 4     +   5 * 4     = 40

     P(E) = 40/90 = 4/9 \approx 44%

-----------------------------------------
Eg 6: Bigger / smaller game
-----------------------------------------
Alice uniformly chooses two distinct numbers between 1 and 10 and
announces the FIRST. Bob guesses if the second is SMALLER or LARGER.
How should Bob play optimally and, if he does so, what is his chance to win?

As usual, start by figuring out the sample space

     S = {(i,j) \in {1..10}^2: i \ne j}

     1  2  3  4  5  6  7  8  9  10

   - If Alice announces 1,2,3,4,5  guess LARGER
   - If Alice announces 6,7,8,9,10 guess SMALLER

If Alice announces i <= 5, then 10-i of the 9 remaining numbers are larger;
if she announces i >= 6, then i-1 of the 9 remaining numbers are smaller.

  P(Win) =   P(Win | AliceAnswers1)  P(AliceAnswers1)
           + P(Win | AliceAnswers2)  P(AliceAnswers2)
           + ..
           + P(Win | AliceAnswers10) P(AliceAnswers10)

         = (1/10) (P(Win | AliceAnswers1) + ... + P(Win | AliceAnswers10))

         = (1/10) (9/9 + 8/9 + 7/9 + 6/9 + 5/9 + 5/9 + 6/9 + 7/9 + 8/9 + 9/9)

         = (1/10) (7*10/9)          // the ten numerators clearly average 7

         = 70/90 = 7/9 \approx 78%

-----------------------------------------
Eg 7: Expected value
-----------------------------------------
Alice rolls a die. What do you expect the square of her roll to be?

     could be 1 .... could be 36!

Definition: a RV is a function X: S -> \R
Definition: E[X] = \sum_{s \in S} X(s) P(s)

So, in this problem, with X = the square of the roll,

     E[X] = 1(1/6) + 2^2(1/6) + 3^2(1/6) + ... + 6^2(1/6)
          = (1/6)(1+4+9+16+25+36)
          = 91/6 \approx 15.2

Exercise: Repeat, supposing she rolls a *pair* of dice.

-----------------------------------------
Eg 8: Subway
-----------------------------------------
When Pablo leaves his office late at night, he wanders to the subway and
takes the first train North or South:

                   Girlfriend's home
                          /|\
                           |
                           |
                           |
     Laboratory -------> subway stop
                           |
                           |
                           |
                          \|/
                    Student's home

There are trains every 10 mins, both N and S.
During the last 31 days, Pablo has gone home only 3 times, and this seems
to be about typical.
Explain what is going on and compute Pablo's average wait time for a train.

Example:

     Northbound     Southbound     gap to next train
       11:00                           1 min
                      11:01            9 min
       11:10                           1 min
                      11:11            9 min
       11:20
                      11:21

With this schedule one direction comes first only if Pablo arrives during a
1-minute gap, which happens with probability 1/10 (about 3 days out of 31).

Let X = Wait time. Given that he arrives in a gap, the average wait is half
the gap's length:

     E[X] = (1/10) (0.5 min) + (9/10) (4.5 mins)
          = 0.05 + 4.05 mins
          = 4.1 mins

How should the trains be staggered to minimize Pablo's wait time?

     Northbound     Southbound
       11:00
                      11:05
       11:10
                      11:15
       11:20

Average wait time will be 2.5 mins.

----------------------------------------------------------
Eg 9: Birthday analysis -- again (we didn't get to this)
----------------------------------------------------------
Select q random points, with replacement, from a universe of N points.

Let C_i = event that point i collides with a previous one.
    D_i = event that there is no collision up to time i.

  P(collision) = 1 - P(D_q)
               = 1 - P(D_q | D_{q-1}) P(D_{q-1})
               = 1 - P(D_q | D_{q-1}) P(D_{q-1} | D_{q-2}) P(D_{q-2})
               = ...
               = 1 - \prod_{i=1}^{q-1} P(D_{i+1} | D_i)
               = 1 - \prod_{i=1}^{q-1} (1 - i/N)

Now, let's approximate 1-x by exp(-x).   (1-x <= e^{-x}, and
                                          1+x \approx exp(x) when x \approx 0)

         \approx 1 - \prod_{i=1}^{q-1} exp(-i/N)
               = 1 - exp(-1/N - 2/N - 3/N - ... - (q-1)/N)
               = 1 - exp(-q(q-1)/2N)
         \approx 1 - exp(-q^2/2N)

So: about how large should q be for this to be 1/2?

       0.5 = 1 - exp(-q^2/2N)
       0.5 = exp(-q^2/2N)
     -ln 2 = -q^2/2N
     (2 ln 2) N = q^2
     q = sqrt(2 ln 2) sqrt(N) \approx 1.177 sqrt(N)

Application: SHA1 has 160-bit outputs. About how long to find a collision
by trying successive points? Assume SHA1 behaves as a random function would.

     N = 2^160, so about 1.177 * 2^80 tries.

If each try takes 1 usec, we can do 10^6 tries/second. With about
3.1*10^7 seconds in a year:

     1.177 * 2^80 / 10^6 / (3.1*10^7) \approx 4.6 * 10^10 > 10^10 years
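
The birthday estimate can be compared numerically against the exact
product. Below is a minimal sketch, not part of the lecture; the function
names are hypothetical, and N = 365 is an assumed test value (the classic
birthday setting) chosen only to make the comparison concrete.

```python
import math

def collision_exact(q, N):
    """1 - prod_{i=1}^{q-1} (1 - i/N): exact collision probability."""
    no_collision = 1.0
    for i in range(1, q):
        no_collision *= 1.0 - i / N
    return 1.0 - no_collision

def collision_approx(q, N):
    """1 - exp(-q^2 / 2N): the approximation derived in Eg 9."""
    return 1.0 - math.exp(-q * q / (2.0 * N))

def q_for_half(N):
    """q = sqrt(2 ln 2) sqrt(N), about 1.177 sqrt(N)."""
    return math.sqrt(2.0 * math.log(2.0) * N)

# Assumed test value N = 365: compare exact and approximate probabilities
# at the q predicted by the formula.
N = 365
q = round(q_for_half(N))
print(q, collision_exact(q, N), collision_approx(q, N))

# SHA1 estimate from the notes: N = 2^160, 10^6 tries per second,
# about 3.1 * 10^7 seconds per year.
tries = q_for_half(2.0 ** 160)
years = tries / 1e6 / 3.1e7
print(years > 1e10)
```

The exact and approximate values differ slightly at small N because two
approximations are stacked: 1-x by exp(-x), and q(q-1) by q^2.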