# Expectations and Conditional Expectations

Happy 2021 everyone! This year, we are bringing more solutions to common interview questions to you.

First up, we are continuing with a couple of data science questions, especially on expectations. Expected value of random variables (and their correlations) are some of the most common, and some of the hardest interview questions to crack. Follow along to see some techniques on solving them.

(Feel free to browse a few other posts on data science interview questions)

# Problem 1 (Disney):

Alice and Bob are choosing their top 3 shows from a list of 50 shows. Assume that they choose independently of one another. Being relatively new to Hulu, assume also that they choose randomly within the 50 shows. What is the expected number of shows they have in common, and what is the probability that they do not have any shows in common?

# Solution:

Let’s recall the expectation of a discrete random variable `X` which takes values `{x_1, x_2, ..., x_n}`

What is `X` in our problem? It's the number of shows in common between Alice and Bob. What values does it take? Since there are at most 3 shows in common, the possible values of `X` are `{0, 1, 2, 3}`.

Let’s calculate the probability for each of these values. What is the probability that Alice and Bob have exactly one show in common?

• We first need to pick the exact show in common. There are `50` shows to choose from, and hence, there are exactly `50` ways to pick the common show.
• Alice needs to pick the remaining two shows out of `49`. This can be done in (⁴⁹₂) ways.
• Finally, Bob can only pick his shows from the remaining `47` shows. (Remember, Alice and Bob are only allowed one show in common). This can be done in (⁴⁷₂)ways.
• Finally, the total number of possible ways both of them picks their movies independently is (⁵⁰₃) * (⁵⁰₃)

Thus, by frequency counting, the probability of both Alice and Bob picking exactly one common movie is

After a few cancellations, this expression reduces to `3243/19600`.

We can similarly calculate `Pr(X = 2) = 141/19600`, and `Pr(X = 3) = 1/19600`.

Finally, the required expected value is

Remember, although the number of common movies is an integers, its expectation can be a fraction. In fact, because there are so many movies to choose from, the number of common movies that Alice and Bob will ever pick is pretty close to zero.

Finally, let’s calculate the probability of zero common movies between Alice and Bob in two different ways.

2. By simple frequency calculation. Alice has 3 movies to pick out of 50, while Bob has 3 movies to pick, but can only pick them from the remaining 47. As a result,

As expected, Alice and Bob pick no common movie with a very high probability.

# Problem 2 (Tesla):

Say you roll a fair dice repeatedly. Let X be the number of rolls until a 1 is rolled, and Y the number of rolls until a 4 is rolled. What is E[X|Y=2]?

# Solution:

This is a good problem because it tests your understanding of conditional expectation without explicitly involving the formula for conditional expectation!

What do we mean by the conditional random variable X|Y=2? It means,

• Our sample space is restricted to all rolls of dice where the second roll is a 4, and the first roll is not 4.
• Or if we write the sequence of rolls as (a₁, a₂, a₃, …), then we know that a₁≠ 4, and a₂ = 4.

To calculate the conditional expectation,

• We first find the possible values X can take, given Y=2, and
• Calculate the weighted probability (i.e. probability times value) for each possible value of X.

What are the possible values of X given Y=2. X takes infinitely many values in the set `{1, 3, 4, 5, ...}`. Notice — and here is the key — X can never take a value of 2, because the second roll of the die is always 4.

Pr[X=1|Y=2] is the simplest of all. It means that the first roll of the die is 1, which can happen with the probability 1/6. For X=n (n >= 3),

• The first roll of the die must be neither 1 or 4. The remaining 4 values can show up with probability 4/6.
• The second roll of the die is fixed at 4.
• Every roll of the die at index in `{3, 4, ..., n-1}` can take any value except 1, each independently, with probability 5/6.
• Finally, the nth roll of the die should be 1, with probability 1/6.

Multiplying these independent probabilities together, we get, for n >= 3,

Finally, The conditional expectation is

All that remains is to sum the infinite geometric series in the second term!. We are going to simplify the calculation by a variable substitution, `m = n - 3`. Thus,