I was recently reading through the PBRT fourth edition, an exciting followup of the excellent third edition. They derive & show that the variance of stratified sampling is necessarily lower than that of random sampling in Chapter 2, Sect 2.2.1 (Page 61). The derivation was not very clear to me and required some thinking/writing of my own. Plus, they refer to Eric Veach’s thesis for an important expression of the variance (that thesis is a gold mine!). However, I could not for the life of me find a step by step derivation of that expression there too!

Well, that’s the motivation for this blog. Writing stuff like this down helps my understanding as well :-).


Definitions

We will denote random variables with capital letters like so: \(X\). Expected value is denoted as \(E[\ ]\) & variance is denoted as \(V[\ ]\).

Recall the definition of expectation: \(E[f(X)] = \int_{D} f(x) \cdot p(x) dx\), where \(D\) is the domain and \(p(x)\) is the probability of choosing \(x\).

Also, recall that variance is expected squared deviation from mean: \(V[f(X)] = E \left [ (f(X) - E[f(X)])^2 \right ]\).

Finally, recall that a function of a random variable is still a random variable: \(Y = f(X)\).

Let’s first define our target integral over the domain \(\Lambda\):

$$ L = \int_{\Lambda} f(x) dx. $$

Lets now define all stratum \(\Lambda_1, \Lambda_2, ... \Lambda_n\), which are non-overlapping and combine to form the whole domain: \(\bigcup_{i=1}^{n} \Lambda_i = \Lambda\). Each stratum has fractional volume \(v_i \in (0, 1]\).

The integral within a stratum \(i\) is: \(L_i = \int_{\Lambda_i} f(x) dx\).

With this, the target integral can be decomposed like so:

$$ L = \int_{\Lambda_1} f(x) dx + \int_{\Lambda_2} f(x) dx + ... + \int_{\Lambda_n} f(x) dx = \sum_i \int_{\Lambda_i} f(x) dx. $$

We will use the following two properties of expectation and variance, given a constant \(a\):

$$ E[af(X)] = a E[f(X)], $$

$$ V[af(X)] = a^2 V[f(X)] $$

We will also use the following two properties of expectation and variance, given random variables \(X_i\) are identically and independently distributed (as is in rendering with Monte Carlo):

$$ E \left [ \sum_i f(X_i) \right ] = \sum_i E \left [ f(X_i) \right ], $$

$$ V \left [ \sum_i f(X_i) \right ] = \sum_i V \left [ f(X_i) \right ] $$

Finally, we will make use of the decomposition property of variance, for two random varibles \(X, Y\):

$$ V[X] = E \left [ V[X | Y] \right ] + V \left [ E[X | Y] \right ], $$

which states that the variance of \(X\) is the sum of mean of the conditional variance and the variance of the conditional mean.


Estimators

Using uniform sampling in the stratum \(\Lambda_i\) with volume \(v_i\) using \(n_i\) samples, we can write down it’s monte carlo estimator like so:

$$ F_i = \frac{1}{n_i} \sum_{j} \frac{f(X_{i, j})}{1 / v_i} = \frac{v_i}{n_i} \sum_{j} f(X_{i, j}), $$

where \(X_{i, j}\) is the jth sample in the ith stratum.

The estimator of the target integral can be written in terms of estimators of individual strata like so:

$$ F = \sum_i F_i. $$


Variance of Stratified Sampling

Denote the variance within a stratum as: \(\sigma_i^2 = V[f(X_{i, j})]\).

Now, the vaiance of the stratum estimator \(F_i\) is:

$$ V[F_i] = V \left [ \frac{v_i}{n_i} \sum_j f(X_{i, j}) \right ] $$

Using the first set of properties:

$$ V[F_i] = \frac{v_i^2}{n_i^2} V \left [ \sum_j f(X_{i, j}) \right ] $$

Using the second set of properties:

$$ V[F_i] = \frac{v_i^2}{n_i} \cdot \frac{1}{n_i} \sum_j V \left [ f(X_{i, j}) \right ] $$ $$ V[F_i] = \frac{v_i^2}{n_i} \cdot \frac{1}{n_i} \sum_j \sigma_i^2 $$ $$ V[F_i] = \frac{v_i^2}{n_i} \cdot \sigma_i^2 \cdot \frac{1}{n_i} \sum_j 1 $$ $$ V[F_i] = \frac{v_i^2 \sigma_i^2}{n_i} $$

Finally, the variance of the target estimator is:

$$ V[F] = V[\sum_i F_i] = \sum_i V[F_i] $$ $$ V[F] = \sum_i \frac{v_i^2 \sigma_i^2}{n_i} $$

Typically, we take \(n_i = v_i n\), thus:

$$ \boxed{ V[F] = \frac{1}{n} \sum_i v_i \sigma_i^2 } $$


Variance of Random Sampling

Random sampling can be done within this framework by first choosing a stratum \(i\) according to discrete probabilities defined by \(v_i\) and then choosing a sample uniformly within that stratum.

Define probabilities of choosing strata like so (since \(v_i \in (0, 1]\)):

$$ p(i) = v_i, $$

and the probability of uniformly choosing a sample \(j\) given a stratum as:

$$ p(X_{i, j} | i) = \frac{1}{v_i}. $$

With this, the target estimator is:

$$ F = \frac{1}{n} \sum_j \frac{f(X_{i, j})}{p(X_{i, j} | i) p(i)} $$ $$ F = \frac{1}{n} \sum_j f(X_{i, j}) $$


We are now choosing \(X_{i, j}\) conditionally on \(I\) (random variable on strata).

Variance of \(f(X_{i, j})\) is not just witin a stratum like defined before. But, we can use the decomposition property to get variance of \(f(X_{i, j})\):

$$ V[f(X_{i, j})] = E \left [ V[f(X_{i, j}) | I] \right ] + V \left [ E[f(X_{i, j}) | I] \right ] $$


Lets look a the first term in the sum: \(E[V[..]]\). The variance is of \(f(X_{i, j})\) given a stratum, which is exactly \(\sigma_i^2\). Thus,

$$ E \left [ V[f(X_{i, j}) | I] \right ] = E[\sigma_i^2] $$

Since expectation over strata are on discrete probabilities:

$$ E \left [ V[f(X_{i, j}) | I] \right ] = \sum_i \sigma_i^2 \cdot p(i) $$ $$ E \left [ V[f(X_{i, j}) | I] \right ] = \sum_i \sigma_i^2 v_i $$


Lets now look at the second term: \(V[E[..]]\). Here, the expected value is of \(f(X_{i, j})\) given a stratum, which is exactly \(L_i\).

$$ V \left [ E[f(X_{i, j}) | I] \right ] = V[L_i] $$

In this case, the variance will be with respect to the target integral.

$$ V \left [ E[f(X_{i, j}) | I] \right ] = E[(L_i - L)^2] $$

Again, due to discrete probabilities:

$$ V \left [ E[f(X_{i, j}) | I] \right ] = \sum_i (L_i - L)^2 v_i $$


We now get an expression of the variance of \(f(X_{i, j})\) for random sampling:

$$ V[f(X_{i, j})] = \sum_i \sigma_i^2 v_i + \sum_i (L_i - L)^2 v_i $$


Finally, the variance of the target estimator is:

$$ V[F] = V \left [ \frac{1}{n} \sum_j f(X_{i, j}) \right ] $$

Again, using the first and second set of properties:

$$ V[F] = \frac{1}{n^2} \sum_j V \left [ f(X_{i, j}) \right ] = \frac{1}{n} \cdot \frac{1}{n} \sum_j V \left [ f(X_{i, j}) \right ] $$ $$ V[F] = \frac{1}{n} V \left [ f(X_{i, j}) \right ] $$ $$ \boxed{ V[F] = \frac{1}{n} \left [ \sum_i \sigma_i^2 v_i + \sum_i (L_i - L)^2 v_i \right ] } $$


Conclusion

The first and the second boxed equations show the variance of stratified and random sampling respectively. Its clear that the latter’s variance is larger due to the additional term.