Using baking to explain sampling theory

Cookie-baking provides a nice example to explain the intuitions behind some basic topics in sampling theory.

Categories: baking, statistics, sampling

Author: Ben Schneider

Published: January 12, 2021

This post was written while I was sitting with my mother in the hospital last winter, when I found myself needing to simultaneously: 1) write an entry in my “statistics diary” for a JPSM course on sampling theory; and 2) explain to my mom all the messy math I was scribbling away at that week. I was inspired by this post from Sharon Lohr’s blog about her experience conducting randomized experiments at home on how to brew the tastiest coffee. Aside from the fact that I just enjoy baking, one reason I like thinking about applying sampling theory to baking is that it’s an application where we care about both finite population and superpopulation inferences. For another, the ubiquity of batches in baking provides a clear example of how stratification and clustering affect our inferences.

Finite population vs. superpopulation inferences in baking

With baking, you could imagine a scenario where you care about finite population inferences–that is, inferences about a population of specific elements at a specific point in time–for the purpose of testing the quality of batches of cookies, for example. You might bake five batches of 20 cookies each, and you want to be confident that a certain minimum number of the 100 cookies you baked are tasty enough to present in a baking competition. In such a scenario, it’s clear that your inferential population is the specific set of 100 cookies that you baked. It’s also fairly apparent in this example that stratifying your sample by batch will be advantageous, because variation between batches (caused by cross-batch variations in oven temperatures, ingredient measurements, etc.) should be greater than variation within batches.1

Similarly, it’s clear that if you conduct experiments for the purpose of inferring ‘general lessons’ or recipes for baking, then it would be insufficient to quantify uncertainty using the tools of finite population inference. Why? Because if you bake a single batch of 20 cookies and measure the variables of interest for all of them (i.e. your outcome, ‘tastiness’, as well as the independent variables that make up the recipe), then you have essentially zero uncertainty about the finite population of cookies that you’ve baked, but you still have a great deal of uncertainty about the “superpopulation” of cookies that you might potentially bake using your recipe.
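
To make that concrete, here is a minimal sketch in Python (my own illustration, not from my coursework; the tastiness scores are simulated) showing how the two kinds of uncertainty diverge once you’ve tasted every cookie in the batch: the finite population correction drives the finite population variance of the mean to zero, while the superpopulation variance stays put.

```python
# A minimal sketch (my own illustration, not from the post): contrasting
# finite-population and superpopulation uncertainty when you measure every
# cookie in one batch. The tastiness scores below are made up.
import numpy as np

rng = np.random.default_rng(2021)
batch = rng.normal(loc=7.0, scale=1.0, size=20)  # tastiness of all 20 cookies

n = len(batch)          # we tasted every cookie...
N = n                   # ...in a finite population of the same 20 cookies
s2 = batch.var(ddof=1)  # sample variance of tastiness

# Finite-population variance of the mean includes the correction (1 - n/N),
# which is zero here: no uncertainty about *these* 20 cookies.
var_finite = (1 - n / N) * s2 / n

# Superpopulation variance of the mean has no such correction: plenty of
# uncertainty remains about cookies you *might* bake from the recipe.
var_super = s2 / n

print(f"finite-population variance of the mean: {var_finite:.4f}")
print(f"superpopulation variance of the mean:   {var_super:.4f}")
```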

Why clustering reduces effective sample size and stratifying increases it (and why it’s statistically a better tradeoff to sample more clusters rather than more observations within each cluster)

Let’s say you wanted to evaluate the average ‘tastiness’ of a new cookie recipe. To measure the average tastiness, you’d ideally want to take a random sample from the population of times you eat a cookie baked using that recipe. However, the times that you ‘sample’ a cookie from the recipe are clustered by batch: typically, when you eat a cookie prepared using the recipe, you also eat at least one other cookie baked in the same batch. It’s pretty clear that any two cookies baked in the same batch are likely to be especially similar to one another: cookies from the same batch have almost exactly the same composition of ingredients, bake times, and oven temperatures.2 Thus, when you taste a sample of, say, six cookies from the recipe, you have to take into account that your evaluation of the recipe’s effectiveness is heavily influenced by batch-level randomness.

If you bake a batch, sample six cookies, and find that they aren’t all that tasty, it could be because the recipe is not so good, or it could be because you left that particular batch in the oven too long. You can’t discern whether the general recipe or the particulars of the batch at hand are to blame, unless you try the recipe again on some new batches. That’s why–if you’re trying to learn about the recipe–you’re better off sampling six cookies from different batches than sampling six cookies from the same batch.

In statistical terms, you typically get more information by sampling additional clusters (i.e. trying a cookie from a previously untasted batch) than by sampling additional elements within each cluster (i.e. trying one more cookie from a batch you’ve already tasted from). And if you’re analyzing a sample from a superpopulation in order to make inferences about a process, your confidence in your inference should depend on the degree of clustering in your sample: are all the cookies you’re tasting just from a single batch or two, or are they drawn from several different batches?
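
To see roughly how much clustering costs you, here is a small simulation sketch in Python (my own illustration; the variance components are invented) comparing two ways of tasting six cookies: all six from a single batch, versus one cookie from each of six different batches. In sampling terms, tasting m cookies per batch inflates the variance of your estimate by roughly the design effect 1 + (m - 1)ρ, where ρ is the intraclass correlation, i.e. how similar same-batch cookies tend to be.

```python
# A rough simulation sketch (my own illustration, not from the post) of why
# six cookies from six batches tell you more about a recipe than six cookies
# from one batch. The variance components below are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
mu = 7.0            # true average tastiness of the recipe (superpopulation mean)
sigma_batch = 1.0   # between-batch standard deviation (oven quirks, measuring slips)
sigma_cookie = 0.5  # within-batch standard deviation

def estimated_tastiness(n_batches, cookies_per_batch):
    """Simulate tasting `cookies_per_batch` cookies from each of `n_batches` batches."""
    batch_means = rng.normal(mu, sigma_batch, size=n_batches)
    cookies = rng.normal(np.repeat(batch_means, cookies_per_batch), sigma_cookie)
    return cookies.mean()

reps = 10_000
one_batch_of_six = [estimated_tastiness(1, 6) for _ in range(reps)]
six_batches_of_one = [estimated_tastiness(6, 1) for _ in range(reps)]

print("Std. error, 1 batch x 6 cookies: ", round(np.std(one_batch_of_six), 3))
print("Std. error, 6 batches x 1 cookie:", round(np.std(six_batches_of_one), 3))
```

With these invented variance components, the six-batch design ends up with a standard error less than half that of the single-batch design, even though both involve tasting six cookies.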

When we draw a sample of six cookies in order to quality-check a finite population of 60 cookies sitting in our kitchen (5 batches of a dozen cookies each), we get more information by stratifying our sample to make sure we draw at least one cookie from each of the batches than by simply picking a few of the 60 cookies totally at random. If we sample six cookies totally at random, we might by chance taste three cookies from one delicious batch and three cookies from a so-so batch but neglect to taste the over-salted cookies from a terrible batch. Stratifying reduces the chance of these extreme oversights: it reduces the variability of the kind of sample we get. That’s particularly useful when we know that the thing we’re measuring (in this case, taste) varies substantially between the strata (batches).
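
Here is a similar back-of-the-envelope sketch in Python for the finite population version of the argument (again, the tastiness values are made up): a fixed set of 60 cookies in five batches, comparing a simple random sample against a stratified sample. To keep the stratified estimator self-weighting, the sketch draws five cookies (one per batch) rather than six.

```python
# A quick sketch (my own illustration, not from the post) comparing a simple
# random sample with a stratified sample from a fixed finite population of 60
# cookies. The tastiness values are made up; one batch turned out over-salted.
import numpy as np

rng = np.random.default_rng(7)

# A fixed finite population: 5 batches of a dozen cookies each, with real
# batch-to-batch differences.
batch_means = np.array([8.0, 7.5, 7.0, 6.5, 3.0])        # the last batch is the bad one
population = rng.normal(np.repeat(batch_means, 12), 0.5)  # 60 cookies
batch_labels = np.repeat(np.arange(5), 12)
true_mean = population.mean()

def srs_mean(n=5):
    """Simple random sample of n cookies from the 60."""
    return rng.choice(population, size=n, replace=False).mean()

def stratified_mean():
    """One cookie drawn at random from each batch (batches are equal-sized)."""
    picks = [rng.choice(population[batch_labels == b]) for b in range(5)]
    return np.mean(picks)

reps = 10_000
srs_errors = [srs_mean() - true_mean for _ in range(reps)]
strat_errors = [stratified_mean() - true_mean for _ in range(reps)]

print("RMSE, simple random sample of 5:", round(np.sqrt(np.mean(np.square(srs_errors))), 3))
print("RMSE, stratified sample of 5:   ", round(np.sqrt(np.mean(np.square(strat_errors))), 3))
```

With this made-up population, the stratified design’s root-mean-square error comes out several times smaller than the simple random sample’s, mostly because it can never miss the over-salted batch entirely.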

Balancing statistical efficiency against practical cost

While we learn more by baking several batches of a few cookies than by baking a few batches of several cookies, this comes at a cost. For example, if we decide to bake 60 cookies, it will take much more time and effort to prepare ten batches of six cookies than to prepare five batches of a dozen cookies. So for inferences about the tastiness of a superpopulation of cookies baked from a recipe, we might want to sacrifice statistical efficiency and bake a few batches of several cookies rather than several batches of a few cookies.
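
To put rough numbers on that efficiency gap, here is a back-of-the-envelope sketch in Python using the same design-effect approximation mentioned above, with an invented intraclass correlation of 0.3: ten batches of six cookies buy you a noticeably larger effective sample size than five batches of a dozen, even though both designs involve baking 60 cookies.

```python
# A back-of-the-envelope sketch (my own illustration, not from the post):
# approximate effective sample sizes for two ways of baking 60 cookies,
# using the equal-cluster design effect 1 + (k - 1) * rho.
rho = 0.3  # assumed intraclass correlation: how similar same-batch cookies are

def effective_sample_size(n_batches, cookies_per_batch, rho):
    """Approximate effective sample size for a clustered design with equal batch sizes."""
    n = n_batches * cookies_per_batch
    design_effect = 1 + (cookies_per_batch - 1) * rho
    return n / design_effect

print(effective_sample_size(10, 6, rho))   # ten batches of six -> about 24 'effective' cookies
print(effective_sample_size(5, 12, rho))   # five batches of a dozen -> about 14 'effective' cookies
```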

In finite population sampling, another type of tradeoff arises. Imagine you’re a head baker supervising a team of 50 bakers, and you want to make sure that the thousands of cookies they baked that morning are mostly up to standard. In order to draw a simple random sample, you’d have to go through the time-consuming process of indexing all of the thousands of cookies. Or you’d have to gather them all and shuffle them up, hoping the result is random and that you haven’t ruined them in the process. It would be much easier to just index the 50 bakers, randomly sample a few of them, and then taste the cookies from a random tray or two from each of the sampled bakers. While this kind of multistage sampling isn’t as statistically efficient as a simple random sample of cookies, it’s much easier to pull off.
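
The bookkeeping for that kind of two-stage selection is tiny. Here is a toy sketch in Python (the ten trays per baker and the sample sizes are assumptions on my part) of sampling a handful of bakers first and then a couple of trays within each sampled baker.

```python
# A toy sketch (my own illustration, not from the post) of the two-stage
# design described above: sample a few bakers at random, then a couple of
# trays from each. The trays-per-baker count and sample sizes are assumptions.
import numpy as np

rng = np.random.default_rng(123)

n_bakers = 50          # the head baker's team
trays_per_baker = 10   # assumed: each baker filled ten trays this morning

# Stage 1: index only the 50 bakers and randomly sample five of them.
sampled_bakers = rng.choice(n_bakers, size=5, replace=False)

# Stage 2: within each sampled baker, randomly sample two trays to taste from.
sample = {
    int(baker): sorted(int(t) for t in rng.choice(trays_per_baker, size=2, replace=False))
    for baker in sampled_bakers
}
print(sample)  # e.g. {3: [1, 7], 12: [0, 4], ...}
```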

Footnotes

  1. Assuming, that is, that you did a good job mixing each batch (so that, for example, you don’t end up with a clump of walnuts that never gets evenly mixed into your batter).↩︎

  2. Oven temperature is surprisingly random, since most ovens’ actual temperatures vary randomly from the stated temperatures on their controls.↩︎