Bootstrapping all the things

The new version of the ‘svrep’ package on CRAN helps you implement bootstrap methods for surveys, even those with particularly complex sample designs.

Published December 18, 2022

As of this morning, version 0.4.1 of the ‘svrep’ package is up on CRAN. This version builds on the major 0.4.0 release, which adds some useful new functionality for implementing bootstrap methods for surveys. The new vignette, “Bootstrap Methods for Surveys”, gives an overview of the bootstrap methods included in the package and explains how to use them for a wide variety of survey designs. If you’re interested in using the new functionality, I’d recommend starting with the vignette.

This blog post highlights four helpful new features introduced in this version of ‘svrep’:

  1. A simplified wrapper function for bootstrap methods from the ‘survey’ package

  2. The Rao-Wu-Yue-Beaumont bootstrap

  3. The generalized bootstrap

  4. Tools to help choose the number of bootstrap replicates

Simplified Wrapper for Bootstrap Methods of the ‘survey’ Package

The ‘survey’ package already implements a number of fairly general bootstrap methods for complex surveys, but it can be hard to figure out which of these methods is appropriate for your survey, and each method has a couple of important limitations to consider. The new ‘svrep’ vignette has a section to help clarify these options and their appropriateness for different survey designs. In addition, the new function as_bootstrap_design() provides a (hopefully) simpler interface for using those bootstrap methods, along with clearer documentation explaining when each method is appropriate.

Below is an example of using the new wrapper function. First, we create our usual survey design object.

library(survey)

## Load an example dataset from a multistage sample, with two stages of SRSWOR
  data("mu284", package = 'survey')
  multistage_srswor_design <- svydesign(data = mu284,
                                        ids = ~ id1 + id2,
                                        fpc = ~ n1 + n2)

Next, we simply call as_bootstrap_design().

library(svrep)

## Convert the survey design object to a bootstrap design
as_bootstrap_design(multistage_srswor_design, type = "Rao-Wu")

as_bootstrap_design(multistage_srswor_design, type = "Canty-Davison")

as_bootstrap_design(multistage_srswor_design, type = "Preston")
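
Each of these calls returns an ordinary replicate-weights design, so the result can be passed straight to the usual estimation functions. As a quick illustration (output not shown; y1 is one of the example variables in the mu284 data):

## Example: create 500 Rao-Wu bootstrap replicates and estimate a total
boot_design <- as_bootstrap_design(multistage_srswor_design,
                                   type = "Rao-Wu",
                                   replicates = 500)
svytotal(x = ~ y1, design = boot_design)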

The Rao-Wu-Yue-Beaumont Bootstrap

The ‘svrep’ update also adds a new bootstrap method (referred to as the “Rao-Wu-Yue-Beaumont” bootstrap) which was introduced this year in a nice open-access paper by Jean-François Beaumont and Nelson Émond. This method is more generally applicable than the methods currently available in the ‘survey’ package, and can be used for multistage stratified samples with large sampling fractions and stages that use unequal-probability sampling or Poisson sampling. Plus, it can be used for designs that use multiple types of sampling (e.g., simple random sampling at one stage and unequal probability sampling at another). It also has the nice property of generating nonnegative weights, which is important from a practical standpoint.

The paper doesn’t really give the method a name, so in ‘svrep’ I refer to it as the “Rao-Wu-Yue-Beaumont” bootstrap: it’s an extension of the Rao-Wu-Yue bootstrap, and the paper’s author contributions section credits Jean-François Beaumont with developing it.

For those reasons, the Rao-Wu-Yue-Beaumont bootstrap seems like a reasonable default for many stratified multistage samples, which is why I’ve made it the default method in the new convenience function as_bootstrap_design(). Here’s an example of using the method for a multistage survey with unequal-probability sampling at the first stage and simple random sampling at the second stage.

library(svrep)

# Load data from a multistage sample:
# the first stage was selected with unequal probabilities, without replacement;
# the second stage was selected with simple random sampling, without replacement
data("library_multistage_sample", package = "svrep")

# Create a survey design object
multistage_design <- svydesign(data = library_multistage_sample,
                               ids = ~ PSU_ID + SSU_ID,
                               fpc = ~ PSU_SAMPLING_PROB + SSU_SAMPLING_PROB,
                               pps = "brewer")

# Convert to a bootstrap design object
bootstrap_rep_design <- as_bootstrap_design(
  design = multistage_design, 
  samp_method_by_stage = c("PPSWOR", "SRSWOR"),
  replicates = 500
)
# Estimate sampling variance
svymean(x = ~ TOTCIR, na.rm = TRUE, design = bootstrap_rep_design)
         mean    SE
TOTCIR 196699 44412

The Generalized Bootstrap

The new feature I’m most excited about is the generalized survey bootstrap. The generalized bootstrap for surveys is an extremely flexible tool for estimating sampling variances. You can use it for stratified multistage samples. And it works for simple random sampling, unequal probability sampling, and even systematic sampling, regardless of whether sampling is with-replacement or without-replacement.

The basic idea behind the generalized bootstrap

The basic idea of the generalized bootstrap is to “mimic” a target variance estimator for population totals, where the target variance estimator is appropriate to the particular sampling design and can be written down as a quadratic form. For example, the generalized bootstrap can mimic the Horvitz-Thompson estimator or the usual variance estimator used for simple random sampling. By “mimic”, I mean that, on average, the generalized bootstrap variance estimate for a population total exactly matches the variance estimate produced by the target variance estimator. But of course, using the bootstrap rather than the target variance estimator is nice because you get all the benefits of replication: it’s easy for analysts to use, and it simplifies variance estimation for complex statistics or nonresponse-adjusted estimates.

The generalized bootstrap’s flexibility comes from the fact that basically all of the variance estimators for totals that are commonly used for surveys can be written down as a quadratic form, and any variance estimator which can be written as a quadratic form can be turned into a replication variance estimator. Bob Fay proved and explained this in a series of fascinating papers (1, 2, and 3) in the 1980s (check them out!), and Bertail and Combris (1997) and Beaumont and Patak (2012) developed theory to help us understand the effectiveness of the generalized bootstrap as one particular strategy for forming replicate weights from a target variance estimator.
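
To make the quadratic-form idea concrete, here is the general shape of it (the notation below is my own shorthand, not taken verbatim from any of those papers): if the target variance estimator is a quadratic form in the weighted values, then drawing replicate adjustment factors with mean one and covariance equal to the quadratic form’s matrix reproduces the target variance.

\[ \begin{aligned} v_{\text{target}}(\hat{Y}) &= \breve{y}^{\top} \Sigma \breve{y}, \qquad \breve{y} = (w_1 y_1, \ldots, w_n y_n)^{\top} \\ \hat{Y}^{*(b)} &= \sum_{k=1}^{n} a_k^{(b)} w_k y_k, \qquad \mathrm{E}\big[a^{(b)}\big] = \mathbf{1}_n, \quad \mathrm{Cov}\big(a^{(b)}\big) = \Sigma \\ \operatorname{Var}_{*}\big(\hat{Y}^{*(b)}\big) &= \breve{y}^{\top} \Sigma \breve{y} = v_{\text{target}}(\hat{Y}) \end{aligned} \]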

Implementation

The tedious step in implementing the generalized bootstrap is writing down the target variance estimator as a quadratic form. Once you have that, it’s embarrassingly easy to generate bootstrap replicate weights: you simply sample from a multivariate normal distribution. Fortunately, the ‘svrep’ package handles both of these steps for us.

The new function make_quad_form_matrix() lets you specify the name of a target variance estimator (e.g., “Stratified Multistage SRS” or “Horvitz-Thompson”) along with data describing the survey design (e.g., strata and cluster IDs), and returns the matrix of the quadratic form for that variance estimator. In other words, that’s the function that does all the hard work involved in implementing the generalized bootstrap. Once the quadratic form has been identified, the function make_gen_boot_factors() generates the bootstrap replicate weights by sampling from a multivariate normal distribution and then applying some optional adjustments to ensure that the weights are all strictly positive.
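
To show just how simple the multivariate-normal sampling step is, here’s a toy, by-hand sketch of the mechanism (this is not the package’s internal code, and the weights, y values, and quadratic form matrix below are all made up for illustration):

## Toy illustration of the generalized bootstrap mechanism
## (made-up weights, y values, and quadratic form matrix)
set.seed(1)
n <- 5
base_weights <- c(10, 10, 20, 20, 40)
y <- c(2, 3, 5, 7, 11)
Sigma <- diag(0.05, n)  # stand-in for a target estimator's quadratic form matrix

## Draw adjustment factors with mean 1 and covariance Sigma (one row per replicate)
n_reps <- 1000
adjustment_factors <- MASS::mvrnorm(n = n_reps, mu = rep(1, n), Sigma = Sigma)

## Each replicate's weights are the base weights times its adjustment factors
replicate_weights <- sweep(adjustment_factors, 2, base_weights, "*")

## The bootstrap variance of the estimated total approximates the target quadratic form
boot_totals <- replicate_weights %*% y
var(as.vector(boot_totals))
t(base_weights * y) %*% Sigma %*% (base_weights * y)

With a thousand or so replicates, the two quantities printed at the end should come out close to one another, which is the “mimicking” property in action.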

Application to systematic samples

One area where the generalized bootstrap is particularly helpful is for surveys with systematic sampling (and one-PSU-per-stratum designs, more generally). Compared to the main replication variance estimators used for systematic sampling, I think the generalized bootstrap is much easier to learn and understand.

Here, the alternative methods I’m referring to are Successive Differences Replication (SDR) and Balanced Repeated Replication (BRR) based on collapsing strata. Both of these require an understanding of Hadamard matrices, orthogonal balancing, partial vs. full balancing, etc. Stephen Ash’s 2014 paper does a nice job explaining SDR, but the concepts are just inherently complicated.

There are a couple of other tradeoffs worth mentioning:

Cons:

  1. Compared to SDR and BRR, the generalized bootstrap is less efficient, in the sense that it might require, say, 500 replicates to achieve the same stability that the other methods can achieve with 100 replicates.

Pros:

  1. The generalized bootstrap can handle additional complexities. If there are multiple stages or phases of sampling, the generalized bootstrap can be adapted to this case by simply modifying the target variance estimator it’s mimicking. And if there are large sampling fractions, the generalized bootstrap easily accommodates this.

  2. In practice, SDR and BRR (like the bootstrap) are often run with fewer replicates than would be needed to obtain the most precise variance estimates possible. But the effect of changing the number of replicates is transparent and easy to understand for bootstrap methods, whereas for SDR and BRR, understanding the impact of reducing the number of replicates is difficult and relies on some intricate theory related to full vs. partial balancing.

An example

Below is an example of how simple it is to implement the generalized bootstrap for a systematic sample.

Step 1: Choose a target variance estimator. For systematic samples, the most commonly used estimator is the “SD2” estimator (this is the estimator that SDR is based on). It can be written as follows:

\[ \begin{aligned} \hat{v}_{SD2}(\hat{Y}) &= \left(1-\frac{n}{N}\right) \frac{1}{2}\left[\sum_{k=2}^n\left(\breve{y}_k-\breve{y}_{k-1}\right)^2+\left(\breve{y}_n-\breve{y}_1\right)^2\right] \\ \text{where } \breve{y}_k &= w_k y_k \text{ is the weighted value for unit } k \end{aligned} \]

Step 2: Create a survey design object to represent the stratification and clustering structure of our data, and sort the rows of the data according to the sort order used in systematic sampling.

library(survey)
library(svrep)

data('library_stsys_sample', package = 'svrep')

## First, ensure data are sorted in same order as was used in sampling
library_stsys_sample <- library_stsys_sample[
  order(library_stsys_sample$SAMPLING_SORT_ORDER),
]

## Create a survey design object
design_obj <- svydesign(
  data = library_stsys_sample,
  strata = ~ SAMPLING_STRATUM,
  ids = ~ 1,
  fpc = ~ STRATUM_POP_SIZE
)

Step 3: Use the function as_gen_boot_design() to create the bootstrap weights.

With that out of the way, we simply call as_gen_boot_design() with the argument variance_estimator = 'SD2'.

set.seed(2020)
## Convert to generalized bootstrap replicate design
gen_boot_design <- as_gen_boot_design(
  design = design_obj,
  variance_estimator = "SD2",
  replicates = 2000
)
For `variance_estimator='SD2'`, assumes rows of data are sorted in the same order used in sampling.
## Estimate sampling variances
svymean(x = ~ LIBRARIA, na.rm = TRUE, design = gen_boot_design)
           mean     SE
LIBRARIA 7.1658 1.5367
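
And because this is just an ordinary replicate-weights design, any downstream analysis can reuse the same replicates. For example (output not shown), a confidence interval comes straight from confint():

## Confidence interval based on the bootstrap replicate variance estimate
confint(svymean(x = ~ LIBRARIA, na.rm = TRUE, design = gen_boot_design))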

Using helpful tools from the ‘survey’ package, we can inspect and visualize how estimates of interest vary across the bootstrap replicates.

# Obtain estimates from each bootstrap replicate
boot_estimates <- svymean(x = ~ LIBRARIA, na.rm = TRUE, design = gen_boot_design,
                          return.replicates = TRUE)

# Visualize distribution of bootstrap replicate estimates
library(ggplot2)
library(schneidr)

data.frame('Estimate' = boot_estimates$replicates) |>
  ggplot(aes(x = Estimate)) +
  geom_histogram(bins = 20, alpha = 1/3, fill = schneidr::schneidr_purple(),
                 color = "white") +
  geom_vline(xintercept = boot_estimates$mean, color = schneidr::schneidr_purple(),
             linetype = 1, size = 3) +
  geom_label(x = boot_estimates$mean, y = 150, size = 5,
             hjust = -0.25, family = "Roboto Condensed",
             label = round(boot_estimates$mean, 2)) +
  # Add formatting, titles, and caption
  scale_y_continuous(name = NULL, expand = c(0,0), labels = NULL) +
  schneidr::theme_schneidr() +
  theme(axis.line.y = NULL) +
  labs(
    title = "Bootstrap estimates of the average number of librarians at U.S. public library systems in FY2020",
    subtitle = "Bootstrap estimates produced using the generalized survey bootstrap",
    caption = "Data come from the U.S. FY2020 Public Libraries Survey. Accessed via the 'svrep' R package."
  )

Tools for Choosing the Number of Replicates

Regardless of which bootstrap method you use, a decision has to be made about how many bootstrap replicates to create. Rules of thumb abound (e.g., “use 10,000” or “500 is plenty”). Some references simply say to try out a few different options and see how much the estimates move around, which is reasonable but vague advice.

The ‘svrep’ package now includes tools to help make a more principled decision about the number of replicates, based on the strategy proposed in Beaumont and Émond (2022). The new ‘svrep’ vignette has the details.
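
The workflow looks roughly like the sketch below. I’m writing the arguments from memory here, so treat the exact argument names as assumptions and check the vignette for the authoritative usage of the helper functions estimate_boot_sim_cv() and estimate_boot_reps_for_target_cv():

## Sketch only: argument names from memory; see the vignette for exact usage.
## Compute an estimate, keeping the replicate-level estimates
boot_estimates <- svymean(x = ~ TOTCIR, na.rm = TRUE,
                          design = bootstrap_rep_design,
                          return.replicates = TRUE)

## How much simulation error does the current number of replicates leave?
estimate_boot_sim_cv(boot_estimates)

## How many replicates would be needed to hit a target simulation CV?
estimate_boot_reps_for_target_cv(boot_estimates, target_cv = 0.05)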

What’s Next?

There are still some important types of designs that aren’t automatically handled by as_gen_boot_design() or as_bootstrap_design(). Most notably, two-phase designs aren’t covered at all yet. I’m currently working on adding a generalized bootstrap method that adapts the usual two-phase variance estimators used by the ‘survey’ package’s twophase() function. Implementing this will require some work to figure out both the appropriate quadratic form representation of the estimators and a reasonable user interface.

Another broadly useful addition would be an option to reduce the number of bootstrap replicates without sacrificing precision. A few good methods for this have been proposed over the years, and I hope to implement at least one of them this spring.

Once that’s done, I’ll collect all of the writing I’ve done for ‘svrep’ and wrap it up into a paper to submit to JSS.