Balanced sampling in R, Julia, and R + Julia

This post shows a small case study of developing code for a statistical sampling algorithm side-by-side in R and Julia.

Categories: sampling, surveys, R, Julia

Published: February 18, 2024

TL;DR

In this post, I compare R and Julia implementations of the cube method for selecting balanced random samples. While the code in the two languages looks almost identical, the Julia implementation runs about 5 times faster on the example dataset considered in this blog post. However, both the R and Julia implementations are much slower than an R function that uses C++ via Rcpp. This provides a nice applied example of how we can use Julia in certain places to write code that’s readable and fast, albeit slower than code written in a lower-level language like C++ or Rust. At the end of the post, I show how we can integrate R and Julia so that we can call the Julia function from R and enjoy the speed boost that Julia provides.

Background

Recently on this blog, I wrote a post about the cube method and why I think it’s a promising method for survey sampling. The cube method in many ways simplifies the goals we have when drawing a stratified sample: it allows us to allocate sample sizes to several (potentially overlapping) domains and automatically yields a sample that is well-calibrated to known population information. It’s a pretty neat method.

As I’ve learned more about the cube method, I’ve written some software that helps simplify variance estimation. The ‘svrep’ R package, for example, includes the Deville-Tillé variance estimator for the cube method (see here for statistical details and here for a function that uses the estimator to construct replicate weights). I’ve also been working to implement an alternative variance estimator proposed by Breidt and Chauvet, which requires writing code that implements the cube method. For this variance estimator to be feasible in practice, the code that implements the cube method needs to be fast, since the estimator relies on replicating the sampling process over at least a few thousand simulations. There is already a very fast and accurate implementation of the cube method in R and C++, available in the ‘BalancedSampling’ package. Unfortunately, that C++ code is quite difficult to read and modify, as is the R code in the ‘sampling’ package. So the existing implementations of the cube method don’t seem to be a good starting point for this variance estimator code, which requires not only implementing the cube method but also extracting internal “diagnostic” information from the sampling algorithm.

Why Try Writing a Julia Implementation?

The key promise of Julia is that it provides a balance between readable and fast code for scientific programming. I do have some complaints about the readability of Julia, but on the whole it’s pretty comparable to Python and R.1 It has some nice bells and whistles, though: as in R, you can use Greek letters (or other Unicode characters) in your code, which is cool. For example, (X/π) * β is valid code in both Julia and R. Julia also uses 1-based indexing, which is great for data analysis and statistical programming, even though Pythonistas and many developers in other programming domains prefer 0-based indexing. And it’s not too difficult to call Julia code from within R or Python: in R, the ‘JuliaConnectoR’ package makes it pretty easy to run Julia code from R, provided that you’ve installed Julia and any Julia packages needed in your code.
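As a tiny illustration of the Unicode point, here’s a hedged sketch in R (one wrinkle worth noting: Julia predefines π as the mathematical constant, whereas in R you have to assign it yourself):

# Toy example: Greek letters are legal identifiers in R (and Julia)
π <- base::pi  # Julia predefines π; in R we define it ourselves
β <- 2.0
X <- 10.0
(X / π) * β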

Side-by-Side R and Julia Code

The cube method consists of two distinct steps, the ‘flight phase’ and the ‘landing phase’. In practice, both the flight phase and the landing phase also use what is referred to as a ‘fast flight’ algorithm. So to organize the code, I’ve written an overall function fast_cube() which uses internal helper functions cube_flight(), cube_fast_flight(), and cube_land(). The R and Julia code for each function is listed below.

The R and Julia code for each function is almost identical. There are some slight differences in function names (e.g., in R we use ncol(X) but in Julia we use size(X, 2)). Also, basic operators like + are naturally vectorized in R, whereas in Julia an operation is vectorized by writing a . before the operator.
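Here’s a minimal sketch of that vectorization difference (R shown, with the Julia equivalent as a comment):

# R: '+' is elementwise by default
y <- c(1, 2, 3) + 1
# Julia equivalent: y = [1, 2, 3] .+ 1  (the dot makes '+' elementwise)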

Both the R and Julia implementations use the algorithms and notation in Leuenberger et al. (2022), which provides the clearest and most complete description I’ve seen of the cube method and its “fast flight” variant.2

Preliminary Helper Function

A key step in the cube method requires generating a vector in the null space of a given matrix. This kind of problem is best handled using a singular value decomposition (SVD). Below, I list R and Julia code that generates a null space vector. For Julia, I rely on the nullspace() function from the ‘LinearAlgebra’ package. For R, I could have used the similar function Null() from the ‘MASS’ package, but it uses a QR decomposition, which is faster but less accurate than the SVD.

# R

# Helper function to obtain a vector in null space of a matrix
get_null_space_vector <- function(X, ϵ=1e-11) {
  svd_decomp <- svd(X, nu = 0, nv = ncol(X))
  rank_of_X <- sum(abs(svd_decomp$d) > ϵ)
  
  if (rank_of_X < ncol(X)) {
    svd_decomp$v[,ncol(svd_decomp$v),drop=TRUE]
  } else {
    NULL
  }
}
# Julia

# Helper function to obtain a vector in matrix's null space
import LinearAlgebra: nullspace
function get_null_space_vector(X::Matrix{Float64})
    # nullspace() returns a matrix with zero columns when the null space is empty
    ns = nullspace(X)
    size(ns, 2) > 0 ? ns[:, end] : nothing
end
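Here’s a quick, hedged sanity check of the R helper on a made-up rank-deficient matrix: the returned vector should be numerically orthogonal to every row of the input matrix.

# Illustrative check: the third column is the sum of the first two,
# so X has a nontrivial null space
X <- cbind(1:5, 6:10, (1:5) + (6:10))
u <- get_null_space_vector(X)
max(abs(X %*% u))  # should be ≈ 0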

Basic Flight Phase Code

This code implements the basic flight phase step, which iteratively updates the vector of probabilities \(\mathbf{\pi}\) while respecting the constraints implied by the matrix \(\mathbf{X}\). At time \(t\) in the flight phase, the algorithm moves \(\mathbf{\pi}_t\) by adding a random vector \(\mathbf{\delta}_t = \lambda_t \mathbf{u}_t\) whose expectation is \(\mathbf{0}\) but which always has the effect of changing at least one element \(\pi_k \in (0,1)\) to \(\pi_k \in \{0,1\}\).

The vector \(\mathbf{u}_t\) is generated by finding a vector in the null space of the weighted constraint matrix \(\mathbf{A} = (\operatorname{diag}\left(\mathbf{\pi}^{-1}\right)\mathbf{X})^T\), which has dimension \(p \times N\). This is what ensures that the sample is perfectly balanced and that the sampling process has the desired selection probabilities \(\mathbf{\pi}\).

The flight phase ends either when the sampling process has concluded (i.e., \(\pi_{k,t} \in \{0,1\}\) for all \(k=1,\dots,N\)) or when a time \(t\) is reached such that the constraint matrix for the remaining cases (i.e., cases \(k\) such that \(\pi_{k,t} \in (0,1)\)) has an empty null space, so that a vector \(\mathbf{u}_t\) cannot be generated.

# R

# Conducts flight phase of cube method
cube_flight <- function(π, X, ϵ=1e-11) {
  
  # Create constraint matrix
  N = length(π)
  p = ncol(X)
  A = t((π^(-1)) * X) 
  
  # Initialize variables
  t = 0
  π_t = π
  
  remaining_indices = which((π_t > ϵ) & (π_t < (1 - ϵ)))
  n_remaining = length(remaining_indices)
  
  # Identify vector (u_B) to move π for remaining cases,
  # and randomly pick direction (λ_1 or -λ_2)
  while ((t <= N) && (n_remaining > 0)) {
    
    u_B = get_null_space_vector(A[,remaining_indices,drop=FALSE])
    
    if (is.null(u_B)) {
      break
    }
    
    u_B_nonzero = (u_B != 0)
    
    vec_1 = (1 - π_t[remaining_indices][u_B_nonzero]) / u_B[u_B_nonzero]
    vec_2 = (π_t[remaining_indices][u_B_nonzero]) / u_B[u_B_nonzero]
    
    λ_1 = min(pmax(vec_1, -vec_2))
    λ_2 = min(pmax(-vec_1, vec_2))
    
    q_1 = λ_2/(λ_1 + λ_2)
    
    if (runif(n = 1) < q_1) {
      λ =  λ_1
    } else {
      λ = -λ_2
    }
    π_t[remaining_indices] = π_t[remaining_indices] + (λ * u_B)
    
    remaining_indices = which((π_t > ϵ) & (π_t < (1 - ϵ)))
    n_remaining = length(remaining_indices)
    t = t + 1
  }
  π_t[π_t > (1-ϵ)] = 1
  π_t[π_t < ϵ] = 0
  return(π_t)
}
# Julia

# Define function to conduct cube sampling
function cube_flight(π::Vector{Float64}, X::Matrix{Float64}, ϵ=1e-11)
 
    # Create constraint matrix
    N = length(π)
    p = size(X, 2)
    A = transpose((π.^(-1)) .* X) 
 
    # Initialize variables
    t = 0
    π_t = copy(π)
 
    remaining_indices = findall((π_t .> ϵ) .& (π_t .< (1 - ϵ)))
    n_remaining = length(remaining_indices)
 
    # Identify vector (u_B) to move π for remaining cases,
    # and randomly pick direction (λ_1 or -λ_2)
    while ((t <= N) && (n_remaining > 0))
 
        u_B = get_null_space_vector(A[:,remaining_indices])
 
        if (isnothing(u_B))
            break
        end
 
        u_B_nonzero = findall(u_B .!= 0)
 
        vec_1 = (1 .- π_t[remaining_indices][u_B_nonzero]) ./ u_B[u_B_nonzero]
        vec_2 = (π_t[remaining_indices][u_B_nonzero]) ./ u_B[u_B_nonzero]
 
        λ_1 = minimum(max.(vec_1, -vec_2))
        λ_2 = minimum(max.(-vec_1, vec_2))
 
        q_1 = λ_2/(λ_1 + λ_2)
 
        if (rand(Float64) < q_1)
            λ =  λ_1
        else 
            λ = -λ_2
        end
        π_t[remaining_indices] = π_t[remaining_indices] .+ (λ .* u_B)
 
        remaining_indices = findall((π_t .> ϵ) .& (π_t .< (1 - ϵ)))
        n_remaining = length(remaining_indices)
        t += 1
    end
    π_t[π_t .> (1-ϵ)] .= 1
    π_t[π_t .< ϵ] .= 0
    return π_t
end
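As a quick sanity check, here’s a small usage sketch in R (with made-up data; it assumes the cube_flight() defined above) of the key invariant of the flight phase: the weighted balancing totals \(\sum_k x_k \pi_{k,t}/\pi_k\) are preserved at every step.

# Illustrative check of the flight phase's balance invariant
set.seed(1)
π <- rep(0.2, 50)
X <- cbind(π, matrix(rnorm(100), ncol = 2))
π_f <- cube_flight(π, X)
# Weighted totals should match the population totals (all ≈ 0)
colSums((π_f / π) * X) - colSums(X)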

Fast Flight Phase Code

The “fast flight phase” algorithm is a particular implementation of the flight phase that avoids the costly step of conducting an SVD of the entire constraint matrix. Instead, it breaks that matrix up into chunks of \(p+1\) cases, where \(p\) is the number of columns of \(X\), and conducts the regular flight phase on each chunk. After the flight phase has completed for a chunk, a new chunk is created by gathering \(p+1\) unresolved cases (including unresolved cases from any previous chunks), and a regular flight phase is applied to that chunk. This continues until at most \(p\) unresolved cases remain.

# R

# Conducts 'fast flight phase' by applying flight phase
# on only (p+1) cases at a time, where p = # columns of X
cube_fast_flight <- function(π, X, ϵ = 1e-11, π_0 = π) {
  
  # Create constraint matrix
  N = length(π)
  p = ncol(X)
  A = t((π^(-1)) * X)
  
  # Initialize variables
  t = 0
  π_t = π
  
  remaining_indices = which((π_t > ϵ) & (π_t < (1 - ϵ)))
  n_remaining = length(remaining_indices)
  
  # Conduct flight phase only on (p+1) cases at a time
  while (n_remaining > p) {
    
    B_t = remaining_indices[seq_len(p+1)]
    X_t = (π_t[B_t] / π_0[B_t]) * X[B_t,,drop=FALSE]
    
    π_t[B_t] = cube_flight(π_t[B_t], X_t)
    
    remaining_indices = which((π_t > ϵ) & (π_t < (1 - ϵ)))
    n_remaining = length(remaining_indices)
    t = t + 1
  }
  return(π_t)
}
# Julia

# Conducts 'fast flight phase' by applying flight phase
# on only (p+1) cases at a time, where p = # columns of X
function cube_fast_flight(π::Vector{Float64}, X::Matrix{Float64}; ϵ = 1e-11, π_0 = π)
 
    # Create constraint matrix
    N = length(π)
    p = size(X, 2)
    A = transpose((π.^(-1)) .* X)
 
    # Initialize variables
    t = 0
    π_t = copy(π)
 
    remaining_indices = findall((π_t .> ϵ) .& (π_t .< (1 - ϵ)))
    n_remaining = length(remaining_indices)
 
    # Conduct flight phase only on p+1 elements at a time
    while (n_remaining > p)
 
        B_t = remaining_indices[begin:(p+1)]
        X_t = (π_t[B_t] ./ π_0[B_t]) .* X[B_t,:]
 
        π_t[B_t] = cube_flight(π_t[B_t], X_t)
 
        remaining_indices = findall((π_t .> ϵ) .& (π_t .< (1 - ϵ)))
        n_remaining = length(remaining_indices)
        t += 1
    end
    return π_t
end
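Here’s a similarly hedged R sketch (hypothetical data; assumes the functions above) of the property just described: after the fast flight phase, at most \(p\) cases should remain unresolved.

# Illustrative check of the fast flight phase
set.seed(7)
π <- rep(0.3, 30)
X <- cbind(π, rnorm(30))
π_f <- cube_fast_flight(π, X)
# Count of still-unresolved cases should be at most p = ncol(X) = 2
sum(π_f > 0 & π_f < 1)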

Landing Phase Code

At the end of the fast flight phase, there may still be a few unresolved cases. The landing phase resolves these cases by relaxing balancing constraints as needed: columns of \(X\) are removed one at a time, and a flight phase is applied to the updated matrix \(X\), until every case has been resolved.

# R
# Conducts landing phase of cube method,
# by dropping columns of X and applying fast flight phase
# until all cases have been finalized
cube_land <- function(π, π_0, X, ϵ = 1e-11) {
  
  # Get dimensions of problem
  N = length(π)
  p = ncol(X)
  
  # Initialize variables
  t = 0
  π_t = π
  remaining_indices = which((π_t > ϵ) & (π_t < (1-ϵ)))
  n_remaining = length(remaining_indices)
  n_dropped_variables = 0
  
  while (n_remaining > 0) {
    
    # Drop variables as needed
    n_dropped_variables = n_dropped_variables + 1
    p = p - 1
    
    π_t[remaining_indices] = cube_fast_flight(π_t[remaining_indices], 
                                              X[remaining_indices,seq_len(p),drop=FALSE],
                                              π_0 = π_0[remaining_indices])
    
    remaining_indices = which((π_t > ϵ) & (π_t < (1-ϵ)))
    n_remaining = length(remaining_indices)
  }
  
  a = π_t
  a[a > (1-ϵ)] = 1
  a[a < ϵ] = 0
  return(as.integer(round(a, digits = 0)))
}
# Julia
# Conducts landing phase of cube method,
# by dropping columns of X and applying fast flight phase
# until all cases have been finalized
function cube_land(π::Vector{Float64}, π_0::Vector{Float64}, X::Matrix{Float64}; ϵ = 1e-11)
 
    # Get dimensions of problem
    N = length(π)
    p = size(X, 2)
 
    # Initialize variables
    t = 0
    π_t = copy(π)
    remaining_indices = findall((π_t .> ϵ) .& (π_t .< (1-ϵ)))
    n_remaining = length(remaining_indices)
    n_dropped_variables = 0
 
    while (n_remaining > 0)
 
        # Drop variables as needed
        n_dropped_variables += 1
        p -= 1
 
        π_t[remaining_indices] = cube_fast_flight(π_t[remaining_indices], 
                                                  X[remaining_indices,1:p];
                                                  π_0 = π_0[remaining_indices])
 
        remaining_indices = findall((π_t .> ϵ) .& (π_t .< (1-ϵ)))
        n_remaining = length(remaining_indices)
    end
 
    a = π_t
    a[a .> (1-ϵ)] .= 1
    a[a .< ϵ] .= 0
    return Int64.(round.(a; digits = 0))
end

Overall Fast Cube Function

The entire fast cube algorithm is simply a fast flight phase followed by the landing phase.

# R

# Combined flight and landing phase of cube method
fast_cube <- function(π, X) {
  π_f = cube_fast_flight(π, X)
  a = cube_land(π_f, π, X)
  return(a)
}
# Julia

# Combined flight and landing phase of cube method
function fast_cube(π::Vector{Float64}, X::Matrix{Float64})
    π_f = cube_fast_flight(π, X)
    a = cube_land(π_f, π, X)
    return a
end
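And here’s a minimal end-to-end usage sketch with hypothetical data. Because \(\pi\) itself is included as a balancing variable, the realized sample size should equal \(\sum_k \pi_k\).

# Minimal usage sketch (hypothetical data)
set.seed(42)
π <- rep(0.25, 20)
X <- cbind(π, rnorm(20))
a <- fast_cube(π, X)
sum(a)  # should equal sum(π) = 5, since π is a balancing variable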

Comparing Speed

Now let’s see how fast these two implementations are. We’ll create a small population to sample from, generated the same way in both R and Julia.

Creating Example Data

# R
mvrnorm <- MASS::mvrnorm

# Generate a population to sample from
N = 10000
X = mvrnorm(
  n = N,
  mu = c(x1 = 0, x2 = 0, x3 = 0), 
  Sigma = matrix(
    c(1, 0.5, 0,
      0.5, 1, 0,
      0,   0, 1),
    3, 3, byrow = TRUE
  )
)

# Generate sampling probabilities
π = runif(N)
π = 100 * π/sum(π)

# Create matrix of balancing variables
X = cbind(π, X)

head(X)
              π         x1         x2         x3
[1,] 0.01450594  0.8420238  0.4079737 -0.8989027
[2,] 0.01520158  0.9075647  0.1410441  0.2826768
[3,] 0.01464616  0.6394842 -0.9563048  1.5928478
[4,] 0.01554473  2.3683016  2.5556251  1.0847491
[5,] 0.01318031 -0.3435753 -0.5042470  1.1415666
[6,] 0.01866831 -0.4933864 -0.1068191 -1.6905893
# Julia
import Distributions: MvNormal

# Generate a population to sample from
N = 10000
X = rand(MvNormal([0, 0, 0], 
                  [1   0.5 0
                   0.5 1   0
                   0   0   1]),
        N)
X = transpose(X)

# Generate sampling probabilities
π = rand(N)
π = 100 .* π/sum(π)

# Create matrix of balancing variables
X = hcat(π, X)

Benchmarking

Now for the actual benchmarks. I used the ‘microbenchmark’ package to do the benchmarking in R, and in Julia I used the ‘BenchmarkTools’ package. Below are the code and output from R and Julia, respectively.

# R
library(microbenchmark)

# Run once as an example
  fast_cube(π, X) |> head()
[1] 0 0 0 0 0 0
# Use benchmarking function
  microbenchmark(
    fast_cube(π, X),
    times = 100
  )
Unit: seconds
            expr      min       lq     mean   median       uq      max neval
 fast_cube(π, X) 2.134629 2.203176 2.276864 2.246024 2.322998 2.714758   100
# Julia
using BenchmarkTools

# Run once as an example
  fast_cube(π, X)

# Use benchmarking function
  @benchmark fast_cube($π, $X) seconds=10
BenchmarkTools.Trial: 20 samples with 1 evaluation.
 Range (min … max):  495.603 ms … 547.734 ms  ┊ GC (min … max): 7.60% … 8.18%
 Time  (median):     522.724 ms               ┊ GC (median):    7.92%
 Time  (mean ± σ):   524.520 ms ±  16.945 ms  ┊ GC (mean ± σ):  7.90% ± 0.52%
 Memory estimate: 543.50 MiB, allocs estimate: 657941.

On this example dataset, the Julia implementation was about 5 times faster than the base R implementation, despite the code being nearly identical. The difference can be attributed to the optimizations that Julia’s compiler applies when it compiles these functions.

Interestingly, both my R and Julia implementations are much slower than the comparable R function in the ‘BalancedSampling’ package, which uses C++ and the Rcpp package. That package’s function runs about 20-30 times faster than the Julia implementation (see the benchmark below). It’s truly incredible what Anton Grafström was able to do with C++.

In the R code below, I use the ‘JuliaConnectoR’ package to run Julia from R, and benchmark all three of these cube method implementations. The code passes data from R to a Julia session, and then runs the Julia function in that session. For comparison, in R it runs the base R implementation of the cube method as well as the R/C++ implementation from the BalancedSampling package.

library(JuliaConnectoR)

# Launch Julia and run a file defining the cube sampling functions
  juliaCall("include", normalizePath("cube-method.jl"))
# Pass R objects to Julia
  π_jl = juliaPut(π)
  X_jl = juliaPut(X)
  
# Benchmark the three different cube method implementations
microbenchmark(
  'Julia'      = juliaCall("fast_cube", π_jl, X_jl),
  'R'          = fast_cube(π, X),
  'R with C++' = BalancedSampling::cube(π, X),
  times = 100
)
Unit: milliseconds
       expr       min         lq       mean     median         uq       max
      Julia  450.0307  475.67180  520.02278  485.45690  495.78540 3172.2622
          R 2160.5454 2211.14060 2274.02426 2249.03120 2342.06145 2489.0466
 R with C++   12.0273   13.11795   16.10482   13.79765   14.46885  160.1392
 neval cld
   100  b 
   100   c
   100 a  

Integrating R and Julia

For my application, it could be helpful to use the Julia sampling function while otherwise working in R. A nice way to do this is to use the ‘JuliaConnectoR’ package to create an R function that uses Julia under the hood. This is distinct from what we did earlier with juliaCall("fast_cube", π_jl, X_jl), which simply told our Julia session to run a Julia function on data that was already available in that session. If we want to create an R function instead, we simply call juliaFun("fast_cube"), which returns an R function that, under the hood, passes data from R to Julia, runs the Julia function fast_cube(), and then passes the result back to R. Here’s a small example of that.

# Create an R function based on calling a Julia function
julia_fast_cube <- juliaFun("fast_cube")

# Use the R function once as an example
julia_fast_cube(π, X) |> head()
[1] 0 0 0 0 0 0

This approach works pretty nicely when I’m otherwise just doing everything in R. And speedwise it’s barely slower than just running the Julia function directly.

# Compare the R function call to the Julia-based R function
microbenchmark(
  'R+Julia' = julia_fast_cube(π, X),
  'Julia'   = juliaCall("fast_cube", π_jl, X_jl),
  'R'       = fast_cube(π, X),
  times = 100
)
Unit: milliseconds
    expr       min        lq      mean    median        uq       max neval cld
 R+Julia  460.5675  473.6736  482.4566  481.8651  490.5258  516.6900   100  a 
   Julia  461.7137  475.1873  485.7026  482.8503  488.7539  684.1864   100  a 
       R 2133.4158 2176.3700 2232.1948 2201.4050 2305.1346 2349.7402   100   b

The biggest drawback is that, unlike integrations with C or C++, using a Julia backend for R or Python functions requires users to have Julia installed, along with any Julia packages that your code uses.
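If you want R code that degrades gracefully when Julia isn’t available, one option is a guard like the following sketch, which uses JuliaConnectoR’s juliaSetupOk() check and falls back to the pure-R implementation (the name julia_or_r_cube is just a hypothetical label):

# Sketch: fall back to the R implementation if Julia isn't usable
if (JuliaConnectoR::juliaSetupOk()) {
  JuliaConnectoR::juliaCall("include", normalizePath("cube-method.jl"))
  julia_or_r_cube <- JuliaConnectoR::juliaFun("fast_cube")
} else {
  julia_or_r_cube <- fast_cube  # the pure-R version defined earlier
}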

Miscellaneous Observations

Despite how nice it was to get a quick speedup from using Julia, I’m still left with a number of reservations about using Julia in production. For one thing, reading and writing Julia code has some weird barriers that make it harder than it could be. Most notably, the language has some unfortunate design decisions around function argument names. And as you’d soon discover when working with Julia, the error messages are far from user friendly for a typical data analyst: you typically get an inscrutable message followed by dozens of unhelpful stack trace lines that you have to scroll through just to get back to that message. In addition, there are ongoing correctness concerns for the language and its core packages.

All this said, I do quite like a lot of the fundamental language design features of Julia, and I’m hoping the user-friendliness and trustworthiness of the language and its package ecosystem continue to improve. It’s still not at a point where I could recommend it to others at work for everyday usage; that day is probably a few years out yet. But for certain statistics applications, like simulation studies, it’s a nice tool to have available.

Footnotes

  1. Mainly, Julia’s readability suffers from weird restrictions around named arguments of functions. The vast majority of functions don’t let you type something readable like mean(x = age, group = race, df = data) and instead require you to type mean(age, data, race), with the expectation that you’ll memorize the positions of all the key function arguments or else read the documentation every time you use the function. This is doubly annoying because Julia documentation is frequently sparse and lacks consistent conventions, even for core packages like ‘LinearAlgebra’.↩︎

  2. Note that I don’t use the distance-based sorting method which is the focus of that paper; I’m just using their nice algorithm write-ups for the cube method.↩︎