Introduction to HOIF: Higher-Order Influence Function Estimators for the ATE
Xingyu Chen
2026-06-24
Source:vignettes/hoif-vignette.Rmd
hoif-vignette.RmdIntroduction
The HOIF package implements Higher-Order Influence Function (HOIF) estimators for Average Treatment Effect (ATE) estimation in causal inference. This methodology extends the doubly robust (DR) estimator—also known as the Augmented Inverse Propensity Weighted (AIPW) or Double Machine Learning (DML) estimator—by systematically accounting for higher-order bias terms.
Background
The HOIF methodology was developed through a series of seminal works by James M. Robins and collaborators:
- Robins et al. (2008): Higher order influence functions and minimax estimation of nonlinear functionals
- Robins et al. (2017): Minimax estimation of a functional on a structured high-dimensional model
- Liu et al. (2017): Semiparametric efficient empirical higher order influence function estimators
- Liu & Li (2023): New √n-consistent, numerically stable higher-order influence function estimators
The key idea is to represent the bias of the standard DR estimator as a sum of higher-order influence functions, which can be estimated and used to debias the original estimator.
The computation of the higher-order U-statistics that arise in HOIF
estimators is delegated to the ustats
package; the algorithm and its computational complexity are analyzed
in:
- Chen, Zhang & Liu (2025): On computing and the complexity of computing higher-order U-statistics, exactly
Key Features
- Flexible basis expansion: Supports B-splines, Fourier basis, or raw covariates
- Multiple inversion methods: Direct, nonlinear shrinkage, or corpcor regularization
- Sample splitting support: Optional K-fold cross-fitting between the estimation of the inverse weighted Gram matrix and the computation of the U-statistics (eHOIF; see the Sample Splitting section)
-
Dual computational backends: Fast Python backend
via the
ustatspackage, or a pure R fallback - Any order with the Python backend (pure R mode supports orders up to 6)
Mathematical Background
The HOIF Framework
Let denote the standard first-order doubly robust (AIPW/DML) estimator of , and . Conditional on the estimated nuisance functions, this estimator is biased; HOIF estimates the estimable part of that bias.
For each treatment arm , the -th order HOIF bias-correction term is
and the corresponding correction for the ATE is their difference:
is the -th order HOIF estimator of the estimable bias of the AIPW estimator of the ATE — it is not itself an estimator of the ATE. The sign convention is such that adding it to the AIPW estimate removes that bias:
In the output of hoif_ate(), the components
HOIF1, HOIF0 and ATE correspond
to
,
and
for
(the component name ATE is kept for backward compatibility
— it holds the ATE bias correction, not the ATE itself).
U-Statistics Formulation
The core computational building block is the U-statistic:
where:
- is the treatment indicator
- is the inverse propensity weight residual
- is the outcome residual
- are the transformed covariates
- is the inverse weighted Gram matrix
The full algorithmic workflow, mathematical formulas, and all
parameters are documented in the PDF shipped with the package
(system.file("extdoc", "HOIF.pdf", package = "HOIF"), also
available on
GitHub).
Sample Splitting
Conceptually, HOIF estimation involves three separate estimation tasks, and ideally each would use its own, independent part of the data:
- The nuisance functions: estimating , and ;
- The inverse weighted Gram matrix ;
- The higher-order U-statistics built from .
This package deliberately does not implement task 1:
hoif_ate() only takes the predicted values
mu1, mu0, pi as inputs, and how
(and on which sample) the nuisance functions are estimated is entirely
up to the user. Accordingly, the full three-way cross-fitting is
not performed by the package; the
sample_split argument only controls the split between tasks
2 and 3:
-
sample_split = TRUE(eHOIF): the sample passed tohoif_ate()is split inton_foldsfolds . For each fold , is estimated on the observations not in , the U-statistics are computed on the observations in , and the results are averaged across the folds. -
sample_split = FALSE(sHOIF): and the U-statistics are computed on the same sample, without distinction.
The Quick Start example below follows this structure: the nuisance
functions are fitted on one half of the data, and
hoif_ate() is run on the other half.
Installation
# From CRAN:
install.packages("HOIF")
# Or the development version from GitHub (also installs the ustats R package):
# install.packages("devtools")
devtools::install_github("cxy0714/HOIF")Setting up the Python backend
The default backend computes the higher-order U-statistics in Python
via the ustats package. The Python dependencies
(u-stats, numpy, torch) are
provisioned automatically: with reticulate
(>= 1.41) installed, the first call that needs Python (e.g. the first
hoif_ate() call) downloads a private Python together with
the required packages into a cached environment and reuses it in later
sessions. No manual setup is needed in most cases:
library(HOIF)
# First call provisions the Python environment automatically:
# results <- hoif_ate(...)If you prefer a persistent, dedicated environment, or want to control
how PyTorch is installed (CPU-only build vs. CUDA), use the helpers from
ustats:
ustats::setup_ustats() # CPU-only PyTorch (small download)
ustats::setup_ustats(gpu = TRUE) # default PyPI PyTorch (CUDA on Linux)
ustats::check_ustats_setup() # verify the environmentYou can also point reticulate to your own Python/conda
environment (with u-stats installed via
pip install u-stats) before Python
initializes:
reticulate::use_condaenv("your_env_name", required = TRUE)
# or set the RETICULATE_PYTHON environment variableSee vignette("ustats", package = "ustats") for a
complete installation guide and troubleshooting tips.
No Python at all?
Set pure_R_code = TRUE in hoif_ate(): the
U-statistics are then computed with a pure R implementation (orders up
to 6) and no Python runtime is required.
Quick Start Example
Let’s demonstrate the recommended workflow with a simulated example,
following the three-task structure described in the Sample Splitting
section above: the nuisance functions are fitted on one half of the data
(the nuisance sample), hoif_ate() is run on the
other half (the estimation sample), and within the estimation
sample we compute both the eHOIF and the sHOIF estimators.
To make the role of the HOIF correction visible, the nuisance models are deliberately misspecified (they use only 3 of the 10 confounders), so the first-order AIPW estimator carries a clear bias — which the HOIF terms then estimate and remove.
Generate Simulated Data
library(HOIF)
set.seed(123)
n <- 2000
p <- 10
# Covariates (all of them are confounders)
X <- matrix(rnorm(n * p), ncol = p)
# True propensity score and outcome regressions: linear, loading on
# ALL covariates
beta_pi <- c(0.3, -0.2, 0.2, rep(0.25, 7))
beta1 <- c(0.5, 0.4, 0.3, rep(0.4, 7))
beta0 <- c(0.3, 0.2, 0.1, rep(0.3, 7))
true_pi <- plogis(as.vector(X %*% beta_pi))
A <- rbinom(n, 1, true_pi)
mu1_true <- as.vector(1 + X %*% beta1)
mu0_true <- as.vector(X %*% beta0)
Y <- A * mu1_true + (1 - A) * mu0_true + rnorm(n, 0, 0.2)
# True targets
psi1_true <- mean(mu1_true)
psi0_true <- mean(mu0_true)
true_ate <- psi1_true - psi0_true
cat("True E[Y(1)]:", round(psi1_true, 4), "\n")
#> True E[Y(1)]: 0.9799
cat("True E[Y(0)]:", round(psi0_true, 4), "\n")
#> True E[Y(0)]: -0.016
cat("True ATE: ", round(true_ate, 4), "\n")
#> True ATE: 0.9958Split the Sample
The nuisance sample is used only to fit the nuisance
functions; the estimation sample is used only by
hoif_ate(). This keeps the nuisance predictions independent
of the sample on which the HOIF estimator is computed.
Estimate (Misspecified) Nuisance Functions on the Nuisance Sample
The package only consumes the predicted values of the nuisance functions; in practice, any flexible machine learning method can be used to produce them. Here, to mimic misspecification, the working models use only the first 3 covariates and ignore the remaining confounders:
S <- 1:3 # the working models ignore covariates 4..10
# Outcome regressions, fitted on the nuisance sample only
fit_mu1 <- lm(Y_nuis ~ X_nuis[, S], subset = A_nuis == 1)
fit_mu0 <- lm(Y_nuis ~ X_nuis[, S], subset = A_nuis == 0)
# Propensity score, fitted on the nuisance sample only
fit_pi <- glm(A_nuis ~ X_nuis[, S], family = binomial)
# Predict all nuisance functions on the estimation sample
mu1_hat <- as.vector(cbind(1, X_est[, S]) %*% coef(fit_mu1))
mu0_hat <- as.vector(cbind(1, X_est[, S]) %*% coef(fit_mu0))
pi_hat <- as.vector(plogis(cbind(1, X_est[, S]) %*% coef(fit_pi)))
# Ensure propensity scores are bounded away from 0 and 1
pi_hat <- pmax(pmin(pi_hat, 0.95), 0.05)The First-Order AIPW Estimator and its Bias
The HOIF terms estimate the estimable bias of the standard first-order doubly robust (AIPW/DML) estimator, so let us compute that estimator first, together with its actual error:
psi1_aipw <- mean(mu1_hat + A_est / pi_hat * (Y_est - mu1_hat))
psi0_aipw <- mean(mu0_hat + (1 - A_est) / (1 - pi_hat) * (Y_est - mu0_hat))
ate_aipw <- psi1_aipw - psi0_aipw
cat("AIPW estimates: E[Y(1)] =", round(psi1_aipw, 4),
" E[Y(0)] =", round(psi0_aipw, 4),
" ATE =", round(ate_aipw, 4), "\n")
#> AIPW estimates: E[Y(1)] = 1.1749 E[Y(0)] = -0.2676 ATE = 1.4425
cat("AIPW errors: E[Y(1)] =", round(psi1_aipw - psi1_true, 4),
" E[Y(0)] =", round(psi0_aipw - psi0_true, 4),
" ATE =", round(ate_aipw - true_ate, 4), "\n")
#> AIPW errors: E[Y(1)] = 0.195 E[Y(0)] = -0.2516 ATE = 0.4466Because of the misspecified nuisance models, the AIPW estimator misses the true ATE by about 0.45 — a bias that no amount of data will remove. We now estimate this bias with HOIF.
Compute the eHOIF Estimator (with sample splitting)
Within the estimation sample, hoif_ate() cross-fits the
inverse weighted Gram matrix against the U-statistics over
n_folds folds:
results_ehoif <- hoif_ate(
X = X_est,
A = A_est,
Y = Y_est,
mu1 = mu1_hat,
mu0 = mu0_hat,
pi = pi_hat,
transform_method = "none", # Use raw covariates
inverse_method = "direct",
m = 7, # Compute up to 7th order
sample_split = TRUE,
n_folds = 2,
seed = 42,
backend = "torch" # Use Python backend
)
print(results_ehoif)
#> HOIF Estimators for Average Treatment Effect
#> =============================================
#>
#> Higher-order correction terms by order:
#> Order ATE HOIF1 HOIF0
#> 1 2 -0.4621 -0.2615 0.2006
#> 2 3 -0.4050 -0.2341 0.1709
#> 3 4 -0.4335 -0.2480 0.1855
#> 4 5 -0.4272 -0.2443 0.1829
#> 5 6 -0.4289 -0.2456 0.1833
#> 6 7 -0.4293 -0.2455 0.1838
#>
#> Estimated AIPW bias correction for the ATE (highest order): -0.4293
#> (add this value to the first-order AIPW/DR estimate of the ATE to debias it)Compute the sHOIF Estimator (without sample splitting)
Still using the independent nuisance predictions from above; only the split between the Gram matrix and the U-statistics is dropped:
results_shoif <- hoif_ate(
X = X_est,
A = A_est,
Y = Y_est,
mu1 = mu1_hat,
mu0 = mu0_hat,
pi = pi_hat,
transform_method = "none",
inverse_method = "direct",
m = 7,
sample_split = FALSE,
backend = "torch"
)
print(results_shoif)
#> HOIF Estimators for Average Treatment Effect
#> =============================================
#>
#> Higher-order correction terms by order:
#> Order ATE HOIF1 HOIF0
#> 1 2 -0.4318 -0.2496 0.1821
#> 2 3 -0.4458 -0.2568 0.1890
#> 3 4 -0.4373 -0.2520 0.1853
#> 4 5 -0.4362 -0.2515 0.1848
#> 5 6 -0.4366 -0.2517 0.1849
#> 6 7 -0.4367 -0.2517 0.1849
#>
#> Estimated AIPW bias correction for the ATE (highest order): -0.4367
#> (add this value to the first-order AIPW/DR estimate of the ATE to debias it)Debias the AIPW Estimator
The reported HOIF1/HOIF0/ATE
values are the higher-order influence function terms of orders 2 to
— they are not by themselves point estimates of
,
or the ATE. They estimate the estimable bias of the AIPW estimator, with
the sign convention that adding them to the AIPW
estimates removes it:
psi1_ehoif <- psi1_aipw + tail(results_ehoif$HOIF1, 1)
psi0_ehoif <- psi0_aipw + tail(results_ehoif$HOIF0, 1)
psi1_shoif <- psi1_aipw + tail(results_shoif$HOIF1, 1)
psi0_shoif <- psi0_aipw + tail(results_shoif$HOIF0, 1)
comparison <- data.frame(
row.names = c("E[Y(1)]", "E[Y(0)]", "ATE"),
Truth = c(psi1_true, psi0_true, true_ate),
AIPW = c(psi1_aipw, psi0_aipw, ate_aipw),
eHOIF = c(psi1_ehoif, psi0_ehoif, psi1_ehoif - psi0_ehoif),
sHOIF = c(psi1_shoif, psi0_shoif, psi1_shoif - psi0_shoif)
)
round(comparison, 4)
#> Truth AIPW eHOIF sHOIF
#> E[Y(1)] 0.9799 1.1749 0.9294 0.9232
#> E[Y(0)] -0.0160 -0.2676 -0.0838 -0.0826
#> ATE 0.9958 1.4425 1.0132 1.0058
# Errors relative to the truth
round(sweep(comparison[, -1], 1, comparison$Truth, "-"), 4)
#> AIPW eHOIF sHOIF
#> E[Y(1)] 0.1950 -0.0505 -0.0567
#> E[Y(0)] -0.2516 -0.0678 -0.0666
#> ATE 0.4466 0.0174 0.0100The HOIF correction removes essentially all of the AIPW bias for the ATE (error about ) and the bulk of it for each treatment arm.
Visualize Convergence
plot(results_ehoif)Main Function: hoif_ate()
The primary interface for computing HOIF estimators.
Arguments
- X: Covariate matrix ()
- A: Treatment vector (0 or 1)
- Y: Outcome vector
- mu1, mu0: Predicted outcomes under treatment and control
- pi: Predicted propensity scores
-
transform_method: Covariate transformation method
-
"none": No transformation (use raw covariates) -
"splines": B-splines basis expansion -
"fourier": Fourier basis expansion
-
- basis_dim: Number of basis functions (required if using splines or Fourier)
-
inverse_method: Method for Gram matrix inversion
-
"direct": Cholesky-based inversion (Moore-Penrose fallback) -
"nlshrink": Nonlinear shrinkage (Ledoit-Wolf) -
"corpcor": Shrinkage via corpcor package
-
- m: Maximum order (any order for the Python backend, 2-6 for pure R)
- sample_split: Logical; whether to cross-fit the Gram matrix against the U-statistics (eHOIF vs sHOIF, see the Sample Splitting section)
- n_folds: Number of folds for sample splitting
-
backend:
"torch"(default) or"numpy"for the Python backend -
dtype: Numeric precision of the Python backend
(
"float32","float64", orNULLfor automatic selection) - pure_R_code: Use pure R implementation (limited to order 6)
- seed: Random seed for reproducibility
Return Value
A list of class "hoif_ate" containing:
- ATE: Vector of ATE estimates for orders 2 to m
- HOIF1: Vector of treatment arm HOIF estimates
- HOIF0: Vector of control arm HOIF estimates
- IIFF1: Vector of treatment arm incremental influence function terms
- IIFF0: Vector of control arm incremental influence function terms
- orders: Vector of orders (2:m)
- convergence_data: Data frame for plotting
Advanced Usage
Using Basis Expansion
For more flexible modeling of covariate relationships:
# B-splines basis
results_splines <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
transform_method = "splines",
basis_dim = 5, # 5 basis functions per covariate
degree = 3, # Cubic splines
m = 5,
sample_split = FALSE
)
# Fourier basis
results_fourier <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
transform_method = "fourier",
basis_dim = 4, # 4 Fourier components per covariate
period = 1,
m = 5,
sample_split = FALSE
)Sample Splitting (Cross-Fitting)
To cross-fit the estimation of the inverse weighted Gram matrix against the computation of the U-statistics — the eHOIF estimator (see the Sample Splitting section under Mathematical Background for what exactly is split):
results_cf <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
transform_method = "none",
m = 5,
sample_split = TRUE,
n_folds = 5, # 5-fold cross-fitting
seed = 42
)Regularized Gram Matrix Inversion
For high-dimensional settings or when facing numerical instability:
# Nonlinear shrinkage
results_nlshrink <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
inverse_method = "nlshrink",
m = 5
)
# corpcor shrinkage
results_corpcor <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
inverse_method = "corpcor",
m = 5
)Pure R Backend
If the Python environment is not available or for smaller problems:
results_pure_r <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
pure_R_code = TRUE,
m = 6 # Maximum 6 for pure R
)Computational Details
Python Backend (ustats)
The default backend uses the ustats
R package, an interface to the Python package u-stats, for
efficient U-statistic computation:
- Supports PyTorch and NumPy backends (GPU acceleration with CUDA-enabled PyTorch)
- Handles arbitrary order U-statistics
- Optimized for memory and speed via Einstein summation
For HOIF estimators of the ATE, a key takeaway from Chen, Zhang & Liu (2025) is:
- when the order , the computational complexity is
- when the order , the computational complexity exceeds
where is the sample size and is the dimension of the transformed covariates.
Pure R Implementation
The pure R fallback (pure_R_code = TRUE):
- Implements orders 2-6 using recursive matrix operations
- Uses the
SMUTpackage for fast matrix multiplication - Removes diagonal contributions to ensure proper degenerate U-statistics
- Suitable for smaller datasets or when Python is unavailable
Performance Considerations
- Order: Higher orders increase computation time
- Basis dimension: Larger basis increases memory and computation
- Cross-fitting: Multiplies computation by number of folds
-
Precision: On a CUDA GPU the Python backend
defaults to
float32; passdtype = "float64"for maximum numerical precision
Practical Recommendations
Choosing the Transformation Method
- “none”: Use the covariates as provided (e.g. when you have already constructed your own basis)
- “splines”: Good for smooth nonlinear relationships
- “fourier”: Suitable for periodic patterns or high-frequency features
Choosing the Order
For orders the computational complexity does not exceed that of ordinary matrix computations, (Chen, Zhang & Liu, 2025), so going up to order 7 is cheap. A practical default is therefore to compute all orders up to (the package default) and inspect the convergence plot. Beyond order 7 the complexity exceeds , so expect substantially longer run times.
Use a GPU if Available
With backend = "torch" and a CUDA-enabled PyTorch
(e.g. installed via ustats::setup_ustats(gpu = TRUE)), the
U-statistics are computed on the GPU automatically — by far the most
effective speedup for large
or
.
On a GPU the default precision is float32; pass
dtype = "float64" if you need maximum numerical
precision.
Sample Splitting
See the Sample Splitting section under Mathematical Background: the
package only cross-fits the inverse weighted Gram matrix against the
U-statistics (sample_split = TRUE, eHOIF). The nuisance
functions are inputs — estimating them, ideally on a separate,
independent sample, is the user’s responsibility (as demonstrated in the
Quick Start example).
Troubleshooting
Python Backend Issues
Verify the Python environment used by ustats:
ustats::check_ustats_setup()If the environment cannot be repaired easily, switch to the pure R implementation:
results <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
pure_R_code = TRUE,
m = 6 # Limited to order 6
)Numerical Instability
If you see warnings about matrix inversion:
# Try shrinkage methods
results <- hoif_ate(
X = X_est, A = A_est, Y = Y_est,
mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
inverse_method = "nlshrink" # or "corpcor"
)References
Robins, J. M., Li, L., Tchetgen Tchetgen, E., & van der Vaart, A. (2008). Higher order influence functions and minimax estimation of nonlinear functionals. Probability and Statistics: Essays in Honor of David A. Freedman, 335-421. doi:10.1214/193940307000000527
Robins, J. M., Li, L., Mukherjee, R., Tchetgen Tchetgen, E., & van der Vaart, A. (2017). Minimax estimation of a functional on a structured high-dimensional model. Annals of Statistics, 45(5), 1951-1987.
Liu, L., Mukherjee, R., Newey, W. K., & Robins, J. M. (2017). Semiparametric efficient empirical higher order influence function estimators. arXiv preprint doi:10.48550/arXiv.1705.07577
Liu, L., & Li, C. (2023). New √n-consistent, numerically stable higher-order influence function estimators. arXiv preprint doi:10.48550/arXiv.2302.08097
Chen, X., Zhang, R., & Liu, L. (2025). On computing and the complexity of computing higher-order U-statistics, exactly. arXiv preprint doi:10.48550/arXiv.2508.12627