Toolkit for Simulation-Based Inference

nsbi-common-utils provides building blocks for Simulation-Based Inference (SBI) analysis tailored to the statistical models used at the ATLAS and CMS experiments. It implements a semi-parametric approach to SBI in which the statistical model combines non-parametric density-ratio estimates with parametric HistFactory-style systematic uncertainties. The toolkit is modular, offering APIs for dataset preparation, density-ratio estimation, model building, and profiled-likelihood-ratio fitting.

Why Simulation-Based Inference?

Traditional LHC analyses compress high-dimensional collision data into summary observables, which can discard information sensitive to the parameters of interest. SBI uses neural networks to approximate the likelihood ratio directly from the full feature space — no hand-crafted summaries required. This toolkit makes SBI practical for LHC-scale analyses by embedding neural density-ratio estimates inside a HistFactory-style statistical model, combining unbinned SBI regions with traditional binned template regions in a single profiled-likelihood fit.

See the Overview for a detailed introduction, or the method and measurement papers: arXiv:2412.01600, arXiv:2412.01548.

Key features

  • Training pipeline — density-ratio neural network training with PyTorch Lightning, ensemble support, and calibration and reweighting diagnostics.

  • Statistical models — pyhf-like workspace specification supporting both binned (template-based) and unbinned (SBI-style) analysis regions with HistFactory-style systematic uncertainties, compiled to JAX for fast NLL evaluation and automatic differentiation.

  • Inference engine — profile-likelihood fits and NLL scans via iminuit with analytic gradients, plus plotting utilities.

  • Configuration-driven — a single YAML fit configuration defines samples, features, systematics, and regions, and is consumed across all stages of the pipeline via ConfigManager.

  • Workflow integration — HTCondor/DAGMan job descriptions for large-scale cluster submission, with end-to-end example pipelines.

Getting started

Install the package with pip:

pip install 'nsbi-common-utils @ git+https://github.com/iris-hep/NSBI-workflow-tutorial.git'

Or, for a full environment with all dependencies (recommended):

pixi install -e nsbi-env      # CPU-only
pixi install -e nsbi-env-gpu  # with CUDA support

Statistical Model Building with SBI

Once installed, building a statistical model and running a fit takes just a few lines:

from nsbi_common_utils import workspace_builder, models, inference

# Build workspace from the fit configuration (also used for data loading, training, etc.)
ws = workspace_builder.WorkspaceBuilder("config_fit.yml").build()

# Create the statistical model (JAX-compiled NLL)
model = models.sbi_parametric_model(workspace=ws, measurement_to_fit="my_meas")
params, init_vals = model.get_model_parameters()

# Fit and profile
fitter = inference.inference(
    model_nll=model.model,
    list_parameters=params,
    initial_parameter_values=init_vals,
)
fitter.perform_fit()

For a hands-on walkthrough, see the Model Building Example.