Model Building Example

The examples/model_building_example/ directory provides a standalone walkthrough of workspace construction, model building, and profile-likelihood fitting that does not require the full training pipeline. It is the recommended starting point for understanding how the fitting stage works.

Prerequisites

You need four ingredients, all of which are included in the example directory (via symlinks to the main dataset):

A fit configuration YAML file (config_fit_nsbi.yml or config_fit_histogram.yml).
Pre-computed density-ratio .npy files (nominal and systematic variations), produced by the training pipeline or any external source.
Asimov weights (or real-data weights) for the unbinned region.
ROOT files containing the MC samples used by binned channels.

Note

The saved_datasets/ directory uses symlinks into the parent FAIR_universe_Higgs_tautau/saved_datasets/ directory to avoid duplicating large files. After a fresh git clone, make sure to run git lfs pull so that the LFS-tracked ROOT and NumPy files are downloaded, not just pointer stubs.
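
One way to confirm that git lfs pull actually fetched the data is to scan for leftover pointer stubs. The sketch below relies on the fact that a Git LFS pointer is a small text file whose first line is "version https://git-lfs.github.com/spec/v1". The helper names is_lfs_pointer and find_lfs_stubs are hypothetical and not part of nsbi_common_utils.

```python
import os

# First line of every Git LFS pointer file (Git LFS pointer spec, v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_stubs(root, suffixes=(".root", ".npy")):
    """List data files under `root` that are still un-fetched LFS pointers.

    Symlinked directories are followed, because the example's
    saved_datasets/ directory links into the parent dataset directory.
    """
    stubs = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if name.endswith(suffixes) and os.path.isfile(full) and is_lfs_pointer(full):
                stubs.append(full)
    return sorted(stubs)


if __name__ == "__main__":
    # Hypothetical usage, run from inside model_building_example/.
    for stub in find_lfs_stubs("saved_datasets"):
        print("still an LFS pointer:", stub)
```

An empty result means no pointer stubs were found among the .root and .npy files; it does not validate the file contents themselves.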

Directory layout

model_building_example/
  config_fit_histogram.yml          # Binned-only fit configuration
  config_fit_nsbi.yml               # Combined binned + unbinned (NSBI) fit
  1_workspace_building.ipynb        # Notebook: build workspaces from config
  2_parameter_fitting.ipynb         # Notebook: fit and profile scan
  saved_datasets/
    asimov_weights.npy              # Per-event weights (unbinned region)
    dataset_nominal.root            # Nominal MC (binned channels)
    dataset_JES_up.root             # Systematic variation ROOT files
    dataset_JES_dn.root
    dataset_TES_up.root
    dataset_TES_dn.root
    output_training_nominal/
      output_ratios_<sample>/
        ratio_<sample>.npy          # Nominal density ratios
    output_training_systematics/
      output_ratios_<sample>_<syst>_<dir>/
        ratio_<sample>.npy          # Systematic density ratios
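
The naming pattern of the density-ratio files in the layout above can be captured in a small path helper, which is handy when writing the TrainedModels entries of a config by hand. This is only a sketch: ratio_path is a hypothetical function, and the direction labels ("Up", "Dn") are taken from the directory names that appear in the example config further down this page.

```python
from pathlib import PurePosixPath


def ratio_path(sample, syst=None, direction=None, base="saved_datasets"):
    """Build the relative path of a density-ratio file from the layout above.

    With `syst=None` the nominal file is returned; otherwise both `syst`
    and `direction` are needed to pick the systematic variation.
    """
    root = PurePosixPath(base)
    if syst is None:
        folder = root / "output_training_nominal" / f"output_ratios_{sample}"
    else:
        if direction is None:
            raise ValueError("direction is required for a systematic variation")
        folder = (
            root
            / "output_training_systematics"
            / f"output_ratios_{sample}_{syst}_{direction}"
        )
    # Nominal and systematic files share the same file name pattern.
    return str(folder / f"ratio_{sample}.npy")


nominal = ratio_path("htautau")
jes_up = ratio_path("htautau", syst="JES", direction="Up")
```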

Quick start

from nsbi_common_utils import workspace_builder, models, inference

# 1. Build a workspace from the YAML config
ws = workspace_builder.WorkspaceBuilder(
    config_path="config_fit_nsbi.yml"
).build()

# 2. Initialise the statistical model (JAX-compiled NLL)
model = models.sbi_parametric_model(
    workspace=ws, measurement_to_fit="NSBI_measurement"
)

# 3. Fit
params, init_vals = model.get_model_parameters()
fitter = inference.inference(
    model_nll=model.model,
    model_grad=model.model_grad,
    initial_values=init_vals,
    list_parameters=params,
    num_unconstrained_params=model.num_unconstrained_param,
)
fitter.perform_fit()

Step 1 — Build the workspace

The WorkspaceBuilder reads the YAML config, loads ROOT datasets and density-ratio arrays, and assembles a JSON-serialisable workspace dictionary:

builder = workspace_builder.WorkspaceBuilder(config_path="config_fit_nsbi.yml")
ws = builder.build()

# Optionally persist to disk so you can skip this step next time
builder.dump_workspace(ws, "workspace_nsbi.json")

# Re-load later without re-reading ROOT files
ws = workspace_builder.WorkspaceBuilder.load_workspace("workspace_nsbi.json")
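
When the workspace is cached on disk, it helps to know when the cache should be rebuilt. A minimal sketch, assuming that comparing file modification times is good enough: workspace_is_stale is a hypothetical helper, and it only tracks the YAML config, not the ROOT or .npy inputs that the config points to.

```python
import os


def workspace_is_stale(config_path, workspace_path):
    """Return True if the cached workspace is missing or older than the config."""
    if not os.path.exists(workspace_path):
        return True
    return os.path.getmtime(workspace_path) < os.path.getmtime(config_path)


# Hypothetical usage with the calls shown above:
#
# if workspace_is_stale("config_fit_nsbi.yml", "workspace_nsbi.json"):
#     builder = workspace_builder.WorkspaceBuilder(config_path="config_fit_nsbi.yml")
#     ws = builder.build()
#     builder.dump_workspace(ws, "workspace_nsbi.json")
# else:
#     ws = workspace_builder.WorkspaceBuilder.load_workspace("workspace_nsbi.json")
```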

See Workspace Builder API for the full API.

Step 2 — Initialise the model

sbi_parametric_model parses the workspace, stacks all histogram yields and density-ratio arrays onto the JAX device, and compiles a JIT-optimised negative log-likelihood function:

model = models.sbi_parametric_model(
    workspace=ws,
    measurement_to_fit="NSBI_measurement",
)

# Inspect the parameter ordering and starting values
param_names, init_values = model.get_model_parameters()

The compiled NLL is exposed as model.model(param_array) and its gradient, computed by JAX automatic differentiation, as model.model_grad(param_array).

See Statistical Models for the full API.
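
A gradient function such as model.model_grad can be sanity-checked against central finite differences of the NLL. The sketch below uses a toy two-parameter NLL on plain Python lists so that it runs without the package. toy_nll, toy_grad, and both helper functions are hypothetical, and whether model.model accepts plain lists rather than JAX arrays is not confirmed here, so the perturbation step may need adapting.

```python
def finite_difference_grad(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += eps
        dn[i] -= eps
        grad.append((f(up) - f(dn)) / (2.0 * eps))
    return grad


def max_grad_discrepancy(f, grad_f, x, eps=1e-6):
    """Largest absolute difference between grad_f(x) and the numerical gradient."""
    numeric = finite_difference_grad(f, x, eps)
    reference = grad_f(x)
    return max(abs(a - b) for a, b in zip(numeric, reference))


# Toy stand-ins for model.model and model.model_grad (assumptions, not the real API).
def toy_nll(p):
    return 0.5 * ((p[0] - 1.0) / 0.2) ** 2 + 0.5 * p[1] ** 2


def toy_grad(p):
    return [(p[0] - 1.0) / 0.04, p[1]]


discrepancy = max_grad_discrepancy(toy_nll, toy_grad, [1.3, -0.5])
```

A discrepancy far above the expected rounding level usually points to a mismatch in parameter ordering or to a step size that is unsuitable for the scale of the NLL.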

Step 3 — Fit and profile scan

inference wraps iminuit to perform the minimisation and profile-likelihood scans:

fitter = inference.inference(
    model_nll=model.model,
    model_grad=model.model_grad,
    initial_values=init_values,
    list_parameters=param_names,
    num_unconstrained_params=model.num_unconstrained_param,
)

# Global fit
fitter.perform_fit()

# Profile likelihood scan of the POI
pts, nll, pts_stat, nll_stat = fitter.perform_profile_scan(
    parameter_name="mu_htautau",
    bound_range=(0, 3),
    size=50,
    doStatOnly=True,
)
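
A common next step is to turn the scan into an interval for the POI. In the asymptotic regime, the points where twice the NLL difference from the minimum reaches 1 bound the approximate 68% confidence interval for a single parameter. The sketch below finds those crossings by linear interpolation. It assumes that nll holds raw NLL values (not an already rescaled quantity) and that pts is sorted in ascending order; neither is confirmed by the text above. interval_from_scan is a hypothetical helper, and the scan values are a toy parabola.

```python
def interval_from_scan(pts, nll, level=1.0):
    """Return (best_fit, lower, upper) where 2 * (NLL - NLL_min) crosses `level`.

    `lower` or `upper` is None if the scan never reaches `level` on that side.
    """
    nll_min = min(nll)
    t = [2.0 * (v - nll_min) for v in nll]
    i_min = min(range(len(nll)), key=nll.__getitem__)

    def crossing(indices):
        prev = i_min
        for i in indices:
            if t[i] >= level:
                # Interpolate between the last point below and the first at/above level.
                frac = (level - t[prev]) / (t[i] - t[prev])
                return pts[prev] + frac * (pts[i] - pts[prev])
            prev = i
        return None

    lower = crossing(range(i_min - 1, -1, -1))
    upper = crossing(range(i_min + 1, len(pts)))
    return pts[i_min], lower, upper


# Toy scan: a parabola centred at 1.0 with width 0.2, on an arbitrary offset.
toy_pts = [0.05 * i for i in range(61)]
toy_nll = [10.0 + 0.5 * ((p - 1.0) / 0.2) ** 2 for p in toy_pts]
best, lower, upper = interval_from_scan(toy_pts, toy_nll)
```

With doStatOnly=True, the same helper can be applied to pts_stat and nll_stat to compare the statistical-only interval with the full one, under the same assumptions.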

See Parameter Fitting and Hypothesis Testing for the full API.

Fit configuration

The YAML config defines five sections consumed by the workspace builder. See Writing a Fit Configuration for the full specification; the key points are summarised here.

Measurement — which parameters to fit and the parameter of interest (POI).

Samples — physics processes (signal, backgrounds) with paths to ROOT files and tree names.

NormFactors — free normalisation parameters (one per sample or shared).

Systematics — nuisance parameters with paths to up/down ROOT variation files. Currently only NormPlusShape is supported.

Regions — analysis regions tagged as binned (control/signal regions built from histograms) or unbinned (signal region using density ratios). Unbinned regions reference the trained model outputs:

Regions:
- Name: SR
  Type: unbinned
  AsimovWeights: ./saved_datasets/asimov_weights.npy
  TrainedModels:
    - SampleName: htautau
      Nominal:
        Ratios: ./saved_datasets/output_training_nominal/output_ratios_htautau/ratio_htautau.npy
      Systematics:
        - SystName: JES
          RatiosUp: .../output_ratios_htautau_JES_Up/ratio_htautau.npy
          RatiosDn: .../output_ratios_htautau_JES_Dn/ratio_htautau.npy
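
Before building a workspace, it can be useful to list every density-ratio file that an unbinned region refers to and check that each one exists. The sketch below walks a region dictionary with the same keys as the YAML excerpt above, as it might look after parsing with a YAML loader such as PyYAML's yaml.safe_load (an assumption; the loader used by the workspace builder is not stated here). collect_ratio_paths is a hypothetical helper, and the elided ".../" paths are kept exactly as they appear in the excerpt.

```python
import os


def collect_ratio_paths(region):
    """Return (sample, variation, path) tuples for one unbinned region."""
    entries = []
    for trained in region.get("TrainedModels", []):
        sample = trained["SampleName"]
        entries.append((sample, "nominal", trained["Nominal"]["Ratios"]))
        for syst in trained.get("Systematics", []):
            name = syst["SystName"]
            entries.append((sample, name + "_Up", syst["RatiosUp"]))
            entries.append((sample, name + "_Dn", syst["RatiosDn"]))
    return entries


# Dictionary mirroring the YAML excerpt above (paths copied verbatim).
region = {
    "Name": "SR",
    "Type": "unbinned",
    "AsimovWeights": "./saved_datasets/asimov_weights.npy",
    "TrainedModels": [
        {
            "SampleName": "htautau",
            "Nominal": {
                "Ratios": "./saved_datasets/output_training_nominal/"
                          "output_ratios_htautau/ratio_htautau.npy"
            },
            "Systematics": [
                {
                    "SystName": "JES",
                    "RatiosUp": ".../output_ratios_htautau_JES_Up/ratio_htautau.npy",
                    "RatiosDn": ".../output_ratios_htautau_JES_Dn/ratio_htautau.npy",
                }
            ],
        }
    ],
}

entries = collect_ratio_paths(region)
missing = [path for _sample, _variation, path in entries if not os.path.exists(path)]
```

Relative paths are resolved against the current working directory in this sketch, so it should be run from the directory that holds the config file.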

Notebooks

The example ships with two Jupyter notebooks:

1_workspace_building.ipynb — walks through workspace construction for both the histogram-only and NSBI configurations, and serialises the workspaces to JSON.
2_parameter_fitting.ipynb — loads the workspaces, initialises both models, performs global fits (with JAX autodiff gradients via model_grad), runs profile-likelihood scans, and plots an NSBI-vs-histogram sensitivity comparison.