Writing a Fit Configuration

The fit configuration YAML file is the central input to the entire SBI workflow. It defines the measurement, samples, systematic uncertainties, analysis regions, training features, and pointers to trained density-ratio models. The ConfigManager reads this file and exposes its contents through a set of accessor methods that are consumed at every stage of the pipeline:

Data preparation — the datasets module reads sample paths, ROOT tree names, weight branches, and systematic variation files from the config to load and organise the data.
Preselection training — training features and feature scaling lists are retrieved via ConfigManager.get_training_features().
Density-ratio training — basis processes (get_basis_samples()), the reference hypothesis (get_reference_samples()), training features, and region filters (get_channel_filters()) are all read from the config.
Systematic uncertainty training — the Systematics block and get_samples_in_syst_for_training() determine which processes are affected by each variation and where the varied ROOT files are located.
Evaluation — Asimov weight paths (get_channel_asimov_weight_path()) and trained model metadata come from the config.
Workspace building — the WorkspaceBuilder assembles the full statistical model from all of the above, producing a workspace dictionary that the statistical model consumes.

This page is the canonical reference for the YAML format. For the Python API, see Workspace Builder API. For a hands-on walkthrough, see Model Building Example.

Measurement and parameters

The General block names the measurement and declares which parameters enter the fit:

General:
  Measurement:
    Name: higgs_tautau_signal_strength
    POI: mu_htautau
    ParametersToFit:
      - mu_htautau
      - mu_ztautau
      - mu_ttbar
      - TES
      - JES

POI — the parameter of interest (signal strength).
ParametersToFit — only these parameters appear in the likelihood. Omitting a parameter here effectively fixes it at its nominal value.
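
The relationship between POI and ParametersToFit can be checked before running a fit. The sketch below works on a plain dictionary shaped like the parsed YAML. The helper name check_measurement, the explicit declared list, and the extra JER parameter are illustrative assumptions, not part of the toolkit API.

```python
# Dictionary mirroring the General block above, as a YAML parser would return it.
config = {
    "General": {
        "Measurement": {
            "Name": "higgs_tautau_signal_strength",
            "POI": "mu_htautau",
            "ParametersToFit": ["mu_htautau", "mu_ztautau", "mu_ttbar", "TES", "JES"],
        }
    }
}


def check_measurement(config, declared):
    """Hypothetical helper: return (poi_is_fitted, parameters held fixed).

    `declared` lists every parameter name defined elsewhere in the config
    (NormFactors and Systematics); it is passed in explicitly here.
    """
    measurement = config["General"]["Measurement"]
    fitted = measurement["ParametersToFit"]
    poi_is_fitted = measurement["POI"] in fitted
    # Parameters that are declared but not fitted stay at their nominal value.
    fixed = [name for name in declared if name not in fitted]
    return poi_is_fitted, fixed


# "JER" is an invented systematic, included only to show a fixed parameter.
declared = ["mu_htautau", "mu_ztautau", "mu_ttbar", "TES", "JES", "JER"]
poi_ok, fixed = check_measurement(config, declared)
print(poi_ok, fixed)
```

A check like this makes it visible when a parameter has been left out of ParametersToFit by accident rather than by design.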

Samples

Each entry in Samples describes one physics process and where to read it from in a ROOT file:

Samples:
  - Name: htautau
    Tree: tree_htautau
    SamplePath: ./saved_datasets/dataset_nominal.root
    Weight: weights
    UseAsReference: True
    UseAsBasis: True

UseAsBasis — this process gets its own density-ratio network and normalization factor.
UseAsReference marks the denominator process in the density ratio \(r(x) = p(x) / p_{\text{ref}}(x)\). This flag is optional: users who call the toolkit APIs directly can pass their own reference hypothesis instead.
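
To illustrate how the two flags partition the sample list, the sketch below filters a parsed Samples list by flag. It is not the ConfigManager implementation of get_basis_samples() or get_reference_samples(). The helper names_with_flag and the ztautau and ttbar entries (including their tree names and flag values) are assumptions made for the example.

```python
# Samples list as a YAML parser would return it. Only the htautau entry comes
# from the example above; the other two entries are assumed for illustration.
samples = [
    {"Name": "htautau", "Tree": "tree_htautau", "UseAsReference": True, "UseAsBasis": True},
    {"Name": "ztautau", "Tree": "tree_ztautau", "UseAsReference": True, "UseAsBasis": True},
    {"Name": "ttbar", "Tree": "tree_ttbar", "UseAsBasis": True},
]


def names_with_flag(samples, flag):
    """Hypothetical helper: names of samples whose boolean `flag` is set.

    A missing flag is treated as False, which is an assumption of this sketch.
    """
    return [sample["Name"] for sample in samples if sample.get(flag, False)]


basis = names_with_flag(samples, "UseAsBasis")
reference = names_with_flag(samples, "UseAsReference")
print("basis:", basis)
print("reference:", reference)
```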

Normalization factors

The NormFactors block declares free multiplicative parameters, such as signal strengths and background normalizations:

NormFactors:
  - Name: mu_htautau
    Samples: htautau
    Nominal: 1
    Bounds: [0, 10]

These parameters are unconstrained in the fit: no Gaussian penalty term is added to the likelihood.
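
As a rough picture of what a multiplicative parameter does, the sketch below scales per-sample yields by the normalization factors attached to them and rejects values outside Bounds. The helper scaled_yields, the second NormFactors entry, and the yield numbers are invented for illustration; the toolkit's own likelihood code may be organised differently.

```python
norm_factors = [
    {"Name": "mu_htautau", "Samples": "htautau", "Nominal": 1, "Bounds": [0, 10]},
    # Assumed second entry, following the same pattern as the example above.
    {"Name": "mu_ztautau", "Samples": "ztautau", "Nominal": 1, "Bounds": [0, 10]},
]

# Invented nominal yields, for illustration only.
nominal_yields = {"htautau": 50.0, "ztautau": 1000.0, "ttbar": 200.0}


def scaled_yields(nominal_yields, norm_factors, values):
    """Hypothetical helper: multiply each sample's yield by its norm factors.

    `values` maps parameter names to trial values; parameters that are not
    given fall back to their Nominal entry.
    """
    scaled = dict(nominal_yields)
    for factor in norm_factors:
        low, high = factor["Bounds"]
        value = values.get(factor["Name"], factor["Nominal"])
        if not low <= value <= high:
            raise ValueError(f"{factor['Name']}={value} is outside [{low}, {high}]")
        targets = factor["Samples"]
        if isinstance(targets, str):  # accept a single name or a list of names
            targets = [targets]
        for sample in targets:
            scaled[sample] *= value
    return scaled


result = scaled_yields(nominal_yields, norm_factors, {"mu_htautau": 2.0})
print(result)
```

Samples without an attached normalization factor (ttbar here) keep their nominal yield.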

Systematic uncertainties

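Combined shape and normalization uncertainties (Type: NormPlusShape) point to the up- and down-varied ROOT files. The example below shows the varied-file entries for htautau only: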

Systematics:
  - Name: JES
    Type: NormPlusShape
    Nominal: 0
    Samples: [htautau, ztautau, ttbar]
    Up:
      - SampleName: htautau
        Path: ./saved_datasets/dataset_JES_up.root
        Tree: tree_htautau
        Weight: weights
    Dn:
      - SampleName: htautau
        Path: ./saved_datasets/dataset_JES_dn.root
        Tree: tree_htautau
        Weight: weights

Nominal: 0 — the nuisance parameter starts at zero (the Gaussian constraint is centred here).
The workspace builder computes variation ratios (varied / nominal) automatically from the histograms.
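
The variation ratio mentioned above is a per-bin division of the varied histogram by the nominal one. The following is a minimal sketch using only the standard library, not the workspace builder's code. The helper names, the toy scores, and the choice to return 1.0 for empty nominal bins are assumptions. The bin edges are taken from the CR example on this page.

```python
from bisect import bisect_right


def histogram(values, weights, edges):
    """Weighted histogram with bins [edges[i], edges[i+1]).

    Values outside [edges[0], edges[-1]) are dropped in this sketch.
    """
    counts = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        if edges[0] <= value < edges[-1]:
            counts[bisect_right(edges, value) - 1] += weight
    return counts


def variation_ratio(varied, nominal):
    """Per-bin varied / nominal; empty nominal bins default to 1.0 (assumed)."""
    return [v / n if n > 0 else 1.0 for v, n in zip(varied, nominal)]


edges = [-8.75, -4.0, -2.5, -1.0]

# Invented toy presel_score values with unit weights.
nominal_scores = [-6.0, -5.0, -3.0, -3.5, -2.0, -1.5]
up_scores = [-6.0, -3.0, -3.5, -3.2, -2.0, -1.5]

nominal_hist = histogram(nominal_scores, [1.0] * 6, edges)
up_hist = histogram(up_scores, [1.0] * 6, edges)
ratios = variation_ratio(up_hist, nominal_hist)
print(nominal_hist, up_hist, ratios)
```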

Analysis regions

Regions define event selections and whether the channel is binned or unbinned:

Regions:
  # Binned control region
  - Name: CR
    Filter: presel_score < -1.0
    Variable: presel_score
    Type: binned
    Binning: [-8.75, -4.0, -2.5, -1.0]

  # Unbinned SBI signal region
  - Name: SR
    Filter: presel_score >= -1.0 & presel_score <= 4.5
    Type: unbinned
    AsimovWeights: ./saved_datasets/asimov_weights.npy
    TrainedModels:
      - SampleName: htautau
        Nominal:
          Ratios: ./saved_datasets/.../ratio_htautau.npy
        Systematics:
          - SystName: JES
            RatiosUp: ./saved_datasets/.../ratio_htautau.npy
            RatiosDn: ./saved_datasets/.../ratio_htautau.npy

For unbinned regions, TrainedModels points to the pre-evaluated density-ratio .npy arrays produced by the evaluation pipeline step. For binned regions, the workspace builder histograms the data automatically using Variable and Binning.
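
Because binned and unbinned regions need different keys, a small pre-flight check can catch incomplete entries early. The sketch below infers the required keys from the example above, so REQUIRED_KEYS and the helper region_problems are assumptions and not a schema published by the toolkit.

```python
# Keys each region type appears to need, inferred from the example above.
REQUIRED_KEYS = {
    "binned": ["Variable", "Binning"],
    "unbinned": ["AsimovWeights", "TrainedModels"],
}


def region_problems(region):
    """Hypothetical helper: list the problems found in one Regions entry."""
    kind = region.get("Type")
    if kind not in REQUIRED_KEYS:
        return [f"unknown Type {kind!r}"]
    problems = [f"missing key {key}" for key in REQUIRED_KEYS[kind] if key not in region]
    edges = region.get("Binning", [])
    # Bin edges are only meaningful if they increase strictly.
    if kind == "binned" and any(low >= high for low, high in zip(edges, edges[1:])):
        problems.append("Binning edges must be strictly increasing")
    return problems


control_region = {
    "Name": "CR",
    "Filter": "presel_score < -1.0",
    "Variable": "presel_score",
    "Type": "binned",
    "Binning": [-8.75, -4.0, -2.5, -1.0],
}

# Deliberately incomplete entry: TrainedModels has been left out.
signal_region = {
    "Name": "SR",
    "Filter": "presel_score >= -1.0 & presel_score <= 4.5",
    "Type": "unbinned",
    "AsimovWeights": "./saved_datasets/asimov_weights.npy",
}

cr_problems = region_problems(control_region)
sr_problems = region_problems(signal_region)
print(cr_problems, sr_problems)
```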

Training features

TrainingFeatures lists the ROOT-file branches used as neural-network inputs:

TrainingFeatures:
  - DER_mass_transverse_met_lep
  - log_DER_mass_vis
  - log_DER_pt_h

TrainingFeaturesToStandardize:
  - DER_mass_transverse_met_lep
  - log_DER_mass_vis

Features listed in TrainingFeaturesToStandardize are standardized (z-scored: shifted to zero mean and scaled to unit variance) before being passed to the network. Features not listed are passed through unchanged.
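
The effect of the two lists can be sketched as follows: features in TrainingFeaturesToStandardize are z-scored, and the rest are copied through. The helper standardize and the toy event values are invented for the example. Using the population standard deviation computed on the same data is an assumption; the toolkit may compute and store its scaling differently.

```python
from statistics import fmean, pstdev

training_features = ["DER_mass_transverse_met_lep", "log_DER_mass_vis", "log_DER_pt_h"]
to_standardize = ["DER_mass_transverse_met_lep", "log_DER_mass_vis"]

# Invented toy values, one list per branch.
events = {
    "DER_mass_transverse_met_lep": [10.0, 20.0, 30.0],
    "log_DER_mass_vis": [4.0, 4.5, 5.0],
    "log_DER_pt_h": [3.0, 3.5, 4.0],
}


def standardize(events, training_features, to_standardize):
    """Hypothetical helper: z-score the listed features, copy the others.

    Assumes every standardized column has non-zero spread.
    """
    scaled = {}
    for name in training_features:
        column = events[name]
        if name in to_standardize:
            mean, std = fmean(column), pstdev(column)
            scaled[name] = [(x - mean) / std for x in column]
        else:
            scaled[name] = list(column)
    return scaled


scaled = standardize(events, training_features, to_standardize)
for name, column in scaled.items():
    print(name, column)
```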