Writing a Fit Configuration

The fit configuration YAML file is the central input to the entire SBI workflow. It defines the measurement, samples, systematic uncertainties, analysis regions, training features, and pointers to trained density-ratio models. The ConfigManager reads this file and exposes its contents through accessor methods that are called at every stage of the pipeline:

  • Data preparation — the datasets module reads sample paths, ROOT tree names, weight branches, and systematic variation files from the config to load and organise the data.

  • Preselection training — training features and feature scaling lists are retrieved via ConfigManager.get_training_features().

  • Density-ratio training — basis processes (get_basis_samples()), the reference hypothesis (get_reference_samples()), training features, and region filters (get_channel_filters()) are all read from the config.

  • Systematic uncertainty training — the Systematics block and get_samples_in_syst_for_training() determine which processes are affected by each variation and where the varied ROOT files are located.

  • Evaluation — Asimov weight paths (get_channel_asimov_weight_path()) and trained model metadata come from the config.

  • Workspace building — the WorkspaceBuilder assembles the full statistical model from all of the above, producing a workspace dictionary that the statistical model consumes.
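Because every stage reads the same file, a sample name that is misspelled in one block but correct in another may go unnoticed until a later stage reads it. The sketch below is a hypothetical cross-check and not part of ConfigManager. It assumes the YAML has already been parsed into plain Python dicts and lists (for example with PyYAML's yaml.safe_load) and relies only on the block and key names documented on this page.

```python
# Hypothetical helper, not part of the toolkit: it assumes the YAML was already
# parsed into plain dicts/lists (e.g. with yaml.safe_load from PyYAML).


def _as_list(value):
    """Samples fields may hold one name (a string) or a list of names."""
    return value if isinstance(value, list) else [value]


def undefined_sample_refs(cfg):
    """Return sorted (block, name) pairs that point at a sample not in Samples."""
    known = {s["Name"] for s in cfg.get("Samples", [])}
    refs = []
    for nf in cfg.get("NormFactors", []):
        refs += [("NormFactors", n) for n in _as_list(nf["Samples"])]
    for syst in cfg.get("Systematics", []):
        refs += [("Systematics", n) for n in _as_list(syst["Samples"])]
        for direction in ("Up", "Dn"):
            refs += [("Systematics", e["SampleName"]) for e in syst.get(direction, [])]
    for region in cfg.get("Regions", []):
        refs += [("Regions", m["SampleName"]) for m in region.get("TrainedModels", [])]
    return sorted({(block, n) for block, n in refs if n not in known})


# Toy config: ztautau is referenced by a systematic but never declared.
cfg = {
    "Samples": [{"Name": "htautau"}],
    "NormFactors": [{"Name": "mu_htautau", "Samples": "htautau"}],
    "Systematics": [{"Name": "JES", "Samples": ["htautau", "ztautau"]}],
    "Regions": [{"Name": "SR", "TrainedModels": [{"SampleName": "htautau"}]}],
}
print(undefined_sample_refs(cfg))  # [('Systematics', 'ztautau')]
```

Running a check like this once, right after loading the file, keeps naming mistakes from surfacing deep inside training or workspace building.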

This page is the canonical reference for the YAML format. For the Python API, see Workspace Builder API. For a hands-on walkthrough, see Model Building Example.

Measurement and parameters

The General block names the measurement and declares which parameters enter the fit:

General:
  Measurement:
    Name: higgs_tautau_signal_strength
    POI: mu_htautau
    ParametersToFit:
      - mu_htautau
      - mu_ztautau
      - mu_ttbar
      - TES
      - JES

  • POI — the parameter of interest (signal strength).

  • ParametersToFit — the parameters that float in the fit. Any parameter omitted here is held fixed at its nominal value.
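As a minimal sketch of what ParametersToFit controls, the hypothetical helper below splits the declared parameters into floating and fixed ones. It assumes the nominal values of all norm factors and nuisance parameters have already been collected into a single dict; EES is a made-up extra systematic used only for illustration.

```python
# Hypothetical helper (not part of the toolkit) illustrating ParametersToFit:
# listed parameters float in the fit, every other declared parameter is held
# fixed at its nominal value.


def split_parameters(nominals, parameters_to_fit):
    """Return (floating names, {fixed name: nominal value}).

    nominals maps every declared parameter (NormFactors and Systematics)
    to its Nominal value; parameters_to_fit is General.Measurement.ParametersToFit.
    """
    undeclared = sorted(set(parameters_to_fit) - set(nominals))
    if undeclared:
        raise KeyError(f"ParametersToFit lists undeclared parameters: {undeclared}")
    floating = list(parameters_to_fit)
    fixed = {name: value for name, value in nominals.items() if name not in parameters_to_fit}
    return floating, fixed


# EES is a hypothetical extra systematic that is declared but not fitted.
nominals = {"mu_htautau": 1, "mu_ztautau": 1, "mu_ttbar": 1, "TES": 0, "JES": 0, "EES": 0}
to_fit = ["mu_htautau", "mu_ztautau", "mu_ttbar", "TES", "JES"]
floating, fixed = split_parameters(nominals, to_fit)
print(fixed)  # {'EES': 0}
```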

Samples

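Each entry in Samples defines one physics process: the ROOT file it is read from (SamplePath), the tree inside that file (Tree), and the branch holding the event weights (Weight):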

Samples:
  - Name: htautau
    Tree: tree_htautau
    SamplePath: ./saved_datasets/dataset_nominal.root
    Weight: weights
    UseAsReference: True
    UseAsBasis: True

  • UseAsBasis — this process gets its own density-ratio network and normalization factor.

  • UseAsReference — marks the denominator process in the density ratio \(r(x) = p(x) / p_{\text{ref}}(x)\). This flag is optional: when using the toolkit APIs directly, you can pass your own reference hypothesis instead.
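Both flags are per-sample booleans, so selecting the basis and reference processes amounts to filtering the Samples list. The helpers below are hypothetical stand-ins for get_basis_samples() and get_reference_samples(), whose exact return types are not documented on this page. They also assume that an omitted flag means False, and the ztautau and ttbar entries are illustrative.

```python
# Hypothetical helpers (not the toolkit's get_basis_samples/get_reference_samples):
# they assume a missing flag means False, which is not documented here.


def basis_samples(samples):
    """Samples flagged UseAsBasis: each gets its own ratio network and norm factor."""
    return [s["Name"] for s in samples if s.get("UseAsBasis", False)]


def reference_samples(samples):
    """Samples flagged UseAsReference: the denominator hypothesis p_ref."""
    return [s["Name"] for s in samples if s.get("UseAsReference", False)]


samples = [
    {"Name": "htautau", "UseAsReference": True, "UseAsBasis": True},
    {"Name": "ztautau", "UseAsBasis": True},  # flags for ztautau/ttbar are illustrative
    {"Name": "ttbar", "UseAsBasis": True},
]
print(basis_samples(samples))      # ['htautau', 'ztautau', 'ttbar']
print(reference_samples(samples))  # ['htautau']
```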

Normalization factors

NormFactors declares free multiplicative parameters (signal strengths, background normalizations) and the samples each one scales:

NormFactors:
  - Name: mu_htautau
    Samples: htautau
    Nominal: 1
    Bounds: [0, 10]

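These parameters are unconstrained in the fit: no Gaussian penalty term is added for them, and Bounds sets the range within which the fit may move them. Nominal is the value used when a norm factor is not listed in ParametersToFit.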
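The sketch below shows one way a norm factor acts on a sample's expected yield. It is an illustration built on assumptions, not the workspace builder's code: factors acting on the same sample are assumed to multiply, a factor missing from the supplied values falls back to its Nominal, and the yield of 120 events is made up.

```python
# Hypothetical sketch: how a NormFactor scales a sample's expected yield.
# Assumes factors acting on the same sample multiply; the toolkit may differ.


def sample_scale(norm_factors, values, sample):
    """Product of every NormFactor acting on `sample`, checked against its Bounds."""
    scale = 1.0
    for nf in norm_factors:
        targets = nf["Samples"] if isinstance(nf["Samples"], list) else [nf["Samples"]]
        if sample not in targets:
            continue
        value = values.get(nf["Name"], nf["Nominal"])  # fall back to Nominal
        low, high = nf["Bounds"]
        if not low <= value <= high:
            raise ValueError(f"{nf['Name']}={value} is outside Bounds {nf['Bounds']}")
        scale *= value
    return scale


norm_factors = [{"Name": "mu_htautau", "Samples": "htautau", "Nominal": 1, "Bounds": [0, 10]}]
nominal_yield = 120.0  # made-up expected event count for htautau
print(sample_scale(norm_factors, {"mu_htautau": 1.5}, "htautau") * nominal_yield)  # 180.0
# No penalty term: any value inside Bounds costs nothing in the likelihood by itself.
```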

Systematic uncertainties

Each systematic of Type NormPlusShape (a combined normalization and shape uncertainty) points to the up and down varied ROOT files for the samples it affects:

Systematics:
  - Name: JES
    Type: NormPlusShape
    Nominal: 0
    Samples: [htautau, ztautau, ttbar]
    Up:
      - SampleName: htautau
        Path: ./saved_datasets/dataset_JES_up.root
        Tree: tree_htautau
        Weight: weights
    Dn:
      - SampleName: htautau
        Path: ./saved_datasets/dataset_JES_dn.root
        Tree: tree_htautau
        Weight: weights
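
  • Nominal: 0 — the nuisance parameter starts at zero (the Gaussian constraint is centred here).

  • For binned regions, the workspace builder computes the variation ratios (varied / nominal) automatically from the histograms. For unbinned regions, the up and down variations come from the pre-evaluated RatiosUp and RatiosDn arrays listed under TrainedModels.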
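The two ingredients named above, per-bin variation ratios and a Gaussian constraint centred at Nominal, can be sketched as follows. The unit width of the constraint and the choice to treat bins with zero nominal yield as unaffected are assumptions made for this example, and the histogram contents are made up.

```python
# Hypothetical sketch of the two ingredients described above; the unit width of
# the Gaussian constraint and the handling of empty bins are assumptions.


def variation_ratios(varied, nominal):
    """Per-bin varied / nominal; bins with no nominal yield are set to 1.0 (no effect)."""
    if len(varied) != len(nominal):
        raise ValueError("varied and nominal histograms must have the same binning")
    return [v / n if n > 0 else 1.0 for v, n in zip(varied, nominal)]


def gaussian_penalty(alpha, nominal=0.0, sigma=1.0):
    """Negative log of a Gaussian constraint centred at Nominal, up to a constant."""
    return 0.5 * ((alpha - nominal) / sigma) ** 2


nominal_hist = [100.0, 50.0, 20.0]  # made-up bin contents
jes_up_hist = [110.0, 52.0, 19.0]
print(variation_ratios(jes_up_hist, nominal_hist))  # [1.1, 1.04, 0.95]
print(gaussian_penalty(1.0))  # 0.5 with the assumed unit width
```

Under the common convention that a nuisance parameter of +1 reproduces the Up histogram, a one-unit pull costs 0.5 in the negative log-likelihood; how the toolkit interpolates between the nominal, Up and Dn shapes is not described on this page.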

Analysis regions

Regions define event selections and whether the channel is binned or unbinned:

Regions:
  # Binned control region
  - Name: CR
    Filter: presel_score < -1.0
    Variable: presel_score
    Type: binned
    Binning: [-8.75, -4.0, -2.5, -1.0]

  # Unbinned SBI signal region
  - Name: SR
    Filter: presel_score >= -1.0 & presel_score <= 4.5
    Type: unbinned
    AsimovWeights: ./saved_datasets/asimov_weights.npy
    TrainedModels:
      - SampleName: htautau
        Nominal:
          Ratios: ./saved_datasets/.../ratio_htautau.npy
        Systematics:
          - SystName: JES
            RatiosUp: ./saved_datasets/.../ratio_htautau.npy
            RatiosDn: ./saved_datasets/.../ratio_htautau.npy

For unbinned regions, TrainedModels points to the pre-evaluated density-ratio .npy arrays produced by the evaluation step of the pipeline, and AsimovWeights points to the region's Asimov weights (read through get_channel_asimov_weight_path()). For binned regions, the workspace builder histograms the data automatically using Variable and Binning.
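For a binned region, the processing amounts to keeping the events that pass Filter and filling a weighted histogram of Variable with the Binning edges. The sketch below uses hand-translated Python predicates in place of the Filter strings (those strings read like pandas DataFrame.query expressions, but how the toolkit evaluates them is not covered here) and adopts the numpy.histogram bin-edge convention as an assumption. The events are made up.

```python
from bisect import bisect_right

# Hypothetical sketch of a binned region: apply Filter, then histogram Variable
# with Binning. The predicates stand in for the Filter strings, and the bin-edge
# convention (half-open bins, closed last bin) mirrors numpy.histogram.


def histogram(values, weights, edges):
    """Weighted counts in [e_i, e_i+1); the last bin also includes its right edge."""
    counts = [0.0] * (len(edges) - 1)
    for x, w in zip(values, weights):
        if x < edges[0] or x > edges[-1]:
            continue  # outside the Binning range: not counted
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += w
    return counts


def in_cr(score):
    return score < -1.0  # Filter: presel_score < -1.0


def in_sr(score):
    return -1.0 <= score <= 4.5  # Filter: presel_score >= -1.0 & presel_score <= 4.5


presel_score = [-6.0, -3.0, -3.0, -1.5, 0.5, 2.0]  # made-up events
weights = [1.0, 0.5, 0.5, 2.0, 1.0, 1.0]

cr = [(s, w) for s, w in zip(presel_score, weights) if in_cr(s)]
cr_counts = histogram([s for s, _ in cr], [w for _, w in cr], [-8.75, -4.0, -2.5, -1.0])
print(cr_counts)  # [1.0, 1.0, 2.0]
sr_events = [s for s in presel_score if in_sr(s)]
print(sr_events)  # [0.5, 2.0], kept event by event for the unbinned fit
```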

Training features

TrainingFeatures lists the branches of the ROOT file that are used as neural-network inputs:

TrainingFeatures:
  - DER_mass_transverse_met_lep
  - log_DER_mass_vis
  - log_DER_pt_h

TrainingFeaturesToStandardize:
  - DER_mass_transverse_met_lep
  - log_DER_mass_vis

Features listed in TrainingFeaturesToStandardize are z-scored before being passed to the network. Features not listed are passed through unchanged.
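A minimal sketch of the standardization step, assuming unweighted z-scoring with the population standard deviation (this page does not say whether event weights or a different estimator are used). The function name and the feature values are illustrative.

```python
from statistics import fmean, pstdev

# Hypothetical sketch of TrainingFeaturesToStandardize: unweighted z-scoring with
# the population standard deviation. Whether the toolkit uses event weights or a
# different std estimator is not specified on this page.


def standardize(columns, to_standardize):
    """Z-score listed features; pass the rest through. Returns (columns, {name: (mean, std)})."""
    out, params = {}, {}
    for name, values in columns.items():
        if name in to_standardize:
            mean, std = fmean(values), pstdev(values)
            if std == 0:
                raise ValueError(f"feature {name!r} has zero variance")
            out[name] = [(v - mean) / std for v in values]
            params[name] = (mean, std)
        else:
            out[name] = list(values)
    return out, params


columns = {  # made-up values for three events
    "DER_mass_transverse_met_lep": [10.0, 20.0, 30.0],
    "log_DER_mass_vis": [4.0, 4.5, 5.0],
    "log_DER_pt_h": [3.0, 3.5, 4.0],
}
to_standardize = ["DER_mass_transverse_met_lep", "log_DER_mass_vis"]
scaled, params = standardize(columns, to_standardize)
print(scaled["log_DER_pt_h"])  # [3.0, 3.5, 4.0] (not listed, so unchanged)
```

Returning the per-feature (mean, std) makes it possible to apply exactly the same transformation at evaluation time; recomputing the statistics on a different dataset would shift the inputs the network sees.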