Writing a Fit Configuration
The fit configuration YAML file is the central input to the entire SBI workflow. It defines the measurement, samples, systematic uncertainties, analysis regions, training features, and pointers to trained density-ratio models. The ConfigManager reads this file and exposes its contents through a set of accessor methods that are consumed at every stage of the pipeline:
Data preparation: the datasets module reads sample paths, ROOT tree names, weight branches, and systematic variation files from the config to load and organise the data.
Preselection training: training features and feature scaling lists are retrieved via ConfigManager.get_training_features().
Density-ratio training: basis processes (get_basis_samples()), the reference hypothesis (get_reference_samples()), training features, and region filters (get_channel_filters()) are all read from the config.
Systematic uncertainty training: the Systematics block and get_samples_in_syst_for_training() determine which processes are affected by each variation and where the varied ROOT files are located.
Evaluation: Asimov weight paths (get_channel_asimov_weight_path()) and trained model metadata come from the config.
Workspace building: the WorkspaceBuilder assembles the full statistical model from all of the above, producing a workspace dictionary that the statistical model consumes.
This page is the canonical reference for the YAML format. For the Python API, see Workspace Builder API. For a hands-on walkthrough, see Model Building Example.
Measurement and parameters
The General block names the measurement and declares which parameters enter the fit:
General:
  Measurement:
    Name: higgs_tautau_signal_strength
    POI: mu_htautau
    ParametersToFit:
      - mu_htautau
      - mu_ztautau
      - mu_ttbar
      - TES
      - JES
POI — the parameter of interest (signal strength).
ParametersToFit — only these parameters appear in the likelihood. Omitting a parameter here effectively fixes it at its nominal value.
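The effect of ParametersToFit can be illustrated with a minimal sketch. It assumes the names have already been loaded into plain Python values. The helper split_parameters, the nominal-value table, and the requirement that the POI be listed in ParametersToFit are all hypothetical illustrations, not part of the toolkit:

```python
def split_parameters(poi, parameters_to_fit, nominal_values):
    """Split declared parameters into floating ones and ones fixed at nominal.

    nominal_values maps every declared parameter name to its nominal value
    (a hypothetical structure used only for this illustration).
    """
    # Assumption: a fit without its parameter of interest is a config error.
    if poi not in parameters_to_fit:
        raise ValueError(f"POI {poi!r} is not listed in ParametersToFit")
    unknown = [p for p in parameters_to_fit if p not in nominal_values]
    if unknown:
        raise ValueError(f"ParametersToFit names undeclared parameters: {unknown}")
    floating = list(parameters_to_fit)
    fixed = {name: value for name, value in nominal_values.items()
             if name not in parameters_to_fit}
    return floating, fixed


# Illustrative nominal values: 1 for norm factors, 0 for nuisance parameters.
nominal_values = {"mu_htautau": 1.0, "mu_ztautau": 1.0, "mu_ttbar": 1.0,
                  "TES": 0.0, "JES": 0.0}
floating, fixed = split_parameters(
    "mu_htautau", ["mu_htautau", "mu_ztautau", "JES"], nominal_values)
print(floating)  # parameters that appear in the likelihood
print(fixed)     # parameters held at their nominal values
```

Here mu_ttbar and TES are left out of the list, so they end up in the fixed group at their nominal values.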
Samples
Each entry in Samples describes a physics process read from a ROOT file:
Samples:
  - Name: htautau
    Tree: tree_htautau
    SamplePath: ./saved_datasets/dataset_nominal.root
    Weight: weights
    UseAsReference: True
    UseAsBasis: True
UseAsBasis gives this process its own density-ratio network and normalization factor.
UseAsReference marks the denominator process in the density ratio \(r(x) = p(x) / p_{\text{ref}}(x)\). This flag is optional: users who call the toolkit APIs directly can pass their own reference hypothesis instead.
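As a rough illustration of how the two flags partition the sample list, the sketch below works on plain dictionaries shaped like the Samples entries. The helper select_samples is hypothetical, the ztautau and ttbar entries are illustrative, and treating a missing flag as False is an assumption. The toolkit's own get_basis_samples() and get_reference_samples() may behave differently:

```python
def select_samples(samples, flag):
    """Return the names of samples whose boolean flag is set.

    Assumption: a flag that is absent from an entry counts as False.
    """
    return [s["Name"] for s in samples if s.get(flag, False)]


# Illustrative entries; only htautau appears in the example config above.
samples = [
    {"Name": "htautau", "UseAsReference": True, "UseAsBasis": True},
    {"Name": "ztautau", "UseAsBasis": True},
    {"Name": "ttbar", "UseAsBasis": True},
]

basis = select_samples(samples, "UseAsBasis")
reference = select_samples(samples, "UseAsReference")
print("basis:", basis)
print("reference:", reference)
```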
Normalization factors
The NormFactors block declares free multiplicative parameters (signal strengths, background normalizations):
NormFactors:
  - Name: mu_htautau
    Samples: htautau
    Nominal: 1
    Bounds: [0, 10]
These are unconstrained in the fit — no Gaussian penalty term.
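A minimal sketch of what a normalization factor does to an expected yield, assuming a purely multiplicative scaling and hard bounds. The function expected_yield and the nominal yield of 100 events are made up for illustration and are not taken from the toolkit:

```python
def expected_yield(nominal_yield, norm_factor, bounds):
    """Scale a nominal yield by a free normalization factor.

    No penalty term is involved: the factor is only restricted to its bounds.
    """
    lower, upper = bounds
    if not lower <= norm_factor <= upper:
        raise ValueError(
            f"norm factor {norm_factor} outside bounds [{lower}, {upper}]")
    return norm_factor * nominal_yield


bounds = (0, 10)          # Bounds: [0, 10] from the config entry
nominal_yield = 100.0     # illustrative number of expected events

at_nominal = expected_yield(nominal_yield, 1.0, bounds)   # Nominal: 1
doubled = expected_yield(nominal_yield, 2.0, bounds)
print(at_nominal, doubled)
```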
Systematic uncertainties
Entries of type NormPlusShape (shape plus normalization uncertainties) point to the up- and down-varied ROOT files:
Systematics:
  - Name: JES
    Type: NormPlusShape
    Nominal: 0
    Samples: [htautau, ztautau, ttbar]
    Up:
      - SampleName: htautau
        Path: ./saved_datasets/dataset_JES_up.root
        Tree: tree_htautau
        Weight: weights
    Dn:
      - SampleName: htautau
        Path: ./saved_datasets/dataset_JES_dn.root
        Tree: tree_htautau
        Weight: weights
Nominal: 0 — the nuisance parameter starts at zero (the Gaussian constraint is centred here).
The workspace builder computes variation ratios (varied / nominal) automatically from the histograms.
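The two points above can be sketched in a few lines. This is not the workspace builder's code: the bin contents are invented, the fallback ratio of 1.0 for empty nominal bins is an assumption, and the unit-width Gaussian penalty is the conventional choice rather than something stated in the config:

```python
def variation_ratios(varied, nominal):
    """Per-bin ratio varied / nominal.

    Assumption: bins with an empty nominal histogram get a neutral ratio of 1.0.
    """
    if len(varied) != len(nominal):
        raise ValueError("histograms must have the same number of bins")
    return [v / n if n > 0 else 1.0 for v, n in zip(varied, nominal)]


def gaussian_penalty(alpha, nominal=0.0):
    """Negative log of a unit-width Gaussian constraint, up to a constant."""
    return 0.5 * (alpha - nominal) ** 2


# Invented bin contents for one sample in one binned region.
nominal_hist = [40.0, 30.0, 20.0]
up_hist = [44.0, 30.0, 18.0]
dn_hist = [36.0, 30.0, 22.0]

ratios_up = variation_ratios(up_hist, nominal_hist)
ratios_dn = variation_ratios(dn_hist, nominal_hist)
print(ratios_up, ratios_dn)
print(gaussian_penalty(0.0), gaussian_penalty(1.0))
```

The penalty vanishes at the nominal value of 0 and grows quadratically as the nuisance parameter is pulled away from it.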
Analysis regions
Regions define event selections and whether the channel is binned or unbinned:
Regions:
  # Binned control region
  - Name: CR
    Filter: presel_score < -1.0
    Variable: presel_score
    Type: binned
    Binning: [-8.75, -4.0, -2.5, -1.0]
  # Unbinned SBI signal region
  - Name: SR
    Filter: presel_score >= -1.0 & presel_score <= 4.5
    Type: unbinned
    AsimovWeights: ./saved_datasets/asimov_weights.npy
    TrainedModels:
      - SampleName: htautau
        Nominal:
          Ratios: ./saved_datasets/.../ratio_htautau.npy
        Systematics:
          - SystName: JES
            RatiosUp: ./saved_datasets/.../ratio_htautau.npy
            RatiosDn: ./saved_datasets/.../ratio_htautau.npy
For unbinned regions, TrainedModels points to the pre-evaluated density-ratio .npy arrays produced by the evaluation pipeline step.
For binned regions, the workspace builder histograms the data automatically using Variable and Binning.
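To make the binned case concrete, the sketch below applies the CR filter and fills a weighted histogram with the CR bin edges. It is a standalone illustration, not the workspace builder's implementation: the event values are invented, the filter is written as an explicit Python comparison rather than parsed from the Filter string, and the half-open bin convention [low, high) is an assumption:

```python
from bisect import bisect_right


def passes_cr_filter(presel_score):
    """Hand-written equivalent of the CR filter 'presel_score < -1.0'."""
    return presel_score < -1.0


def weighted_histogram(values, weights, edges):
    """Sum weights per bin; events outside [edges[0], edges[-1]) are dropped."""
    counts = [0.0] * (len(edges) - 1)
    for x, w in zip(values, weights):
        if edges[0] <= x < edges[-1]:
            counts[bisect_right(edges, x) - 1] += w
    return counts


edges = [-8.75, -4.0, -2.5, -1.0]           # Binning from the CR entry
scores = [-5.0, -3.0, -2.0, -1.5, 0.3, -9.0]  # invented presel_score values
weights = [1.0, 0.5, 2.0, 1.0, 1.0, 1.0]      # invented event weights

selected = [(s, w) for s, w in zip(scores, weights) if passes_cr_filter(s)]
cr_counts = weighted_histogram([s for s, _ in selected],
                               [w for _, w in selected], edges)
print(cr_counts)
```

The event at 0.3 fails the filter, and the event at -9.0 passes it but falls below the first bin edge, so neither contributes to the histogram.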
Training features
TrainingFeatures lists the ROOT branches used as neural-network inputs:
TrainingFeatures:
  - DER_mass_transverse_met_lep
  - log_DER_mass_vis
  - log_DER_pt_h
TrainingFeaturesToStandardize:
  - DER_mass_transverse_met_lep
  - log_DER_mass_vis
Features listed in TrainingFeaturesToStandardize are z-scored before being passed to the network. Features not listed are passed through unchanged.
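The selective scaling can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's preprocessing: the helper standardize is hypothetical, the feature values are invented, and the population standard deviation computed on the same data is an arbitrary choice for the example:

```python
from statistics import fmean, pstdev


def standardize(columns, to_standardize):
    """Z-score the listed columns and pass all other columns through unchanged.

    columns maps a feature name to a list of values.
    Assumption: a constant column (zero spread) is mapped to all zeros.
    """
    scaled = {}
    for name, values in columns.items():
        if name in to_standardize:
            mean = fmean(values)
            std = pstdev(values)
            scaled[name] = [(v - mean) / std if std > 0 else 0.0
                            for v in values]
        else:
            scaled[name] = list(values)
    return scaled


# Invented values for two of the features named in the config.
columns = {
    "DER_mass_transverse_met_lep": [10.0, 20.0, 30.0],
    "log_DER_pt_h": [1.0, 2.0, 3.0],
}
scaled = standardize(columns, ["DER_mass_transverse_met_lep"])
print(scaled["DER_mass_transverse_met_lep"])  # zero mean, unit spread
print(scaled["log_DER_pt_h"])                 # unchanged
```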