Writing a Fit Configuration
===========================

The fit configuration YAML file is the central input to the entire simulation-based inference (SBI) workflow. It defines the measurement, samples, systematic uncertainties, analysis regions, training features, and pointers to trained density-ratio models. The :class:`~nsbi_common_utils.configuration.ConfigManager` reads this file and exposes its contents through accessor methods that are used at every stage of the pipeline:

- **Data preparation** — the ``datasets`` module reads sample paths, ROOT tree names, weight branches, and systematic variation files from the config to load and organise the data.
- **Preselection training** — training features and feature scaling lists are retrieved via ``ConfigManager.get_training_features()``.
- **Density-ratio training** — basis processes (``get_basis_samples()``), the reference hypothesis (``get_reference_samples()``), training features, and region filters (``get_channel_filters()``) are all read from the config.
- **Systematic uncertainty training** — the ``Systematics`` block and ``get_samples_in_syst_for_training()`` determine which processes are affected by each variation and where the varied ROOT files are located.
- **Evaluation** — Asimov weight paths (``get_channel_asimov_weight_path()``) and trained model metadata come from the config.
- **Workspace building** — the :class:`~nsbi_common_utils.workspace_builder.WorkspaceBuilder` assembles the full statistical model from all of the above, producing a workspace dictionary that the statistical model consumes.

This page is the canonical reference for the YAML format. For the Python API, see :doc:`/api/workspace_builder`. For a hands-on walkthrough, see :doc:`model_building_example`.

Measurement and parameters
--------------------------

The ``General`` block names the measurement and declares which parameters enter the fit:

.. code-block:: yaml

   General:
     Measurement:
       Name: higgs_tautau_signal_strength
       POI: mu_htautau
       ParametersToFit:
         - mu_htautau
         - mu_ztautau
         - mu_ttbar
         - TES
         - JES

- **POI** — the parameter of interest (signal strength).
- **ParametersToFit** — only these parameters appear in the likelihood. Omitting a parameter here effectively fixes it at its nominal value.
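
The relationship between ``ParametersToFit`` and the declared parameters can be illustrated with a short sketch. It is not part of the toolkit: ``split_parameters`` is a hypothetical helper, the configuration is written as a plain dictionary rather than parsed from YAML, and the two consistency checks (the POI must be fitted, and every fitted parameter must be declared under ``NormFactors`` or ``Systematics``) are assumptions rather than documented behaviour.

.. code-block:: python

   # Illustrative config fragment as a plain dict; in practice it is parsed from YAML.
   config = {
       "General": {
           "Measurement": {
               "Name": "higgs_tautau_signal_strength",
               "POI": "mu_htautau",
               "ParametersToFit": ["mu_htautau", "mu_ztautau", "mu_ttbar", "TES", "JES"],
           }
       },
       "NormFactors": [{"Name": "mu_htautau"}, {"Name": "mu_ztautau"}, {"Name": "mu_ttbar"}],
       "Systematics": [{"Name": "TES"}, {"Name": "JES"}],
   }


   def split_parameters(config):
       """Hypothetical helper: return (fitted, fixed) parameter names."""
       measurement = config["General"]["Measurement"]
       fitted = list(measurement["ParametersToFit"])
       declared = [p["Name"] for p in config.get("NormFactors", [])]
       declared += [s["Name"] for s in config.get("Systematics", [])]

       # Assumption: the POI must itself be a fitted parameter.
       if measurement["POI"] not in fitted:
           raise ValueError("POI is missing from ParametersToFit")
       # Assumption: every fitted parameter is declared in NormFactors or Systematics.
       unknown = [name for name in fitted if name not in declared]
       if unknown:
           raise ValueError(f"undeclared parameters: {unknown}")

       # Declared but not fitted: these stay at their nominal values.
       fixed = [name for name in declared if name not in fitted]
       return fitted, fixed


   fitted, fixed = split_parameters(config)
   print("fitted:", fitted)
   print("fixed at nominal:", fixed)

Removing ``JES`` from ``ParametersToFit`` in this sketch moves it into the ``fixed`` list, which mirrors the rule stated above.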

Samples
-------

Each entry in the ``Samples`` list describes a physics process read from a ROOT file:

.. code-block:: yaml

   Samples:
     - Name: htautau
       Tree: tree_htautau
       SamplePath: ./saved_datasets/dataset_nominal.root
       Weight: weights
       UseAsReference: True
       UseAsBasis: True

- **UseAsBasis** — this process gets its own density-ratio network and normalization factor.
- **UseAsReference** — marks the process as the denominator of the density ratio :math:`r(x) = p(x) / p_{\text{ref}}(x)`. This flag is optional: users who call the toolkit APIs directly can pass their own reference hypothesis instead.
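
As a rough illustration of how the two flags partition the sample list, the sketch below selects basis and reference processes from a plain-dictionary version of ``Samples``. The ``ztautau`` and ``ttbar`` entries and the helper ``names_with_flag`` are hypothetical, and treating a missing flag as ``False`` is an assumption. This is not the implementation behind ``get_basis_samples()`` or ``get_reference_samples()``.

.. code-block:: python

   # Illustrative sample list; only the flags relevant to this sketch are kept.
   samples = [
       {"Name": "htautau", "UseAsReference": True, "UseAsBasis": True},
       {"Name": "ztautau", "UseAsBasis": True},  # hypothetical entry
       {"Name": "ttbar", "UseAsBasis": True},    # hypothetical entry
   ]


   def names_with_flag(samples, flag):
       """Return the names of all samples whose boolean flag is set."""
       # Assumption: a flag that is absent counts as False.
       return [sample["Name"] for sample in samples if sample.get(flag, False)]


   basis = names_with_flag(samples, "UseAsBasis")
   reference = names_with_flag(samples, "UseAsReference")
   print("basis processes:", basis)
   print("reference processes:", reference)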

Normalization factors
---------------------

The ``NormFactors`` block declares free multiplicative parameters, such as signal strengths and background normalizations:

.. code-block:: yaml

   NormFactors:
     - Name: mu_htautau
       Samples: htautau
       Nominal: 1
       Bounds: [0, 10]

These parameters are **unconstrained** in the fit: no Gaussian penalty term is applied to them.
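
To make "multiplicative" concrete, the following sketch scales nominal yields by the current norm-factor values. The yields are made-up numbers, ``scaled_yields`` is a hypothetical helper, and rejecting values outside ``Bounds`` is an assumption about how the bounds are meant to be used.

.. code-block:: python

   norm_factors = [
       {"Name": "mu_htautau", "Samples": "htautau", "Nominal": 1, "Bounds": [0, 10]},
   ]

   # Made-up nominal expected yields per process.
   nominal_yields = {"htautau": 12.0, "ztautau": 300.0}


   def scaled_yields(nominal_yields, norm_factors, values):
       """Multiply each affected sample's yield by its norm-factor value."""
       scaled = dict(nominal_yields)
       for factor in norm_factors:
           low, high = factor["Bounds"]
           # Parameters without an explicit value stay at their nominal value.
           value = values.get(factor["Name"], factor["Nominal"])
           if not low <= value <= high:
               raise ValueError(f"{factor['Name']}={value} is outside {factor['Bounds']}")
           affected = factor["Samples"]
           if isinstance(affected, str):  # accept a single name or a list of names
               affected = [affected]
           for sample in affected:
               if sample in scaled:
                   scaled[sample] *= value
       return scaled


   doubled = scaled_yields(nominal_yields, norm_factors, {"mu_htautau": 2.0})
   print(doubled)

Only ``htautau`` changes in this example, because ``mu_htautau`` lists no other sample.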

Systematic uncertainties
------------------------

Each ``NormPlusShape`` entry describes a combined shape and normalization uncertainty and points to the ROOT files that hold the up and down variations:

.. code-block:: yaml

   Systematics:
     - Name: JES
       Type: NormPlusShape
       Nominal: 0
       Samples: [htautau, ztautau, ttbar]
       Up:
         - SampleName: htautau
           Path: ./saved_datasets/dataset_JES_up.root
           Tree: tree_htautau
           Weight: weights
       Dn:
         - SampleName: htautau
           Path: ./saved_datasets/dataset_JES_dn.root
           Tree: tree_htautau
           Weight: weights

- **Nominal: 0** — the nuisance parameter starts at zero (the Gaussian constraint is centred here).
- The workspace builder computes variation ratios (varied / nominal) automatically from the histograms.
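
The variation ratio mentioned above is a per-bin quotient. A minimal sketch with made-up histogram contents is shown here; ``variation_ratio`` is a hypothetical helper, and returning ``1.0`` for empty nominal bins is an assumption, not necessarily what the workspace builder does.

.. code-block:: python

   def variation_ratio(varied, nominal):
       """Per-bin ratio varied / nominal for two histograms with equal binning."""
       if len(varied) != len(nominal):
           raise ValueError("histograms must have the same number of bins")
       # Assumption: an empty nominal bin is treated as 'no variation' (ratio 1).
       return [v / n if n > 0 else 1.0 for v, n in zip(varied, nominal)]


   # Made-up weighted bin contents for one sample.
   nominal_hist = [100.0, 50.0, 20.0]
   jes_up_hist = [110.0, 50.0, 18.0]
   jes_dn_hist = [90.0, 50.0, 22.0]

   ratio_up = variation_ratio(jes_up_hist, nominal_hist)
   ratio_dn = variation_ratio(jes_dn_hist, nominal_hist)
   print("up:", ratio_up)
   print("down:", ratio_dn)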

Analysis regions
----------------

Regions define event selections and whether the channel is binned or unbinned:

.. code-block:: yaml

   Regions:
     # Binned control region
     - Name: CR
       Filter: presel_score < -1.0
       Variable: presel_score
       Type: binned
       Binning: [-8.75, -4.0, -2.5, -1.0]

     # Unbinned SBI signal region
     - Name: SR
       Filter: presel_score >= -1.0 & presel_score <= 4.5
       Type: unbinned
       AsimovWeights: ./saved_datasets/asimov_weights.npy
       TrainedModels:
         - SampleName: htautau
           Nominal:
             Ratios: ./saved_datasets/.../ratio_htautau.npy
           Systematics:
             - SystName: JES
               RatiosUp: ./saved_datasets/.../ratio_htautau.npy
               RatiosDn: ./saved_datasets/.../ratio_htautau.npy

For **unbinned** regions, ``TrainedModels`` points to the pre-evaluated density-ratio ``.npy`` arrays produced by the evaluation step of the pipeline.
For **binned** regions, the workspace builder histograms the data automatically using ``Variable`` and ``Binning``.
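
The sketch below mimics what the ``CR`` and ``SR`` definitions select, using made-up scores and weights. The two ``Filter`` strings are re-expressed as Python functions because the way the toolkit evaluates them is not described here, and the half-open bin convention is an assumption.

.. code-block:: python

   from bisect import bisect_right

   # Made-up events: preselection scores and per-event weights.
   scores = [-5.0, -3.0, -2.0, -1.5, 0.5, 2.0]
   weights = [1.0, 0.5, 2.0, 1.0, 1.0, 3.0]

   binning = [-8.75, -4.0, -2.5, -1.0]  # the CR ``Binning`` from the example


   def in_cr(score):
       """Python stand-in for the filter 'presel_score < -1.0'."""
       return score < -1.0


   def in_sr(score):
       """Python stand-in for 'presel_score >= -1.0 & presel_score <= 4.5'."""
       return -1.0 <= score <= 4.5


   def weighted_histogram(values, weights, edges):
       """Sum weights per bin; values outside the edges are dropped."""
       counts = [0.0] * (len(edges) - 1)
       for value, weight in zip(values, weights):
           # Assumption: bins are half-open, [low, high).
           if edges[0] <= value < edges[-1]:
               counts[bisect_right(edges, value) - 1] += weight
       return counts


   cr_events = [(s, w) for s, w in zip(scores, weights) if in_cr(s)]
   cr_counts = weighted_histogram(
       [s for s, _ in cr_events], [w for _, w in cr_events], binning
   )
   sr_yield = sum(w for s, w in zip(scores, weights) if in_sr(s))

   print("CR bin contents:", cr_counts)
   print("SR total weight:", sr_yield)

The two filters in the example are mutually exclusive, so each event lands in at most one region.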

Training features
-----------------

``TrainingFeatures`` lists the branches of the ROOT file that are used as neural-network inputs:

.. code-block:: yaml

   TrainingFeatures:
     - DER_mass_transverse_met_lep
     - log_DER_mass_vis
     - log_DER_pt_h

   TrainingFeaturesToStandardize:
     - DER_mass_transverse_met_lep
     - log_DER_mass_vis

Features listed in ``TrainingFeaturesToStandardize`` are z-scored before being passed to the network. Features not listed are passed through unchanged.
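
A minimal sketch of this selective standardization, using made-up feature values, is given below. ``standardize`` is a hypothetical helper, and the use of the population standard deviation is an assumption; the toolkit's scaler may differ in that detail.

.. code-block:: python

   from statistics import fmean, pstdev

   # Made-up feature columns for three events.
   columns = {
       "DER_mass_transverse_met_lep": [10.0, 20.0, 30.0],
       "log_DER_mass_vis": [4.0, 4.5, 5.0],
       "log_DER_pt_h": [3.0, 3.2, 3.4],
   }
   to_standardize = ["DER_mass_transverse_met_lep", "log_DER_mass_vis"]


   def standardize(columns, to_standardize):
       """Z-score the listed columns; copy all other columns unchanged."""
       result = {}
       for name, values in columns.items():
           if name in to_standardize:
               mean = fmean(values)
               std = pstdev(values)  # assumption: population standard deviation
               if std == 0:
                   raise ValueError(f"feature {name} is constant")
               result[name] = [(value - mean) / std for value in values]
           else:
               result[name] = list(values)
       return result


   scaled = standardize(columns, to_standardize)
   for name, values in scaled.items():
       print(name, values)

In this sketch ``log_DER_pt_h`` comes out unchanged because it is absent from ``TrainingFeaturesToStandardize``.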