Model Building Example ====================== The ``examples/model_building_example/`` directory provides a **standalone** walkthrough of workspace construction, model building, and profile-likelihood fitting — without requiring the full training pipeline. It is the recommended starting point for understanding how the fitting stage works. Prerequisites ------------- You need four ingredients, all of which are included in the example directory (via symlinks to the main dataset): 1. A **fit configuration** YAML file (``config_fit_nsbi.yml`` or ``config_fit_histogram.yml``). 2. **Pre-computed density-ratio** ``.npy`` files (nominal + systematic variations) — produced by the training pipeline or any external source. 3. **Asimov weights** (or real-data weights) for the unbinned region. 4. **ROOT files** containing the MC samples used by binned channels. .. note:: The ``saved_datasets/`` directory uses **symlinks** into the parent ``FAIR_universe_Higgs_tautau/saved_datasets/`` directory to avoid duplicating large files. After a fresh ``git clone``, make sure to run ``git lfs pull`` so that the LFS-tracked ROOT and NumPy files are downloaded, not just pointer stubs. Directory layout ---------------- .. code-block:: text model_building_example/ config_fit_histogram.yml # Binned-only fit configuration config_fit_nsbi.yml # Combined binned + unbinned (NSBI) fit 1_workspace_building.ipynb # Notebook: build workspaces from config 2_parameter_fitting.ipynb # Notebook: fit and profile scan saved_datasets/ asimov_weights.npy # Per-event weights (unbinned region) dataset_nominal.root # Nominal MC (binned channels) dataset_JES_up.root # Systematic variation ROOT files dataset_JES_dn.root dataset_TES_up.root dataset_TES_dn.root output_training_nominal/ output_ratios_/ ratio_.npy # Nominal density ratios output_training_systematics/ output_ratios___/ ratio_.npy # Systematic density ratios Quick start ----------- .. code-block:: python from nsbi_common_utils import workspace_builder, models, inference # 1. Build a workspace from the YAML config ws = workspace_builder.WorkspaceBuilder( config_path="config_fit_nsbi.yml" ).build() # 2. Initialise the statistical model (JAX-compiled NLL) model = models.sbi_parametric_model( workspace=ws, measurement_to_fit="my_measurement" ) # 3. Fit params, init_vals = model.get_model_parameters() fitter = inference.inference( model_nll=model.model, model_grad=model.model_grad, initial_values=init_vals, list_parameters=params, num_unconstrained_params=model.num_unconstrained_param, ) fitter.perform_fit() Step 1 — Build the workspace ----------------------------- The :class:`~nsbi_common_utils.workspace_builder.WorkspaceBuilder` reads the YAML config, loads ROOT datasets and density-ratio arrays, and assembles a JSON-serialisable workspace dictionary: .. code-block:: python builder = workspace_builder.WorkspaceBuilder(config_path="config_fit_nsbi.yml") ws = builder.build() # Optionally persist to disk so you can skip this step next time builder.dump_workspace(ws, "workspace_nsbi.json") # Re-load later without re-reading ROOT files ws = workspace_builder.WorkspaceBuilder.load_workspace("workspace_nsbi.json") See :doc:`/api/workspace_builder` for the full API. Step 2 — Initialise the model ------------------------------- :class:`~nsbi_common_utils.models.sbi_parametric_model` parses the workspace, stacks all histogram yields and density-ratio arrays onto the JAX device, and compiles a JIT-optimised negative log-likelihood function: .. code-block:: python model = models.sbi_parametric_model( workspace=ws, measurement_to_fit="NSBI_measurement", ) # Inspect the parameter ordering and starting values param_names, init_values = model.get_model_parameters() The compiled NLL is exposed as ``model.model(param_array)`` and its analytical gradient as ``model.model_grad(param_array)``. See :doc:`/api/model` for the full API. Step 3 — Fit and profile scan ------------------------------- :class:`~nsbi_common_utils.inference.inference` wraps ``iminuit`` to perform the minimisation and profile-likelihood scans: .. code-block:: python fitter = inference.inference( model_nll=model.model, model_grad=model.model_grad, initial_values=init_vals, list_parameters=param_names, num_unconstrained_params=model.num_unconstrained_param, ) # Global fit fitter.perform_fit() # Profile likelihood scan of the POI pts, nll, pts_stat, nll_stat = fitter.perform_profile_scan( parameter_name="mu_htautau", bound_range=(0, 3), size=50, doStatOnly=True, ) See :doc:`/api/inference` for the full API. Fit configuration ----------------- The YAML config defines five sections consumed by the workspace builder. See :doc:`fit_config` for the full specification; the key points are summarised here. **Measurement** — which parameters to fit and the parameter of interest (POI). **Samples** — physics processes (signal, backgrounds) with paths to ROOT files and tree names. **NormFactors** — free normalisation parameters (one per sample or shared). **Systematics** — nuisance parameters with paths to up/down ROOT variation files. Currently only ``NormPlusShape`` is supported. **Regions** — analysis regions tagged as ``binned`` (control/signal regions built from histograms) or ``unbinned`` (signal region using density ratios). Unbinned regions reference the trained model outputs: .. code-block:: yaml Regions: - Name: SR Type: unbinned AsimovWeights: ./saved_datasets/asimov_weights.npy TrainedModels: - SampleName: htautau Nominal: Ratios: ./saved_datasets/output_training_nominal/output_ratios_htautau/ratio_htautau.npy Systematics: - SystName: JES RatiosUp: .../output_ratios_htautau_JES_Up/ratio_htautau.npy RatiosDn: .../output_ratios_htautau_JES_Dn/ratio_htautau.npy Notebooks --------- The example ships with two Jupyter notebooks: - **1_workspace_building.ipynb** — walks through workspace construction for both the histogram-only and NSBI configurations, and serialises the workspaces to JSON. - **2_parameter_fitting.ipynb** — loads the workspaces, initialises both models, performs global fits (with JAX autodiff gradients via ``model_grad``), runs profile-likelihood scans, and plots an NSBI-vs-histogram sensitivity comparison.