Toolkit for Simulation-Based Inference ======================================= ``nsbi-common-utils`` provides building blocks for **Simulation-Based Inference (SBI)** analysis tailored to the statistical models used at the ATLAS and CMS experiments. It implements a semi-parametric approach to SBI in which the statistical model combines non-parametric density-ratio estimates with parametric HistFactory-style systematic uncertainties. The toolkit is modular, offering APIs for dataset preparation, density-ratio estimation, model building, and profiled-likelihood-ratio fitting. Why Simulation-Based Inference? ------------------------------- Traditional LHC analyses compress high-dimensional collision data into summary observables, which can discard information sensitive to the parameters of interest. SBI uses neural networks to approximate the likelihood ratio directly from the full feature space — no hand-crafted summaries required. This toolkit makes SBI practical for LHC-scale analyses by embedding neural density-ratio estimates inside a HistFactory-style statistical model, combining unbinned SBI regions with traditional binned template regions in a single profiled-likelihood fit. See the :doc:`basics/overview` for a detailed introduction, or the method and measurement papers: `arXiv:2412.01600 `_, `arXiv:2412.01548 `_. Key features ------------ - **Training pipeline** — density-ratio neural network training with PyTorch Lightning, ensemble support, and calibration and reweighting diagnostics. - **Statistical models** — pyhf-like workspace specification supporting both binned (template-based) and unbinned (SBI-style) analysis regions with HistFactory-style systematic uncertainties, compiled to JAX for fast NLL evaluation and automatic differentiation. - **Inference engine** — profile-likelihood fits and NLL scans via iminuit with analytic gradients, plus plotting utilities. - **Configuration-driven** — a single YAML :doc:`fit configuration ` defines samples, features, systematics, and regions, and is consumed across all stages of the pipeline via :class:`~nsbi_common_utils.configuration.ConfigManager`. - **Workflow integration** — HTCondor/DAGMan job descriptions for large-scale cluster submission, with end-to-end example pipelines. Getting started --------------- Install the package with pip: .. code-block:: bash pip install 'nsbi-common-utils @ git+https://github.com/iris-hep/NSBI-workflow-tutorial.git' Or, for a full environment with all dependencies (recommended): .. code-block:: bash pixi install -e nsbi-env # CPU-only pixi install -e nsbi-env-gpu # with CUDA support Statistical Model Building with SBI ----------------------------------- Once installed, building a statistical model and running a fit takes just a few lines: .. code-block:: python from nsbi_common_utils import workspace_builder, models, inference # Build workspace from the fit configuration (also used for data loading, training, etc.) ws = workspace_builder.WorkspaceBuilder("config_fit.yml").build() # Create the statistical model (JAX-compiled NLL) model = models.sbi_parametric_model(workspace=ws, measurement_to_fit="my_meas") params, init_vals = model.get_model_parameters() # Fit and profile fitter = inference.inference( model_nll=model.model, list_parameters=params, initial_parameter_values=init_vals, ) fitter.perform_fit() For a hands-on walkthrough, see the :doc:`basics/model_building_example`. .. toctree:: :maxdepth: 2 :caption: Basics basics/overview basics/density_ratio_training basics/preselection_training basics/fit_config basics/workflow basics/model_building_example .. toctree:: :maxdepth: 2 :caption: API Reference api/training api/workspace_builder api/model api/inference