Workspace Builder API

The workspace builder converts a human-readable YAML fit configuration into a pyhf-like workspace dictionary that sbi_parametric_model consumes directly. This is the bridge between your analysis definition and the statistical model.

Typical usage

from nsbi_common_utils import workspace_builder, models, inference

# 1. Build workspace from fit config
builder = workspace_builder.WorkspaceBuilder(config_path="config_fit_nsbi.yml")
ws = builder.build()

# 2. Optionally serialise / reload (avoids re-reading ROOT files)
builder.dump_workspace(ws, "workspace.json")
ws = workspace_builder.WorkspaceBuilder.load_workspace("workspace.json")

# 3. Pass to the statistical model
model = models.sbi_parametric_model(workspace=ws, measurement_to_fit="my_measurement")

For details on the YAML configuration format consumed by the builder, see Writing a Fit Configuration. For a hands-on walkthrough, see Model Building Example.

API Reference

class WorkspaceBuilder(config_path)

Bases: object

Build a pyhf-like JSON workspace from a YAML configuration file.

Reads sample definitions, region definitions, normalization factors, and systematic uncertainties from a YAML config (via ConfigManager), loads the corresponding ROOT datasets and trained density-ratio models, and assembles them into a workspace dictionary that can be consumed by sbi_parametric_model.

Parameters: config_path (pathlib.Path or str) – Path to the YAML configuration file that defines samples, regions, normalization factors, and systematics. See Writing a Fit Configuration for the expected format.

See also

nsbi_common_utils.models.sbi_parametric_model: Consumes the workspace dictionary produced by build().

Examples

>>> builder = WorkspaceBuilder("config_fit_nsbi.yml")
>>> ws = builder.build()
>>> builder.dump_workspace(ws, "workspace.json")

build()

Construct the full workspace dictionary.

Calls channels() and measurements() and combines them with a version tag into the top-level workspace dict.

Returns: dict – Workspace with keys "channels", "measurements", and "version". Ready to be passed to sbi_parametric_model or serialised via dump_workspace().
Return type: Dict[str, Any]
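
As a rough illustration of the top-level shape described above, the sketch below hand-writes a dictionary with the three documented keys and checks that they are present. The version string, the empty lists, and the helper name check_workspace_keys are hypothetical and not part of the library.

```python
from typing import Any, Dict

# Hypothetical, hand-written workspace. A real one comes from WorkspaceBuilder.build().
example_ws: Dict[str, Any] = {
    "channels": [],       # produced by channels() in the real builder
    "measurements": [],   # produced by measurements() in the real builder
    "version": "0.0.0",   # placeholder; the actual version tag is not documented here
}


def check_workspace_keys(ws: Dict[str, Any]) -> bool:
    """Return True if ws carries the three documented top-level keys."""
    required = {"channels", "measurements", "version"}
    return required.issubset(ws.keys())


print(check_workspace_keys(example_ws))
```

A check like this can be useful right after reloading a serialised workspace, before handing it to the statistical model.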

channels()

Build the "channels" list for the workspace.

For every region in the configuration, loads datasets from ROOT files, computes nominal histograms, attaches density-ratio file paths (for unbinned regions), and collects all applicable normfactor and systematic modifiers per sample.

Returns: list of dict – Each dict represents one channel with keys "name", "type" ("binned"/"unbinned"), "samples", and optionally "weights" (path to Asimov weight file for unbinned channels).
Return type: List[Dict[str, Any]]
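
To make the channel layout concrete, here is a minimal sketch that uses only the documented keys. The channel names, the weight file path, and the helper names_by_type are invented for illustration; the contents of "samples" are left empty because their layout is not specified in this section.

```python
from typing import Any, Dict, List

# Hypothetical channel entries; only the key names follow the documentation.
channels: List[Dict[str, Any]] = [
    {"name": "CR", "type": "binned", "samples": []},
    {
        "name": "SR",
        "type": "unbinned",
        "samples": [],
        "weights": "asimov_weights_SR.npy",  # made-up path to an Asimov weight file
    },
]


def names_by_type(channel_list: List[Dict[str, Any]], channel_type: str) -> List[str]:
    """Return the names of all channels of the given type."""
    return [ch["name"] for ch in channel_list if ch["type"] == channel_type]


print(names_by_type(channels, "unbinned"))
```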

dump_workspace(ws, outpath='workspace.json')

Serialise a workspace dictionary to a JSON file.

Parameters:

ws (dict) – Workspace dictionary returned by build().
outpath (str, optional) – Output file path. Defaults to "workspace.json".

static load_workspace(path)

Load a workspace dictionary from a JSON file.

Parameters: path (str) – Path to the JSON workspace file written by dump_workspace().
Returns: dict – The deserialised workspace dictionary.
Return type: dict
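
The dump/load pair can be pictured as a plain JSON round trip. The sketch below uses only the standard json module and assumes the workspace holds JSON-compatible values; dump_json and load_json are hypothetical stand-ins, not the library's implementation.

```python
import json
import os
import tempfile

# Hypothetical workspace with JSON-compatible contents only.
ws = {"channels": [], "measurements": [], "version": "0.0.0"}


def dump_json(workspace, outpath="workspace.json"):
    """Write the workspace dictionary to a JSON file."""
    with open(outpath, "w") as f:
        json.dump(workspace, f, indent=2)


def load_json(path):
    """Read a workspace dictionary back from a JSON file."""
    with open(path) as f:
        return json.load(f)


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workspace.json")
    dump_json(ws, path)
    reloaded = load_json(path)

print(reloaded == ws)
```

Note that the json module cannot serialise NumPy arrays directly, so a dictionary holding arrays would need them converted to lists first. Whether the builder does this internally is not stated here.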

measurements()

Build the "measurements" list for the workspace.

Extracts parameter initial values and bounds from the NormFactors and Systematics sections of the config, filters to the ParametersToFit subset (if specified), and records the parameter of interest (POI).

Returns: list of dict – A single-element list. The dict contains "name" and "config" (with "parameters" and "poi" keys).
Return type: List[Dict[str, Any]]
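
The filtering step described above might look roughly like the following sketch. The per-parameter keys "inits" and "bounds" follow the pyhf convention and are an assumption here, as are the parameter names and the helper build_measurements; only "name", "config", "parameters", and "poi" come from the documentation.

```python
from typing import Any, Dict, List, Optional


def build_measurements(
    name: str,
    poi: str,
    all_params: List[Dict[str, Any]],
    params_to_fit: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Return a single-element measurements list, optionally filtered to a subset."""
    if params_to_fit is None:
        kept = list(all_params)  # no subset requested: keep every parameter
    else:
        kept = [p for p in all_params if p["name"] in params_to_fit]
    return [{"name": name, "config": {"poi": poi, "parameters": kept}}]


# Hypothetical parameters, as if collected from NormFactors and Systematics.
all_params = [
    {"name": "mu_sig", "inits": [1.0], "bounds": [[0.0, 10.0]]},
    {"name": "alpha_JES", "inits": [0.0], "bounds": [[-5.0, 5.0]]},
]

measurements = build_measurements("my_measurement", "mu_sig", all_params, ["mu_sig"])
print([p["name"] for p in measurements[0]["config"]["parameters"]])
```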

normfactor_modifiers(region_name, sample_name)

Return normfactor modifiers that affect a given sample in a region.

Iterates over all NormFactors in the configuration and keeps only those whose Region and Samples lists include the requested region/sample (or are unset, meaning they apply everywhere).

Parameters:

region_name (str) – Name of the region (channel) to filter on.
sample_name (str) – Name of the physics sample to filter on.

Returns:

list of dict – Each dict has keys "name", "data", and "type": "normfactor".

Return type:

list[dict[str, Any]]
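
The selection rule (keep a normfactor if its Region and Samples entries include the request, or are unset) can be sketched as below. This is an illustration under assumptions, not the library's code: the helper names, the example entries, the "Name" key on each entry, and the choice of None for "data" (borrowed from pyhf's normfactor convention) are all hypothetical.

```python
from typing import Any, Dict, List


def applies(entry: Dict[str, Any], key: str, value: str) -> bool:
    """True if entry[key] is unset or contains value (a bare string is treated as a one-item list)."""
    allowed = entry.get(key)
    if allowed is None:
        return True  # unset means "applies everywhere"
    if isinstance(allowed, str):
        allowed = [allowed]
    return value in allowed


def select_normfactors(
    normfactors: List[Dict[str, Any]], region_name: str, sample_name: str
) -> List[Dict[str, Any]]:
    """Keep only normfactors applicable to the given region and sample."""
    return [
        {"name": nf["Name"], "data": None, "type": "normfactor"}
        for nf in normfactors
        if applies(nf, "Region", region_name) and applies(nf, "Samples", sample_name)
    ]


# Hypothetical NormFactors section, already parsed from YAML.
normfactors = [
    {"Name": "mu_sig", "Samples": ["signal"]},                  # no Region: all regions
    {"Name": "mu_bkg", "Region": ["CR"], "Samples": ["bkg"]},   # CR only
]

print([m["name"] for m in select_normfactors(normfactors, "SR", "signal")])
```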

normplusshape_modifiers(dataset, region, sample, systematic_dict, nominal_data, type_of_fit)

Build a NormPlusShape modifier for one systematic on one sample.

Histograms the up/down systematic variations, divides by the nominal to obtain variation ratios, and (for unbinned channels) attaches paths to the pre-computed density-ratio arrays.

Parameters:

dataset (dict of dict of DataFrame) – Nested dict keyed by "<syst>_Up"/"<syst>_Dn" then sample name, as returned by datasets.filter_region_by_type().
region (dict) – Region (channel) configuration dictionary. Must contain "Name", "Variable", and "Binning" keys.
sample (dict) – Sample configuration dictionary with at least "Name" and "SamplePath" keys.
systematic_dict (dict) – Single systematic entry from the YAML config (has "Name").
nominal_data (np.ndarray) – Nominal histogram bin counts used to normalise the variations.
type_of_fit ("binned" or "unbinned") – Determines whether density-ratio file paths are included.

Returns:

list of dict – A single-element list containing the modifier dictionary with keys "name", "type": "normplusshape", and "data".

Return type:

list[dict[str, Any]]
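
The histogram-and-divide step can be illustrated with a small pure-Python sketch. It is not the library's implementation: the half-open binning, the fallback ratio of 1.0 for empty nominal bins, the "hi"/"lo" layout inside "data", and all names and numbers are assumptions made for the example.

```python
from bisect import bisect_right
from typing import List, Sequence


def histogram(values: Sequence[float], weights: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Weighted histogram with half-open bins [edge_i, edge_{i+1})."""
    counts = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += w
    return counts


def variation_ratio(varied: Sequence[float], nominal: Sequence[float]) -> List[float]:
    """Bin-by-bin varied/nominal ratio; empty nominal bins fall back to 1.0 (an assumption)."""
    return [v / n if n > 0 else 1.0 for v, n in zip(varied, nominal)]


edges = [0.0, 1.0, 2.0]
values = [0.2, 0.7, 1.5]  # hypothetical observable values

nominal = histogram(values, [1.0, 1.0, 2.0], edges)
up = histogram(values, [1.5, 1.5, 2.0], edges)   # "<syst>_Up" variation
dn = histogram(values, [0.5, 0.5, 2.0], edges)   # "<syst>_Dn" variation

modifier = {
    "name": "example_syst",  # hypothetical systematic name
    "type": "normplusshape",
    "data": {"hi": variation_ratio(up, nominal), "lo": variation_ratio(dn, nominal)},
}
print(modifier["data"])
```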

sys_modifiers(dataset, region, sample, nominal_data, type_of_fit='binned')

Collect all systematic modifiers for a sample in a region.

Loops over every systematic in the configuration, checks region/sample applicability, and delegates to normplusshape_modifiers() for NormPlusShape types.

Parameters:

dataset (dict) – Dataset dictionary as returned by datasets.filter_region_by_type().
region (dict) – Region configuration dictionary.
sample (dict) – Sample configuration dictionary.
nominal_data (np.ndarray) – Nominal histogram used for ratio computation.
type_of_fit (str, optional) – "binned" (default) or "unbinned".

Returns:

list of dict – Modifier dictionaries for all applicable systematics.

Raises:

NotImplementedError – If a systematic type other than NormPlusShape is encountered.

Return type:

list[dict[str, Any]]
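
The loop-and-dispatch behaviour described for sys_modifiers() could be sketched as follows. The config keys "Type", "Region", and "Samples" on a systematic, the order of the applicability and type checks, and the stub builder are assumptions for illustration; only the NormPlusShape delegation and the NotImplementedError come from the documentation.

```python
from typing import Any, Callable, Dict, List


def _applies(entry: Dict[str, Any], key: str, value: str) -> bool:
    """True if entry[key] is unset or contains value."""
    allowed = entry.get(key)
    if allowed is None:
        return True
    if isinstance(allowed, str):
        allowed = [allowed]
    return value in allowed


def collect_sys_modifiers(
    systematics: List[Dict[str, Any]],
    region_name: str,
    sample_name: str,
    build_normplusshape: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Gather modifiers for every applicable systematic, delegating by type."""
    modifiers: List[Dict[str, Any]] = []
    for syst in systematics:
        # Skip systematics that do not touch this region/sample (assumed to happen first).
        if not (_applies(syst, "Region", region_name) and _applies(syst, "Samples", sample_name)):
            continue
        if syst.get("Type") != "NormPlusShape":
            raise NotImplementedError(f"Systematic type {syst.get('Type')!r} is not supported")
        modifiers.extend(build_normplusshape(syst))
    return modifiers


def stub_builder(syst: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Stand-in for normplusshape_modifiers(); returns a single-element list."""
    return [{"name": syst["Name"], "type": "normplusshape", "data": {}}]


# Hypothetical Systematics section.
systematics = [
    {"Name": "JES", "Type": "NormPlusShape", "Samples": ["signal"]},
    {"Name": "lumi", "Type": "OverallSys", "Samples": ["bkg"]},
]

mods = collect_sys_modifiers(systematics, "SR", "signal", stub_builder)
print([m["name"] for m in mods])
```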