Training API

Density-ratio estimation

Preselection network

Utility functions

save_model(lightning_model, input_sample, path_to_save_model, scaler_instance, path_to_save_scaler, softmax_output=False)[source]

Export a trained PyTorch Lightning model to ONNX format and save the feature scaler to disk.

Parameters:

lightning_model (DensityRatioLightning) – Trained PyTorch Lightning model instance. If the model is not already in eval mode, it is set to eval mode internally before export.
input_sample (torch.Tensor, shape (1, n_features)) – A representative input tensor used to trace the model graph during ONNX export. Values do not affect the exported weights — only the shape matters. Typically torch.randn((1, len(features))).
path_to_save_model (str or Path) – Destination path for the exported .onnx file.
scaler_instance (sklearn transformer) – Fitted scaler object (e.g. ColumnTransformer wrapping StandardScaler) to be serialised alongside the model so that the same preprocessing is applied at inference time.
path_to_save_scaler (str or Path) – Destination path for the serialised scaler .bin file.
softmax_output (bool, optional) – If True, wraps the model with a softmax layer before export so that the ONNX output is a probability vector rather than raw logits. Set to False (default) for density-ratio training, where the raw sigmoid output is used directly.

Return type:

None

Notes

The scaler is serialised with joblib.dump using compression level 3.
ONNX export uses opset version 17 with dynamic batch size axes, so the exported model accepts any batch size at inference.
When softmax_output=True, the wrapper accesses model.mlp and model.out directly — these attribute names must exist on the Lightning model.
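
The scaler round trip described in the first note can be sketched as follows. This is a minimal illustration, assuming a plain StandardScaler and a hypothetical file name; it shows only the joblib calls and is not the body of save_model() itself.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit a scaler on toy data (a stand-in for the real training features).
rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
scaler = StandardScaler().fit(features)

with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_path = Path(tmp_dir) / "scaler.bin"  # hypothetical file name
    # compress=3 matches the compression level stated in the notes above.
    joblib.dump(scaler, scaler_path, compress=3)
    restored = joblib.load(scaler_path)

# The restored scaler reproduces the original transformation.
same = np.allclose(scaler.transform(features), restored.transform(features))
```

Because the restored object applies the same fitted means and scales, inference-time preprocessing stays consistent with training.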

load_trained_model(path_to_saved_model, path_to_saved_scaler)[source]

Load a previously saved ONNX model and its associated feature scaler.

Parameters:

path_to_saved_model (str or Path) – Path to the .onnx model file produced by save_model().
path_to_saved_scaler (str or Path) – Path to the .bin scaler file produced by save_model().

Returns:

scaler (sklearn transformer) – The deserialised scaler object. Call scaler.transform(data) to preprocess new data consistently with the training pipeline.
model (onnx.ModelProto) – The loaded ONNX model graph. Pass this directly to predict_with_onnx() or predict_with_model(), which will create an onnxruntime.InferenceSession internally on first call.

Notes

The returned model is an onnx.ModelProto, not an onnxruntime.InferenceSession. The session is created lazily inside predict_with_onnx() to avoid holding GPU/CPU resources when the model is not actively being used.

predict_with_model(data, scaler, model, calibration_model=None, use_log_loss=False)[source]

Evaluate the trained density-ratio model on an input dataset.

Applies feature scaling, runs ONNX inference, optionally converts from log-likelihood-ratio space to a probability score, and optionally applies the calibration layer.

Parameters:

data (pandas.DataFrame) – Dataset to evaluate on.
scaler (sklearn transformer) – Fitted scaler with a .transform() method. Applied to data before inference. Must be the same scaler saved alongside the model via save_model().
model (onnx.ModelProto or onnxruntime.InferenceSession) – The ONNX model to run inference with. If a ModelProto is passed, an InferenceSession is created internally. If an InferenceSession is passed, it is used directly.
calibration_model (optional) – Calibration model exposing a cali_pred method. If None (default), no calibration is applied.
use_log_loss (bool, optional) – If True, the raw model output is interpreted as \(\log r\), with \(r = p_A / p_B\), and converted to a probability score via \(s = \sigma(\log r) = 1 / (1 + r^{-1})\) before returning. Must match the use_log_loss setting used during training. Default False.

Returns:

numpy.ndarray, shape (n_events,) – Predicted scores in the range (0, 1), where values close to 1 indicate high probability of belonging to hypothesis A (numerator) and values close to 0 indicate hypothesis B (denominator). If calibration is enabled, the output is additionally clipped to [1e-8, 1 - 1e-8] for numerical safety.

Notes

To obtain the density ratio \(r = p_A / p_B\) from the returned score \(s\), use \(r = s / (1 - s)\).
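
For reference, the inversion follows directly from the sigmoid relation given for use_log_loss; the short derivation below uses only the symbols \(s\) and \(r\) already defined in this entry.

```latex
\[
s = \sigma(\log r) = \frac{1}{1 + r^{-1}} = \frac{r}{1 + r}
\quad\Longrightarrow\quad
s\,(1 + r) = r
\quad\Longrightarrow\quad
r = \frac{s}{1 - s}
\]
```

As a quick check, a score of \(s = 0.8\) corresponds to \(r = 0.8 / 0.2 = 4\).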

predict_with_onnx(dataset, scaler, model, batch_size=10000, softmax_output=False)[source]

Run batched ONNX inference on a dataset.

Scales the input features and runs inference through the ONNX runtime in fixed-size batches to avoid memory exhaustion on large datasets. This function takes no calibration argument; to apply a calibration model, use predict_with_model().

Parameters:

dataset (pandas.DataFrame or numpy.ndarray) – Input data. Must contain the feature columns in the same order used during training. Additional columns are ignored if a DataFrame is passed, provided the scaler was fitted with named columns.
scaler (sklearn transformer) – Fitted scaler with a .transform() method. Applied to dataset before inference. Must be the same scaler saved alongside the model via save_model().
model (onnx.ModelProto or onnxruntime.InferenceSession) – The ONNX model to run inference with. If a ModelProto is passed, an InferenceSession is created internally. If an InferenceSession is passed, it is used directly.
batch_size (int, optional) – Number of events processed per inference call. Reduce this if memory is limited. Default 10000.
softmax_output (bool, optional) – If False (default), the output array is flattened to shape (n_events,). If True, the 2D output (n_events, n_classes) is preserved, as returned by a model exported with softmax.

Returns:

preds (numpy.ndarray) –

Shape (n_events,) when softmax_output=False.
Shape (n_events, n_classes) when softmax_output=True.

Dtype is float32.

Raises:

TypeError – If model is neither an onnx.ModelProto nor an onnxruntime.InferenceSession.

Notes

The ONNX session is configured with intra_op_num_threads=1 and inter_op_num_threads=1. This is intentional for HTCondor jobs where CPU resources are explicitly requested — unconstrained threading can cause resource contention across concurrent jobs on the same node.
CUDA execution is attempted first; the runtime falls back to CPU automatically if no compatible GPU is available.
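
The batching and output-shape behaviour described for this function can be sketched with NumPy alone. The helper run_in_batches and the toy_model callable are hypothetical stand-ins (the latter replaces a real ONNX runtime session call); this is an illustration under those assumptions, not the library's implementation, and it assumes a non-empty input.

```python
import numpy as np


def run_in_batches(features, infer_fn, batch_size=10_000, softmax_output=False):
    """Apply infer_fn to consecutive row slices and concatenate the results."""
    features = np.asarray(features, dtype=np.float32)
    outputs = []
    for start in range(0, len(features), batch_size):
        batch = features[start:start + batch_size]
        outputs.append(np.asarray(infer_fn(batch), dtype=np.float32))
    preds = np.concatenate(outputs, axis=0)
    # Keep the 2D (n_events, n_classes) layout only when requested.
    return preds if softmax_output else preds.reshape(-1)


def toy_model(batch):
    """Stand-in for a session call: one sigmoid score per row."""
    return 1.0 / (1.0 + np.exp(-batch.sum(axis=1, keepdims=True)))


x = np.random.default_rng(0).normal(size=(25, 4))
preds = run_in_batches(x, toy_model, batch_size=10)  # batches of 10, 10, 5 rows
```

Only one batch of inputs and outputs is held by the inference call at a time, which is why lowering batch_size reduces peak memory.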

convert_torch_to_onnx(lightning_model, input_dim, opset=17)[source]

Convert a trained PyTorch Lightning model to an onnx.ModelProto in memory, without permanently writing to disk.

Parameters:

lightning_model (DensityRatioLightning) – Trained model to convert. Its parameters must be accessible via lightning_model.parameters(), which is used to determine the target device.
input_dim (int) – Number of input features. Used to construct a random dummy input tensor for graph tracing.
opset (int, optional) – ONNX opset version to target during export. Default 17.

Returns:

onnx.ModelProto – The exported ONNX model loaded into memory and ready to pass to predict_with_onnx().

Notes

A temporary .onnx file is written to the system’s temp directory during export and deleted immediately after loading. The returned object is fully in-memory.
Dynamic batch axes are set for both input and output so the returned model accepts any batch size at inference.
This function differs from save_model() in that it does not persist the model to a user-specified path and does not handle scaler serialisation. Use save_model() when you need to save model artefacts for later reuse.

convert_logLR_to_score(logLR)[source]

Convert a log-likelihood ratio to a probability score.

Maps \(\log(p_A / p_B)\) to the relative probability \(s = p_A / (p_A + p_B)\) via the sigmoid function:

\[s = \frac{1}{1 + e^{-\log(p_A/p_B)}}\]

Parameters:

logLR (numpy.ndarray) – Array of log-likelihood ratio values, unbounded in range.

Returns:

numpy.ndarray – Probability scores in the range (0, 1).

Notes

Use this function when the model was trained with use_log_loss=True, which causes the network to regress \(\log(p_A/p_B)\) directly rather than a classification score. The output of this function is compatible with downstream methods that expect scores in (0, 1).
To recover the density ratio from the score, use convert_score_to_ratio().
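
A minimal sketch of the sigmoid mapping above, written in a numerically stable form. The function name loglr_to_score is hypothetical, and this is not necessarily how convert_logLR_to_score() is implemented; it only illustrates the formula.

```python
import numpy as np


def loglr_to_score(log_lr):
    """Map log(p_A / p_B) to s = p_A / (p_A + p_B) without overflow."""
    log_lr = np.atleast_1d(np.asarray(log_lr, dtype=np.float64))
    score = np.empty_like(log_lr)
    pos = log_lr >= 0
    # For non-negative inputs exp(-x) <= 1, so the direct form is safe.
    score[pos] = 1.0 / (1.0 + np.exp(-log_lr[pos]))
    # For negative inputs use the equivalent form exp(x) / (1 + exp(x)).
    exp_x = np.exp(log_lr[~pos])
    score[~pos] = exp_x / (1.0 + exp_x)
    return score


scores = loglr_to_score(np.array([-1000.0, 0.0, np.log(3.0), 1000.0]))
```

A log-ratio of 0 maps to 0.5 and log 3 maps to 0.75. For very large magnitudes the result saturates to exactly 0 or 1 in floating point, even though the mathematical range is open.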

convert_score_to_ratio(score)[source]

Convert a probability score to a density ratio.

Given a classifier score \(s = p_A / (p_A + p_B)\), returns the density ratio \(r = p_A / p_B\) via:

\[r = \frac{s}{1 - s}\]

Parameters:

score (numpy.ndarray) – Probability scores in the range (0, 1). Values at exactly 0 or 1 produce 0 or inf respectively; clip inputs to a safe range such as [1e-9, 1 - 1e-9] if needed.

Returns:

numpy.ndarray – Density ratio values \(p_A / p_B\), unbounded above.
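
The clipping suggested for the score argument can be sketched as a small wrapper. The name safe_score_to_ratio and the eps default are assumptions chosen for illustration; convert_score_to_ratio() itself is described above as not clipping.

```python
import numpy as np


def safe_score_to_ratio(score, eps=1e-9):
    """Clip scores into [eps, 1 - eps], then return r = s / (1 - s)."""
    score = np.clip(np.asarray(score, dtype=np.float64), eps, 1.0 - eps)
    return score / (1.0 - score)


ratios = safe_score_to_ratio(np.array([0.0, 0.25, 0.5, 0.75, 1.0]))
```

With clipping, the end points give large but finite ratios (roughly 1e-9 and 1e9 for this eps) instead of 0 and inf, which keeps downstream weights and logarithms well defined.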