Training API

Density-ratio estimation

Preselection network

Utility functions

save_model(lightning_model, input_sample, path_to_save_model, scaler_instance, path_to_save_scaler, softmax_output=False)[source]

Export a trained PyTorch Lightning model to ONNX format and save the feature scaler to disk.

Parameters:
  • lightning_model (DensityRatioLightning) – Trained PyTorch Lightning model instance. If it is not already in eval mode, it is set to eval mode internally.

  • input_sample (torch.Tensor, shape (1, n_features)) – A representative input tensor used to trace the model graph during ONNX export. Values do not affect the exported weights — only the shape matters. Typically torch.randn((1, len(features))).

  • path_to_save_model (str or Path) – Destination path for the exported .onnx file.

  • scaler_instance (sklearn transformer) – Fitted scaler object (e.g. ColumnTransformer wrapping StandardScaler) to be serialised alongside the model so that the same preprocessing is applied at inference time.

  • path_to_save_scaler (str or Path) – Destination path for the serialised scaler .bin file.

  • softmax_output (bool, optional) – If True, wraps the model with a softmax layer before export so that the ONNX output is a probability vector rather than raw logits. Set to False (default) for density-ratio training, where the raw sigmoid output is used directly.

Return type:

None

Notes

  • The scaler is serialised with joblib.dump using compression level 3.

  • ONNX export uses opset version 17 with dynamic batch size axes, so the exported model accepts any batch size at inference.

  • When softmax_output=True, the wrapper accesses model.mlp and model.out directly — these attribute names must exist on the Lightning model.
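
To illustrate what softmax_output=True changes about the exported output, the following NumPy sketch applies a row-wise softmax to raw logits. It is not the wrapper used by save_model(); the function name softmax and the example logits are assumptions made for illustration only.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the last axis (hypothetical helper)."""
    # Subtract the row maximum first so that np.exp cannot overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Made-up raw logits for 2 events and 3 classes.
logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]], dtype=np.float32)
probs = softmax(logits)

# Each row is now a probability vector: non-negative and summing to 1.
print(probs.sum(axis=1))
```

With softmax_output=False no such normalisation is added, and the exported model returns whatever the network's final layer produces.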

load_trained_model(path_to_saved_model, path_to_saved_scaler)[source]

Load a previously saved ONNX model and its associated feature scaler.

Parameters:
  • path_to_saved_model (str or Path) – Path to the .onnx model file produced by save_model().

  • path_to_saved_scaler (str or Path) – Path to the .bin scaler file produced by save_model().

Returns:

  • scaler (sklearn transformer) – The deserialised scaler object. Call scaler.transform(data) to preprocess new data consistently with the training pipeline.

  • model (onnx.ModelProto) – The loaded ONNX model graph. Pass this directly to predict_with_onnx() or predict_with_model(), which will create an onnxruntime.InferenceSession internally on first call.

Notes

  • The returned model is an onnx.ModelProto, not an onnxruntime.InferenceSession. The session is created lazily inside predict_with_onnx() to avoid holding GPU/CPU resources when the model is not actively being used.

predict_with_model(data, scaler, model, calibration_model=None, use_log_loss=False)[source]

Evaluate the trained density-ratio model on an input dataset.

Applies feature scaling, runs ONNX inference, optionally converts the output from log-likelihood-ratio space to a probability score, and optionally applies a calibration model if one is provided.

Parameters:
  • data (pandas.DataFrame) – Dataset to evaluate on.

  • scaler (sklearn transformer) – Fitted scaler with a .transform() method. Applied to data before inference. Must be the same scaler saved alongside the model via save_model().

  • model (onnx.ModelProto or onnxruntime.InferenceSession) – The ONNX model to run inference with. If a ModelProto is passed, an InferenceSession is created internally. If an InferenceSession is passed, it is used directly.

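  • calibration_model (optional) – Calibration model exposing a cali_pred method. If provided, it is applied to the scores after inference. Default None (no calibration).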

  • use_log_loss (bool, optional) – If True, the raw model output is interpreted as \(\log(p_A / p_B)\) and converted to a probability score via \(s = \sigma(\log r) = 1 / (1 + r^{-1})\) before returning. Must match the use_log_loss setting used during training. Default False.

Returns:

numpy.ndarray, shape (n_events,) – Predicted scores in the range (0, 1), where values close to 1 indicate high probability of belonging to hypothesis A (numerator) and values close to 0 indicate hypothesis B (denominator). If calibration is enabled, the output is additionally clipped to [1e-8, 1 - 1e-8] for numerical safety.

Notes

  • To obtain the density ratio \(r = p_A / p_B\) from the returned score \(s\), use \(r = s / (1 - s)\).
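
The inversion in the note above follows from the score definition used for use_log_loss=True. As a short worked derivation, starting from \(s = 1 / (1 + r^{-1})\):

\[s = \frac{1}{1 + r^{-1}} = \frac{r}{1 + r} \;\Longrightarrow\; s\,(1 + r) = r \;\Longrightarrow\; s = r\,(1 - s) \;\Longrightarrow\; r = \frac{s}{1 - s}\]

For example, \(s = 0.75\) gives \(r = 0.75 / 0.25 = 3\), meaning hypothesis A is three times as likely as hypothesis B at that point.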

predict_with_onnx(dataset, scaler, model, batch_size=10000, softmax_output=False)[source]

Run batched ONNX inference on a dataset.

Scales the input features and runs inference through the ONNX runtime in fixed-size batches to avoid memory exhaustion on large datasets. Calibration is not applied here; use predict_with_model() with a calibration_model to obtain calibrated scores.

Parameters:
  • dataset (pandas.DataFrame or numpy.ndarray) – Input data. Must contain the feature columns in the same order used during training. Additional columns are ignored if a DataFrame is passed, provided the scaler was fitted with named columns.

  • scaler (sklearn transformer) – Fitted scaler with a .transform() method. Applied to dataset before inference. Must be the same scaler saved alongside the model via save_model().

  • model (onnx.ModelProto or onnxruntime.InferenceSession) – The ONNX model to run inference with. If a ModelProto is passed, an InferenceSession is created internally. If an InferenceSession is passed, it is used directly.

  • batch_size (int, optional) – Number of events processed per inference call. Reduce this if GPU memory is limited. Default 10_000.

  • softmax_output (bool, optional) – If False (default), the output array is flattened to shape (n_events,). If True, the 2D output (n_events, n_classes) is preserved, as returned by a model exported with softmax.

Returns:

preds (numpy.ndarray) –

  • Shape (n_events,) when softmax_output=False.

  • Shape (n_events, n_classes) when softmax_output=True.

Dtype is float32.

Raises:

TypeError – If model is neither an onnx.ModelProto nor an onnxruntime.InferenceSession.

Notes

  • The ONNX session is configured with intra_op_num_threads=1 and inter_op_num_threads=1. This is intentional for HTCondor jobs where CPU resources are explicitly requested — unconstrained threading can cause resource contention across concurrent jobs on the same node.

  • CUDA execution is attempted first; the runtime falls back to CPU automatically if no compatible GPU is available.
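
The batching behaviour described above can be sketched without ONNX at all. In the snippet below, run_in_batches and fake_model are hypothetical names, and fake_model stands in for a single call to the inference session; the real function may differ in its details.

```python
import numpy as np


def run_in_batches(features, run_batch, batch_size=10_000, softmax_output=False):
    """Apply `run_batch` to fixed-size slices and stitch the results together.

    `run_batch` is a stand-in for one inference call (an assumption, not the
    library's API). Assumes `features` contains at least one row.
    """
    outputs = []
    for start in range(0, len(features), batch_size):
        # Cast each slice to float32 before handing it to the model.
        chunk = features[start:start + batch_size].astype(np.float32)
        outputs.append(np.asarray(run_batch(chunk), dtype=np.float32))
    preds = np.concatenate(outputs, axis=0)
    # Flatten to (n_events,) unless the 2D class layout should be kept.
    return preds if softmax_output else preds.reshape(-1)


def fake_model(x):
    """Toy single-output model returning shape (n, 1)."""
    return 1.0 / (1.0 + np.exp(-x.sum(axis=1, keepdims=True)))


features = np.random.default_rng(0).normal(size=(25, 4))
preds = run_in_batches(features, fake_model, batch_size=10)
print(preds.shape, preds.dtype)
```

Here 25 events are processed as slices of 10, 10, and 5; the result is identical to a single unbatched call, only the peak memory use differs.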

convert_torch_to_onnx(lightning_model, input_dim, opset=17)[source]

Convert a trained PyTorch Lightning model to an onnx.ModelProto in memory, without permanently writing to disk.

Parameters:
  • lightning_model (DensityRatioLightning) – Trained model to convert. Must have parameters accessible via model.parameters() to determine the target device.

  • input_dim (int) – Number of input features. Used to construct a random dummy input tensor for graph tracing.

  • opset (int, optional) – ONNX opset version to target during export. Default 17.

Returns:

onnx.ModelProto – The exported ONNX model loaded into memory and ready to pass to predict_with_onnx().

Notes

  • A temporary .onnx file is written to the system’s temp directory during export and deleted immediately after loading. The returned object is fully in-memory.

  • Dynamic batch axes are set for both input and output so the returned model accepts any batch size at inference.

  • This function differs from save_model() in that it does not persist the model to a user-specified path and does not handle scaler serialisation. Use save_model() when you need to save model artefacts for later reuse.
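
The write, load, delete pattern from the first note can be sketched with the standard library alone. The helper name round_trip_via_temp_file is hypothetical, and plain bytes stand in for the actual export and load calls, which are not shown in this reference.

```python
import os
import tempfile


def round_trip_via_temp_file(payload: bytes, suffix: str = ".onnx"):
    """Write `payload` to a temp file, read it back, and always delete the file.

    Returns the loaded bytes and the (now deleted) temporary path.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # mkstemp returns an open descriptor; reopen by path below
    try:
        with open(tmp_path, "wb") as f:
            f.write(payload)  # stand-in for the export step
        with open(tmp_path, "rb") as f:
            loaded = f.read()  # stand-in for loading the model into memory
    finally:
        os.remove(tmp_path)  # clean up even if export or load fails
    return loaded, tmp_path


loaded, tmp_path = round_trip_via_temp_file(b"model-bytes")
print(os.path.exists(tmp_path))
```

The try/finally block is the design point: the temporary file is removed whether or not the intermediate steps succeed, so nothing is left behind in the temp directory.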

convert_logLR_to_score(logLR)[source]

Convert a log-likelihood ratio to a probability score.

Maps \(\log(p_A / p_B)\) to the relative probability \(s = p_A / (p_A + p_B)\) via the sigmoid function:

\[s = \frac{1}{1 + e^{-\log(p_A/p_B)}}\]

Parameters:

logLR (numpy.ndarray) – Array of log-likelihood ratio values, unbounded in range.

Returns:

numpy.ndarray – Probability scores in the range (0, 1).

Notes

  • Use this function when the model was trained with use_log_loss=True, which causes the network to regress \(\log(p_A/p_B)\) directly rather than a classification score. The output of this function is compatible with downstream methods that expect scores in (0, 1).

  • To recover the density ratio from the score, use convert_score_to_ratio().
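
As a minimal sketch of the mapping above, the snippet below evaluates the sigmoid in a numerically stable form and checks it against \(p_A / (p_A + p_B)\). The name logLR_to_score_sketch and the densities 0.3 and 0.1 are illustrative assumptions; the library's own implementation may be written differently.

```python
import numpy as np


def logLR_to_score_sketch(logLR):
    """Sigmoid of the log-likelihood ratio (hypothetical re-implementation)."""
    logLR = np.asarray(logLR, dtype=np.float64)
    # log(1 + exp(-x)) via logaddexp avoids overflow for large negative x.
    return np.exp(-np.logaddexp(0.0, -logLR))


# Made-up densities at one point: p_A = 0.3, p_B = 0.1.
p_A, p_B = 0.3, 0.1
score = logLR_to_score_sketch(np.log(p_A / p_B))

# The score matches the relative probability p_A / (p_A + p_B).
print(np.isclose(score, p_A / (p_A + p_B)))

# Extreme inputs stay finite and inside [0, 1].
extreme = logLR_to_score_sketch(np.array([-1000.0, 0.0, 1000.0]))
```

A log-likelihood ratio of 0 maps to a score of 0.5, the point at which neither hypothesis is favoured.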

convert_score_to_ratio(score)[source]

Convert a probability score to a density ratio.

Given a classifier score \(s = p_A / (p_A + p_B)\), returns the density ratio \(r = p_A / p_B\) via:

\[r = \frac{s}{1 - s}\]

Parameters:

score (numpy.ndarray) – Probability scores in the range (0, 1). Values at exactly 0 or 1 will produce 0 or inf respectively — clip inputs to a safe range such as [1e-9, 1 - 1e-9] if needed.

Returns:

numpy.ndarray – Density ratio values \(p_A / p_B\), unbounded above.
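
The effect of clipping on the edge cases mentioned for score can be seen in a short sketch. The helper score_to_ratio_sketch and its eps argument are assumptions for illustration; convert_score_to_ratio() itself is documented as taking only score, so any clipping would be done by the caller.

```python
import numpy as np


def score_to_ratio_sketch(score, eps=1e-9):
    """Clip scores to [eps, 1 - eps], then return s / (1 - s) (hypothetical)."""
    score = np.clip(np.asarray(score, dtype=np.float64), eps, 1.0 - eps)
    return score / (1.0 - score)


scores = np.array([0.0, 0.2, 0.5, 0.75, 1.0])

# Without clipping, a score of exactly 1 divides by zero and yields inf.
with np.errstate(divide="ignore"):
    raw = scores / (1.0 - scores)

# With clipping, every ratio is finite.
safe = score_to_ratio_sketch(scores)
print(np.isinf(raw[-1]), np.isfinite(safe).all())
```

The choice of eps caps the ratio at roughly 1/eps, so it should be small enough not to distort ratios that matter for the analysis.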