Training API
Utility functions
- save_model(lightning_model, input_sample, path_to_save_model, scaler_instance, path_to_save_scaler, softmax_output=False)
Export a trained PyTorch Lightning model to ONNX format and save the feature scaler to disk.
- Parameters:
lightning_model (DensityRatioLightning) – Trained PyTorch Lightning model instance. If it is not already in eval mode, it is set to eval mode internally.
input_sample (torch.Tensor, shape (1, n_features)) – A representative input tensor used to trace the model graph during ONNX export. Only its shape matters; its values do not affect the exported weights. Typically torch.randn((1, len(features))).
path_to_save_model (str or Path) – Destination path for the exported .onnx file.
scaler_instance (sklearn transformer) – Fitted scaler object (e.g. a ColumnTransformer wrapping a StandardScaler), serialised alongside the model so that the same preprocessing is applied at inference time.
path_to_save_scaler (str or Path) – Destination path for the serialised scaler .bin file.
softmax_output (bool, optional) – If True, wraps the model with a softmax layer before export so that the ONNX output is a probability vector rather than raw logits. Set to False (default) for density-ratio training, where the raw sigmoid output is used directly.
- Return type:
None
Notes
The scaler is serialised with joblib.dump using compression level 3.
ONNX export uses opset version 17 with dynamic batch-size axes, so the exported model accepts any batch size at inference.
When softmax_output=True, the wrapper accesses model.mlp and model.out directly, so these attribute names must exist on the Lightning model.
- load_trained_model(path_to_saved_model, path_to_saved_scaler)
Load a previously saved ONNX model and its associated feature scaler.
- Parameters:
path_to_saved_model (str or Path) – Path to the .onnx model file produced by save_model().
path_to_saved_scaler (str or Path) – Path to the .bin scaler file produced by save_model().
- Returns:
scaler (sklearn transformer) – The deserialised scaler object. Call scaler.transform(data) to preprocess new data consistently with the training pipeline.
model (onnx.ModelProto) – The loaded ONNX model graph. Pass this directly to predict_with_onnx() or predict_with_model(), which will create an onnxruntime.InferenceSession internally on first call.
Notes
The returned model is an onnx.ModelProto, not an onnxruntime.InferenceSession. The session is created lazily inside predict_with_onnx() to avoid holding GPU/CPU resources when the model is not actively being used.
- predict_with_model(data, scaler, model, calibration_model=None, use_log_loss=False)
Evaluate the trained density-ratio model on an input dataset.
Applies feature scaling, runs ONNX inference, optionally converts from log-likelihood-ratio space to a probability score, and optionally applies the calibration layer.
- Parameters:
data (pandas.DataFrame) – Dataset to evaluate on.
scaler (sklearn transformer) – Fitted scaler with a .transform() method. Applied to data before inference. Must be the same scaler saved alongside the model via save_model().
model (onnx.ModelProto or onnxruntime.InferenceSession) – The ONNX model to run inference with. If a ModelProto is passed, an InferenceSession is created internally. If an InferenceSession is passed, it is used directly.
calibration_model (optional) – Calibration model exposing a cali_pred method. If None (default), no calibration is applied.
use_log_loss (bool, optional) – If True, the raw model output is interpreted as \(\log(p_A / p_B)\) and converted to a probability score via \(s = \sigma(\log r) = 1 / (1 + r^{-1})\) before returning. Must match the use_log_loss setting used during training. Default False.
- Returns:
numpy.ndarray, shape (n_events,) – Predicted scores in the range (0, 1), where values close to 1 indicate high probability of belonging to hypothesis A (numerator) and values close to 0 indicate hypothesis B (denominator). If calibration is enabled, the output is additionally clipped to [1e-8, 1 - 1e-8] for numerical safety.
Notes
To obtain the density ratio \(r = p_A / p_B\) from the returned score \(s\), use \(r = s / (1 - s)\).
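As a worked check (added for illustration, not taken from the source code), the conversion used for use_log_loss=True and the inversion above are consistent with each other. Writing \(r = p_A / p_B\):
\[s = \sigma(\log r) = \frac{1}{1 + e^{-\log r}} = \frac{1}{1 + r^{-1}} = \frac{r}{1 + r}\]
\[1 - s = \frac{1}{1 + r} \quad\Rightarrow\quad \frac{s}{1 - s} = r\]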
- predict_with_onnx(dataset, scaler, model, batch_size=10000, softmax_output=False)
Run batched ONNX inference on a dataset.
Scales the input features and runs inference through the ONNX runtime in fixed-size batches to avoid memory exhaustion on large datasets. Calibration is not part of this function's signature; it is applied in predict_with_model().
- Parameters:
dataset (pandas.DataFrame or numpy.ndarray) – Input data. Must contain the feature columns in the same order used during training. Additional columns are ignored if a DataFrame is passed, provided the scaler was fitted with named columns.
scaler (sklearn transformer) – Fitted scaler with a .transform() method. Applied to dataset before inference. Must be the same scaler saved alongside the model via save_model().
model (onnx.ModelProto or onnxruntime.InferenceSession) – The ONNX model to run inference with. If a ModelProto is passed, an InferenceSession is created internally. If an InferenceSession is passed, it is used directly.
batch_size (int, optional) – Number of events processed per inference call. Reduce this if GPU memory is limited. Default 10_000.
softmax_output (bool, optional) – If False (default), the output array is flattened to shape (n_events,). If True, the 2D output (n_events, n_classes) is preserved, as returned by a model exported with softmax.
- Returns:
preds (numpy.ndarray) – Shape (n_events,) when softmax_output=False. Shape (n_events, n_classes) when softmax_output=True. Dtype is float32.
- Raises:
TypeError – If model is neither an onnx.ModelProto nor an onnxruntime.InferenceSession.
Notes
The ONNX session is configured with intra_op_num_threads=1 and inter_op_num_threads=1. This is intentional for HTCondor jobs where CPU resources are explicitly requested: unconstrained threading can cause resource contention across concurrent jobs on the same node.
CUDA execution is attempted first; the runtime falls back to CPU automatically if no compatible GPU is available.
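The batching and output-shape behaviour described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: predict_in_batches and toy_model are hypothetical names, and toy_model stands in for the call into the ONNX runtime session.

```python
import numpy as np

def predict_in_batches(features, run_batch, batch_size=10_000, softmax_output=False):
    """Run `run_batch` on consecutive slices of `features` and stitch the results.

    Assumes `features` has at least one row.
    """
    features = np.asarray(features, dtype=np.float32)
    outputs = []
    for start in range(0, len(features), batch_size):
        # The last slice may be shorter than batch_size; slicing handles that.
        batch = features[start:start + batch_size]
        outputs.append(np.asarray(run_batch(batch), dtype=np.float32))
    preds = np.concatenate(outputs, axis=0)
    # Flatten to (n_events,) unless the 2D softmax-style output should be kept.
    return preds if softmax_output else preds.reshape(-1)


# Hypothetical stand-in for the inference call: a fixed logistic model, one output.
def toy_model(batch):
    weights = np.array([[0.5], [-0.25], [1.0]], dtype=np.float32)
    return 1.0 / (1.0 + np.exp(-(batch @ weights)))


rng = np.random.default_rng(0)
x = rng.normal(size=(25, 3))
scores = predict_in_batches(x, toy_model, batch_size=10)
print(scores.shape, scores.dtype)  # (25,) float32
```

With batch_size=10 the 25 rows are processed as slices of 10, 10 and 5, so the result does not depend on the batch size dividing the dataset evenly.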
- convert_torch_to_onnx(lightning_model, input_dim, opset=17)
Convert a trained PyTorch Lightning model to an onnx.ModelProto in memory, without permanently writing to disk.
- Parameters:
lightning_model (DensityRatioLightning) – Trained model to convert. Its parameters must be accessible via model.parameters(), which is used to determine the target device.
input_dim (int) – Number of input features. Used to construct a random dummy input tensor for graph tracing.
opset (int, optional) – ONNX opset version to target during export. Default 17.
- Returns:
onnx.ModelProto – The exported ONNX model loaded into memory and ready to pass to predict_with_onnx().
Notes
A temporary .onnx file is written to the system's temp directory during export and deleted immediately after loading. The returned object is fully in-memory.
Dynamic batch axes are set for both input and output so the returned model accepts any batch size at inference.
This function differs from save_model() in that it does not persist the model to a user-specified path and does not handle scaler serialisation. Use save_model() when you need to save model artefacts for later reuse.
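The write, load, delete pattern described in these notes can be sketched with the standard library alone. This is an assumption-laden illustration, not the function's actual code: export_via_tempfile, fake_export and fake_load are hypothetical names, and the byte-string stand-ins take the place of whatever exporter and loader the real function uses.

```python
import os
import tempfile

def export_via_tempfile(export_fn, load_fn, suffix=".onnx"):
    """Write with `export_fn(path)`, read back with `load_fn(path)`, then delete the file."""
    # mkstemp returns an open OS-level handle; close it so the exporter can reopen the path.
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        export_fn(path)
        return load_fn(path)
    finally:
        # Remove the file whether or not export or load succeeded.
        if os.path.exists(path):
            os.remove(path)


# Hypothetical stand-ins for the exporter and loader, using raw bytes.
seen_paths = []

def fake_export(path):
    seen_paths.append(path)
    with open(path, "wb") as handle:
        handle.write(b"model-bytes")

def fake_load(path):
    with open(path, "rb") as handle:
        return handle.read()

model_bytes = export_via_tempfile(fake_export, fake_load)
```

Doing the cleanup in a finally block means a failed export does not leave stray files in the temp directory, which matters when many short jobs share a node.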
- convert_logLR_to_score(logLR)
Convert a log-likelihood ratio to a probability score.
Maps \(\log(p_A / p_B)\) to the relative probability \(s = p_A / (p_A + p_B)\) via the sigmoid function:
\[s = \frac{1}{1 + e^{-\log(p_A/p_B)}}\]
- Parameters:
logLR (numpy.ndarray) – Array of log-likelihood ratio values, unbounded in range.
- Returns:
numpy.ndarray – Probability scores in the range (0, 1).
Notes
Use this function when the model was trained with use_log_loss=True, which causes the network to regress \(\log(p_A/p_B)\) directly rather than a classification score. The output of this function is compatible with downstream methods that expect scores in (0, 1).
To recover the density ratio from the score, use convert_score_to_ratio().
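Because the input is unbounded, a direct evaluation of the sigmoid can overflow in the exponential for large negative values. The sketch below shows one numerically stable way to compute the same mapping; logLR_to_score is a hypothetical name and this is not necessarily how convert_logLR_to_score() is implemented.

```python
import numpy as np

def logLR_to_score(logLR):
    """Sigmoid of the log-likelihood ratio, written to avoid overflow for large |logLR|."""
    x = np.asarray(logLR, dtype=np.float64)
    # sigmoid(x) = exp(-log(1 + exp(-x))); logaddexp evaluates the log term stably.
    return np.exp(-np.logaddexp(0.0, -x))


# Density ratios of 1/3, 1 and 3 correspond to scores of 0.25, 0.5 and 0.75.
log_ratios = np.log(np.array([1.0 / 3.0, 1.0, 3.0]))
scores = logLR_to_score(log_ratios)
```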
- convert_score_to_ratio(score)
Convert a probability score to a density ratio.
Given a classifier score \(s = p_A / (p_A + p_B)\), returns the density ratio \(r = p_A / p_B\) via:
\[r = \frac{s}{1 - s}\]
- Parameters:
score (numpy.ndarray) – Probability scores in the range (0, 1). Values of exactly 0 or 1 produce 0 or inf respectively; clip inputs to a safe range such as [1e-9, 1 - 1e-9] if needed.
- Returns:
numpy.ndarray – Density ratio values \(p_A / p_B\), unbounded above.
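The clipping advice above has one subtlety when scores arrive as float32, the dtype returned by predict_with_onnx(). The sketch below is an illustration under that assumption, with score_to_ratio as a hypothetical name rather than the library function.

```python
import numpy as np

def score_to_ratio(score, eps=1e-9):
    """Return s / (1 - s) after clipping s to [eps, 1 - eps]."""
    # Cast to float64 first: in float32, 1 - 1e-9 rounds to exactly 1.0,
    # so the upper clip would not prevent a division by zero.
    s = np.clip(np.asarray(score, dtype=np.float64), eps, 1.0 - eps)
    return s / (1.0 - s)


scores = np.array([0.0, 0.25, 0.5, 0.75, 1.0], dtype=np.float32)
ratios = score_to_ratio(scores)
```

The interior scores map to ratios of 1/3, 1 and 3, while the endpoints map to large but finite values instead of 0 and inf.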