Backend module

The module provides a unified interface for running inference, together with implementations of different backend types. Each concrete backend wraps an execution provider or inference technology, so no special setup is needed: create a Backend instance and use it. There are also preconfigured backends (presets) that can be useful for finding the optimal backend and its parameters.
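
For instance, a minimal sketch of the common workflow (the model path, input shape, and the choice of OrtCpuBackend are illustrative placeholders; OrtCpuBackend is documented below):

>>> import numpy as np
>>> from enot_lite.backend import OrtCpuBackend
>>> sample = np.random.rand(1, 3, 224, 224).astype(np.float32)  # Placeholder input tensor.
>>> backend = OrtCpuBackend('model.onnx')  # Any concrete backend exposes the same interface.
>>> outputs = backend.run(sample)  # Unified inference call.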

Note

With this version, we’ve introduced the input_example parameter, which is used to infer shapes for models with dynamic axes and enables useful optimizations.

This parameter is mandatory for models with dynamic axes and recommended in all other cases; a usage sketch follows the list below.

List of backends for which this parameter was introduced:

  • OrtOpenvinoBackend

  • OrtOpenvinoFloatBackend

  • OrtOpenvinoInt8Backend

  • OrtTensorrtBackend

  • OrtTensorrtFloatBackend

  • OrtTensorrtFloatOptimBackend

  • OrtTensorrtInt8Backend

  • OrtTensorrtInt8QDQBackend

  • OpenvinoBackend
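
As a sketch of passing input_example (the model path, input shape, and the choice of OrtOpenvinoBackend here are placeholder assumptions):

>>> import numpy as np
>>> from enot_lite.backend import OrtOpenvinoBackend
>>> sample = np.zeros((1, 3, 224, 224), dtype=np.float32)  # Example input matching the model signature.
>>> backend = OrtOpenvinoBackend('dynamic_model.onnx', input_example=sample)  # Shapes are inferred from the example.
>>> backend.run(sample)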

Backend interface

Interface for running inference. All backends implemented in the ENOT Lite framework follow this basic interface.

class Backend

Interface for running inference.

abstract get_inputs()

Returns model inputs.

Return type

List[TensorInfo]

abstract get_outputs()

Returns model outputs.

Return type

List[TensorInfo]

abstract run(inputs, **kwargs)

Computes the predictions.

Parameters
  • inputs (Any) – Model input. There are three ways to pass data to this method; see the examples below.

  • **kwargs – Native backend options.

Examples

>>> sess.run(input_0)  # For models with only one input.
>>> sess.run([input_0, input_1, ..., input_n])
>>> sess.run((input_0, input_1, ..., input_n))  # Equivalent to the previous line.
>>> sess.run({'input_name_0': input_0, 'input_name_1': input_1, ... , 'input_name_n': input_n})
Return type

Any
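
As a sketch, get_inputs() can help assemble the named-input dictionary for run(); that TensorInfo exposes the input name via a name attribute is an assumption here:

>>> infos = sess.get_inputs()  # One TensorInfo per model input.
>>> inputs = {info.name: array for info, array in zip(infos, arrays)}  # 'name' attribute assumed; 'arrays' is a placeholder list of input arrays ordered like infos.
>>> sess.run(inputs)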

ORT Backend interface

Interface for backends based on ONNX Runtime.

class OrtBackend(model, provider_name, provider_options=None, session_options=None, **kwargs)

Generic version of an ORT-based backend.

There are subclasses with different presets based on this backend:
  • OrtTensorrtBackend

  • OrtOpenvinoBackend

  • OrtCpuBackend

  • OrtCudaBackend

__init__(model, provider_name, provider_options=None, session_options=None, **kwargs)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • provider_name (str) – Name of an ORT execution provider which will be used in inference.

  • provider_options (Optional[Dict]) – Execution provider options or None.

  • session_options (Optional[ort.SessionOptions]) – Session options or None.

get_inputs()

Returns model inputs.

Return type

List[TensorInfo]

get_outputs()

Returns model outputs.

Return type

List[TensorInfo]

run(inputs, **kwargs)

Computes the predictions.

Parameters
  • inputs (Any) – Model input. There are three ways to pass data to this method; see the examples below.

  • **kwargs – Native backend options.

Examples

>>> sess.run(input_0)  # For models with only one input.
>>> sess.run([input_0, input_1, ..., input_n])
>>> sess.run((input_0, input_1, ..., input_n))  # Equivalent to the previous line.
>>> sess.run({'input_name_0': input_0, 'input_name_1': input_1, ... , 'input_name_n': input_n})
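
A usage sketch (assuming OrtBackend is importable from enot_lite.backend like its presets; 'CPUExecutionProvider' is the standard ORT CPU provider name, and the model path and sample are placeholders):

>>> from enot_lite.backend import OrtBackend
>>> backend = OrtBackend('model.onnx', provider_name='CPUExecutionProvider')  # Standard ORT provider name.
>>> backend.run(sample)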

ORT CPU Backend

class OrtCpuBackend(model, inter_op_num_threads=None, intra_op_num_threads=None)

ORT backend with a CPU execution provider.

__init__(model, inter_op_num_threads=None, intra_op_num_threads=None)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • inter_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution of the graph (across nodes). Default is None (will be set by backend).

  • intra_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution within nodes. Default is None (will be set by backend).
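
A usage sketch (the model path, sample, and thread count are placeholder assumptions):

>>> from enot_lite.backend import OrtCpuBackend
>>> backend = OrtCpuBackend('model.onnx', intra_op_num_threads=4)  # Thread count chosen for illustration.
>>> backend.run(sample)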

ORT CUDA Backend

class OrtCudaBackend(model, provider_options=None)

ORT backend with a CUDA execution provider.

__init__(model, provider_options=None)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • provider_options (Optional[Dict[str, Any]]) – CUDA ORT provider options or None.
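
A usage sketch ('device_id' is a standard CUDA execution provider option in ONNX Runtime; the model path and sample are placeholders):

>>> from enot_lite.backend import OrtCudaBackend
>>> backend = OrtCudaBackend('model.onnx', provider_options={'device_id': 0})  # Select the first GPU.
>>> backend.run(sample)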

ORT OpenVINO Backend

class OrtOpenvinoBackend(model, provider_options=None, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, **kwargs)

ORT backend with an OpenVINO execution provider.

There are presets based on this backend:
  • OrtOpenvinoFloatBackend

  • OrtOpenvinoInt8Backend

__init__(model, provider_options=None, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, **kwargs)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • provider_options (Optional[Dict[str, Any]]) – OpenVINO ORT provider options or None.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.

  • inter_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution of the graph (across nodes). Default is None (will be set by backend).

  • intra_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution within nodes. Default is None (will be set by backend).
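
A usage sketch, mirroring the preset examples below (the model path and sample are placeholders):

>>> from enot_lite.backend import OrtOpenvinoBackend
>>> backend = OrtOpenvinoBackend('model.onnx', input_example=sample)
>>> backend.run(sample)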

ORT OpenVINO Float Backend

class OrtOpenvinoFloatBackend(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)

ORT backend with an OpenVINO execution provider configured with the CPU_FP32 option.

Examples

>>> from enot_lite.backend import OrtOpenvinoFloatBackend
>>> backend = OrtOpenvinoFloatBackend('model.onnx', input_example=sample)
>>> backend.run(sample)
__init__(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)
Parameters
  • model (Union[str, Path, ModelProto]) – Filename of an ONNX model, or the model serialized to a byte string.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.

  • inter_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution of the graph (across nodes). Default is None (will be set by backend).

  • intra_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution within nodes. Default is None (will be set by backend).

  • openvino_num_threads (Optional[int]) – Length of the asynchronous task queue used by the OpenVINO backend. Increasing this parameter can either improve or degrade performance; tune it last when fine-tuning performance. Default is None (will be set by backend).

ORT OpenVINO Int-8 Backend

class OrtOpenvinoInt8Backend(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)

ORT backend with an OpenVINO execution provider configured with int8 precision.

Examples

>>> from enot_lite.backend import OrtOpenvinoInt8Backend
>>> backend = OrtOpenvinoInt8Backend('model.onnx', input_example=sample)
>>> backend.run(sample)
__init__(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)
Parameters
  • model (Union[str, Path, ModelProto]) – Filename of an ONNX model, or the model serialized to a byte string.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.

  • inter_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution of the graph (across nodes). Default is None (will be set by backend).

  • intra_op_num_threads (Optional[int]) – Number of threads used to parallelize the execution within nodes. Default is None (will be set by backend).

  • openvino_num_threads (Optional[int]) – Length of the asynchronous task queue used by the OpenVINO backend. Increasing this parameter can either improve or degrade performance; tune it last when fine-tuning performance. Default is None (will be set by backend).

ORT TensorRT Backend

class OrtTensorrtBackend(model, provider_options=None, input_example=None, **kwargs)

ORT backend with a TensorRT execution provider.

There are presets based on this backend:
  • OrtTensorrtFloatBackend

  • OrtTensorrtFloatOptimBackend

  • OrtTensorrtInt8Backend

  • OrtTensorrtInt8QDQBackend

Notes

The first launch of this backend can take a long time.

__init__(model, provider_options=None, input_example=None, **kwargs)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • provider_options (Optional[Dict]) – TensorRT execution provider options or None.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
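
A usage sketch, analogous to the preset examples below (the model path and sample are placeholders):

>>> from enot_lite.backend import OrtTensorrtBackend
>>> backend = OrtTensorrtBackend('model.onnx', input_example=sample)
>>> backend.run(sample)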

ORT TensorRT Float Backend

class OrtTensorrtFloatBackend(model, input_example=None)

ORT backend with a TensorRT execution provider using default options.

Notes

The first launch of this backend can take a long time.

Examples

>>> from enot_lite.backend import OrtTensorrtFloatBackend
>>> backend = OrtTensorrtFloatBackend('model.onnx', input_example=sample)
>>> backend.run(sample)
__init__(model, input_example=None)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.

ORT TensorRT Optimal Float Backend

class OrtTensorrtFloatOptimBackend(model, input_example=None)

ORT backend with a TensorRT execution provider configured with the optimal floating-point precision.

Notes

The first launch of this backend can take a long time.

Examples

>>> from enot_lite.backend import OrtTensorrtFloatOptimBackend
>>> backend = OrtTensorrtFloatOptimBackend('model.onnx', input_example=sample)
>>> backend.run(sample)
__init__(model, input_example=None)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.

ORT TensorRT Int-8 Backend

class OrtTensorrtInt8Backend(model, calibration_table, input_example=None, trt_fp16_enable=True)

ORT backend with a TensorRT execution provider configured with int8 precision.

Notes

The first launch of this backend can take a long time.

Examples

>>> from enot_lite.backend import OrtTensorrtInt8Backend
>>> from enot_lite.calibration import CalibrationTableTensorrt
>>> from enot_lite.calibration import calibrate
>>> calibration_table = CalibrationTableTensorrt.from_file_flatbuffers('table.flatbuffers')  # Load from file.
>>> calibration_table = calibrate('model.onnx', dataloader)  # Or create a calibration table using a PyTorch DataLoader.
>>> backend = OrtTensorrtInt8Backend('model.onnx', calibration_table, input_example=sample)
>>> backend.run(sample)
__init__(model, calibration_table, input_example=None, trt_fp16_enable=True)
Parameters
  • model – Filename of an ONNX model, or the model serialized to a byte string.

  • calibration_table (Union[CalibrationTableTensorrt, str, Path]) – Precalculated calibration table.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.

  • trt_fp16_enable (bool) – Whether to use fp16 precision for the non-quantized part. True by default. If the GPU does not support fp16, this option is ignored.

ORT TensorRT Int-8 QDQ Backend

class OrtTensorrtInt8QDQBackend(model, input_example=None, trt_fp16_enable=True)

ORT backend with a TensorRT execution provider configured with int8 precision. All quantization parameters should be embedded in QuantizeLinear/DequantizeLinear nodes; see TrtFakeQuantizedModel in the ENOT quantization module.

Notes

The first launch of this backend can take a long time.

Examples

>>> from enot_lite.backend import OrtTensorrtInt8QDQBackend
>>> backend = OrtTensorrtInt8QDQBackend('model.onnx', input_example=sample)
>>> backend.run(sample)
__init__(model, input_example=None, trt_fp16_enable=True)
Parameters
  • model – Filename of an ONNX model, or the model serialized to a byte string.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.

  • trt_fp16_enable (bool) – Whether to use fp16 precision for the non-quantized part. True by default. If the GPU does not support fp16, this option is ignored.

OpenVINO Backend

class OpenvinoBackend(model, input_example=None, **kwargs)

Pure OpenVINO backend without ORT overhead.

__init__(model, input_example=None, **kwargs)
Parameters
  • model (TModelOrPath) – Filename of an ONNX model, or the model serialized to a byte string.

  • input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
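
A usage sketch, following the same pattern as the ORT-based presets (the model path and sample are placeholders):

>>> from enot_lite.backend import OpenvinoBackend
>>> backend = OpenvinoBackend('model.onnx', input_example=sample)
>>> backend.run(sample)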

get_inputs()

Returns model inputs.

Return type

Any

get_outputs()

Returns model outputs.

Return type

Any

run(inputs, **kwargs)

Computes the predictions.

Parameters
  • inputs (Any) – Model input. There are three ways to pass data to this method; see the examples below.

  • **kwargs – Native backend options.

Examples

>>> sess.run(input_0)  # For models with only one input.
>>> sess.run([input_0, input_1, ..., input_n])
>>> sess.run((input_0, input_1, ..., input_n))  # Equivalent to the previous line.
>>> sess.run({'input_name_0': input_0, 'input_name_1': input_1, ... , 'input_name_n': input_n})
Return type

Any