Backend module

The module provides a unified interface for running inference and implementations of different types of backends. Each concrete backend wraps an execution provider or inference technology, so you don't need to do anything special: create a Backend instance and use it. There are also preconfigured backends (presets) that can be useful for finding the optimal backend and its parameters.
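
For example, a minimal usage sketch (OrtCpuBackend is one of the concrete backends documented below; the model path and input shape are placeholders):

>>> import numpy as np
>>> from enot_lite.backend import OrtCpuBackend
>>> backend = OrtCpuBackend('model.onnx')  # Placeholder path to your ONNX model.
>>> input_name = backend.get_inputs()[0].name
>>> sample = np.zeros((1, 3, 224, 224), dtype=np.float32)  # Dummy input; use your model's input shape.
>>> outputs = backend.run(None, {input_name: sample})  # None requests all model outputs.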

Backend interface

Interface for running inference. All backends implemented in ENOT Lite framework follow this basic interface.

class Backend

Interface for running inference.

abstract get_inputs()

Returns model inputs.

Return type

Any

abstract get_outputs()

Returns model outputs.

Return type

Any

abstract run(output_names, input_feed, run_options=None)

Computes the predictions.

Parameters
  • output_names – Names of the outputs to compute, or None to compute all outputs.

  • input_feed – Dictionary { input_name: input_value }.

  • run_options – Native backend options.

Return type

Any
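
For illustration, specific outputs can also be requested by name; this mirrors onnxruntime.InferenceSession.run, which the ORT backends below wrap (a sketch; backend, input_name and sample are defined as in the example above):

>>> output_name = backend.get_outputs()[0].name
>>> backend.run([output_name], {input_name: sample})  # Returns only the named output.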

ORT Backend interface

Interface for backends based on ONNX Runtime.

class OrtBackend(model, provider_name, provider_options=None, sess_opt=None)

ORT-based backend.

The following preset subclasses are based on this backend:
  • OrtTensorrtBackend

  • OrtOpenvinoBackend

  • OrtCpuBackend

  • OrtCudaBackend
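
A minimal sketch of constructing OrtBackend directly (assumes OrtBackend is importable from enot_lite.backend like its presets; 'CPUExecutionProvider' is a standard ONNX Runtime provider name):

>>> from enot_lite.backend import OrtBackend
>>> backend = OrtBackend('model.onnx', provider_name='CPUExecutionProvider')
>>> input_name = backend.get_inputs()[0].name
>>> backend.run(None, {input_name: sample})  # 'sample' is an input array for the model.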

__init__(model, provider_name, provider_options=None, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • provider_name (str) – Name of the ORT execution provider to use for inference.

  • provider_options (Optional[Dict]) – Execution provider options or None.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

get_inputs()

Returns model inputs.

get_outputs()

Returns model outputs.

run(output_names, input_feed, run_options=None)

Computes the predictions.

Parameters
  • output_names – Names of the outputs to compute, or None to compute all outputs.

  • input_feed – Dictionary { input_name: input_value }.

  • run_options – Native backend options.

ORT CPU Backend

class OrtCpuBackend(model, provider_options=None, sess_opt=None)

ORT backend with a CPU execution provider.
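
A sketch of tuning the session via sess_opt (intra_op_num_threads is a standard onnxruntime.SessionOptions field; the model path is a placeholder):

>>> import onnxruntime as ort
>>> from enot_lite.backend import OrtCpuBackend
>>> sess_opt = ort.SessionOptions()
>>> sess_opt.intra_op_num_threads = 4  # Limit intra-op thread pool size.
>>> backend = OrtCpuBackend('model.onnx', sess_opt=sess_opt)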

__init__(model, provider_options=None, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • provider_options (Optional[Dict]) – CPU execution provider options or None.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

ORT CUDA Backend

class OrtCudaBackend(model, provider_options=None, sess_opt=None)

ORT backend with a CUDA execution provider.
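
A sketch of selecting a GPU via provider_options ('device_id' is a standard ONNX Runtime CUDA execution provider option; that OrtCudaBackend forwards it unchanged is an assumption):

>>> from enot_lite.backend import OrtCudaBackend
>>> backend = OrtCudaBackend('model.onnx', provider_options={'device_id': 0})  # Run on GPU 0.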

__init__(model, provider_options=None, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • provider_options (Optional[Dict]) – CUDA execution provider options or None.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

ORT OpenVINO Backend

class OrtOpenvinoBackend(model, provider_options=None, sess_opt=None)

ORT backend with an OpenVINO execution provider.

There are presets based on this backend:
  • OrtOpenvinoFloatBackend
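
A sketch of passing OpenVINO options ('device_type' is a standard option of the ONNX Runtime OpenVINO execution provider; that OrtOpenvinoBackend forwards it unchanged is an assumption):

>>> from enot_lite.backend import OrtOpenvinoBackend
>>> backend = OrtOpenvinoBackend('model.onnx', provider_options={'device_type': 'CPU_FP32'})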

__init__(model, provider_options=None, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • provider_options (Optional[Dict]) – OpenVINO execution provider options or None.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

ORT OpenVINO Float Backend

class OrtOpenvinoFloatBackend(model, sess_opt=None)

ORT backend with an OpenVINO execution provider configured with the CPU_FP32 option.

Examples

>>> from enot_lite.backend import OrtOpenvinoFloatBackend
>>> backend = OrtOpenvinoFloatBackend('model.onnx')
>>> input_name = backend.get_inputs()[0].name
>>> backend.run(None, {input_name: sample})  # 'sample' is an input array matching the model input.

__init__(model, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

ORT TensorRT Backend

class OrtTensorrtBackend(model, provider_options=None, sess_opt=None)

ORT backend with a TensorRT execution provider.

There are presets based on this backend:
  • OrtTensorrtFloatBackend

  • OrtTensorrtFloatOptimBackend

  • OrtTensorrtInt8Backend

Notes

The first launch of this backend can take a long time because TensorRT builds an optimized engine for the model.
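
A sketch of enabling reduced precision via provider_options ('trt_fp16_enable' is a standard option of the ONNX Runtime TensorRT execution provider; that OrtTensorrtBackend forwards it unchanged is an assumption):

>>> from enot_lite.backend import OrtTensorrtBackend
>>> backend = OrtTensorrtBackend('model.onnx', provider_options={'trt_fp16_enable': True})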

__init__(model, provider_options=None, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • provider_options (Optional[Dict]) – TensorRT execution provider options or None.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

ORT TensorRT Float Backend

class OrtTensorrtFloatBackend(model, sess_opt=None)

ORT backend with a TensorRT execution provider using default options.

Notes

The first launch of this backend can take a long time.

Examples

>>> from enot_lite.backend import OrtTensorrtFloatBackend
>>> backend = OrtTensorrtFloatBackend('model.onnx')
>>> input_name = backend.get_inputs()[0].name
>>> backend.run(None, {input_name: sample})

__init__(model, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

ORT TensorRT Optimal Float Backend

class OrtTensorrtFloatOptimBackend(model, sess_opt=None)

ORT backend with a TensorRT execution provider configured with the optimal floating-point precision.

Notes

The first launch of this backend can take a long time.

Examples

>>> from enot_lite.backend import OrtTensorrtFloatOptimBackend
>>> backend = OrtTensorrtFloatOptimBackend('model.onnx')
>>> input_name = backend.get_inputs()[0].name
>>> backend.run(None, {input_name: sample})

__init__(model, sess_opt=None)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

ORT TensorRT Int-8 Backend

class OrtTensorrtInt8Backend(model, calibration_table, sess_opt=None, allow_graph_optimization=False)

ORT backend with a TensorRT execution provider configured for int8 (quantized) inference.

Notes

The first launch of this backend can take a long time.

Examples

>>> from enot_lite.backend import OrtTensorrtInt8Backend
>>> from enot_lite.calibration import CalibrationTableTensorrt
>>> from enot_lite.calibration import calibrate
>>> # Either load a precalculated calibration table from a file...
>>> calibration_table = CalibrationTableTensorrt.from_file_flatbuffers('table.flatbuffers')
>>> # ...or build one from data using a PyTorch DataLoader.
>>> calibration_table = calibrate('model.onnx', dataloader)
>>> backend = OrtTensorrtInt8Backend('model.onnx', calibration_table)
>>> input_name = backend.get_inputs()[0].name
>>> backend.run(None, {input_name: sample})
__init__(model, calibration_table, sess_opt=None, allow_graph_optimization=False)
Parameters
  • model – Filename or serialized ONNX or ORT format model in a byte string.

  • calibration_table (Union[CalibrationTableTensorrt, str, Path]) – Precalculated calibration table or a path to one.

  • sess_opt (Optional[ort.SessionOptions]) – Session options or None.

  • allow_graph_optimization (bool) – Enables graph optimization. Note that graph optimization can break quantization; use it only if you are sure it is necessary. Optimization is applied only if sess_opt is passed.
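
Since optimization is applied only when sess_opt is passed, a sketch of enabling it (graph_optimization_level is a standard onnxruntime.SessionOptions field; how exactly the backend applies it is an assumption):

>>> import onnxruntime as ort
>>> sess_opt = ort.SessionOptions()
>>> sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
>>> backend = OrtTensorrtInt8Backend(
...     'model.onnx',
...     calibration_table,
...     sess_opt=sess_opt,
...     allow_graph_optimization=True,  # Use with care: optimization can break quantization.
... )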