Backend module¶
This module provides a unified interface for running inference, along with implementations of different backend types.
Each concrete backend wraps an execution provider or technology, so no special setup is needed:
create a Backend
instance and use it.
There are also preconfigured backends (presets) that can help you find the optimal backend and its parameters.
Note
This version introduces the input_example
parameter, which is used to infer shapes for models
with dynamic axes and enables useful optimizations.
This parameter is mandatory for models with dynamic axes and recommended in all other cases.
It was introduced for the following backends:
OrtOpenvinoBackend
OrtOpenvinoFloatBackend
OrtOpenvinoInt8Backend
OrtTensorrtBackend
OrtTensorrtAutoBackend
OrtTensorrtAutoFp16Backend
OrtTensorrtInt8Backend
OpenvinoBackend
Backend interface¶
Interface for running inference. All backends implemented in ENOT Lite framework follow this basic interface.
- class Backend¶
Interface for running inference.
- abstract run(inputs, **kwargs)¶
Computes the predictions.
- Parameters
inputs (Any) – Model input. There are three variants to pass data into this method, see examples.
**kwargs – Native backend options.
Examples
>>> sess.run(input_0)  # For models with only one input.
>>> sess.run([input_0, input_1, ..., input_n])
>>> sess.run((input_0, input_1, ..., input_n))  # Equivalent to the previous call.
>>> sess.run({'input_name_0': input_0, 'input_name_1': input_1, ..., 'input_name_n': input_n})
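A minimal sketch of how a concrete backend can satisfy this interface and accept all three input forms; the EchoBackend class and input names are hypothetical, for illustration only:

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Interface for running inference (mirrors the abstract class above)."""
    @abstractmethod
    def run(self, inputs, **kwargs):
        """Compute the predictions."""

class EchoBackend(Backend):
    """Toy backend that returns its inputs keyed by input name."""
    def __init__(self, input_names):
        self._input_names = input_names

    def run(self, inputs, **kwargs):
        if isinstance(inputs, dict):           # {'input_name_0': x, ...}
            return inputs
        if isinstance(inputs, (list, tuple)):  # positional sequence
            return dict(zip(self._input_names, inputs))
        return {self._input_names[0]: inputs}  # single input

sess = EchoBackend(['input_name_0', 'input_name_1'])
assert sess.run(7) == {'input_name_0': 7}
assert sess.run([7, 8]) == {'input_name_0': 7, 'input_name_1': 8}
```

A real backend would forward the normalized inputs to its execution provider instead of echoing them back.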
ORT Backend interface¶
Interface for backends based on ONNX Runtime.
- class OrtBackend(model, provider_name, provider_options=None, session_options=None, **kwargs)¶
Generic version of an ORT-based backend.
There are subclasses with different presets based on this backend:
OrtTensorrtBackend
OrtOpenvinoBackend
OrtCpuBackend
OrtCudaBackend
- __init__(model, provider_name, provider_options=None, session_options=None, **kwargs)¶
- Parameters
model (TModelOrPath) – Filename or serialized ONNX-format model as a byte string.
provider_name (str) – Name of the ORT execution provider to use for inference.
provider_options (Optional[Dict]) – Execution provider options or None.
session_options (Optional[ort.SessionOptions]) – Session options or None.
- run(inputs, **kwargs)¶
Computes the predictions.
- Parameters
inputs (Any) – Model input. There are three variants to pass data into this method, see examples.
**kwargs – Native backend options.
Examples
>>> sess.run(input_0)  # For models with only one input.
>>> sess.run([input_0, input_1, ..., input_n])
>>> sess.run((input_0, input_1, ..., input_n))  # Equivalent to the previous call.
>>> sess.run({'input_name_0': input_0, 'input_name_1': input_1, ..., 'input_name_n': input_n})
ORT CPU Backend¶
- class OrtCpuBackend(model, inter_op_num_threads=None, intra_op_num_threads=None)¶
ORT backend with a CPU execution provider.
- __init__(model, inter_op_num_threads=None, intra_op_num_threads=None)¶
- Parameters
model (TModelOrPath) – Filename or serialized ONNX-format model as a byte string.
inter_op_num_threads (Optional[int]) – Number of threads used to parallelize execution of the graph (across nodes). Default is None (set by the backend).
intra_op_num_threads (Optional[int]) – Number of threads used to parallelize execution within nodes. Default is None (set by the backend).
ORT CUDA Backend¶
- class OrtCudaBackend(model, provider_options=None)¶
ORT backend with a CUDA execution provider.
ORT OpenVINO Backend¶
- class OrtOpenvinoBackend(model, provider_options=None, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, **kwargs)¶
ORT backend with an OpenVINO execution provider.
There are presets based on this backend:
OrtOpenvinoFloatBackend
OrtOpenvinoInt8Backend
- __init__(model, provider_options=None, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, **kwargs)¶
- Parameters
model (TModelOrPath) – Filename or serialized ONNX-format model as a byte string.
provider_options (Optional[Dict[str, Any]]) – OpenVINO ORT provider options or None.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
inter_op_num_threads (Optional[int]) – Number of threads used to parallelize execution of the graph (across nodes). Default is None (set by the backend).
intra_op_num_threads (Optional[int]) – Number of threads used to parallelize execution within nodes. Default is None (set by the backend).
ORT OpenVINO Float Backend¶
- class OrtOpenvinoFloatBackend(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)¶
ORT backend with an OpenVINO execution provider configured with the CPU_FP32 option.
Examples
>>> from enot_lite.backend import OrtOpenvinoFloatBackend
>>> backend = OrtOpenvinoFloatBackend('model.onnx', input_example=sample)
>>> backend.run(sample)
- __init__(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)¶
- Parameters
model (Union[str, Path, ModelProto]) – Filename or serialized ONNX-format model as a byte string.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
inter_op_num_threads (Optional[int]) – Number of threads used to parallelize execution of the graph (across nodes). Default is None (set by the backend).
intra_op_num_threads (Optional[int]) – Number of threads used to parallelize execution within nodes. Default is None (set by the backend).
openvino_num_threads (Optional[int]) – Length of the async task queue used by the OpenVINO backend. Increasing this parameter can either improve or degrade performance; tune it last when fine-tuning performance. Default is None (set by the backend).
ORT OpenVINO Int-8 Backend¶
- class OrtOpenvinoInt8Backend(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)¶
ORT backend with an OpenVINO execution provider configured with int8 precision.
Examples
>>> from enot_lite.backend import OrtOpenvinoInt8Backend
>>> backend = OrtOpenvinoInt8Backend('model.onnx', input_example=sample)
>>> backend.run(sample)
- __init__(model, input_example=None, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)¶
- Parameters
model (Union[str, Path, ModelProto]) – Filename or serialized ONNX-format model as a byte string.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
inter_op_num_threads (Optional[int]) – Number of threads used to parallelize execution of the graph (across nodes). Default is None (set by the backend).
intra_op_num_threads (Optional[int]) – Number of threads used to parallelize execution within nodes. Default is None (set by the backend).
openvino_num_threads (Optional[int]) – Length of the async task queue used by the OpenVINO backend. Increasing this parameter can either improve or degrade performance; tune it last when fine-tuning performance. Default is None (set by the backend).
ORT TensorRT Backend¶
- class OrtTensorrtBackend(model, provider_options=None, input_example=None, **kwargs)¶
Base backend with a TensorRT execution provider. Provides a cache implementation.
Notes
The first launch of this backend can take a long time.
- __init__(model, provider_options=None, input_example=None, **kwargs)¶
- Parameters
model (TModelOrPath) – Filename or serialized ONNX-format model as a byte string.
provider_options (Optional[Dict]) – TensorRT execution provider options or None.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
ORT TensorRT Auto Backend¶
- class OrtTensorrtAutoBackend(model, input_example=None, trt_fp16_enable=False)¶
ORT backend with a TensorRT execution provider whose optimizations are configured automatically.
Notes
The first launch of this backend can take a long time.
Examples
>>> from enot_lite.backend import OrtTensorrtAutoBackend
>>> backend = OrtTensorrtAutoBackend('model.onnx', input_example=sample)
>>> backend.run(sample)
- __init__(model, input_example=None, trt_fp16_enable=False)¶
- Parameters
model (TModelOrPath) – Filename or serialized ONNX-format model as a byte string.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
trt_fp16_enable (bool) – Whether to use fp16 precision. False by default. If the GPU does not support fp16, this option is ignored.
ORT TensorRT Auto FP16 Backend¶
- class OrtTensorrtAutoFp16Backend(model, input_example=None)¶
OrtTensorrtAutoBackend
with the trt_fp16_enable option enabled. Use OrtTensorrtAutoFp16Backend to explicitly set FP16 precision.
Notes
The first launch of this backend can take a long time.
Examples
>>> from enot_lite.backend import OrtTensorrtAutoFp16Backend
>>> backend = OrtTensorrtAutoFp16Backend('model.onnx', input_example=sample)
>>> backend.run(sample)
- __init__(model, input_example=None)¶
- Parameters
model (TModelOrPath) – Filename or serialized ONNX-format model as a byte string.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
ORT TensorRT Int-8 Backend¶
- class OrtTensorrtInt8Backend(model, calibration_table, input_example=None, trt_fp16_enable=True)¶
ORT backend with a TensorRT execution provider configured with an int8 calibration table.
Notes
The first launch of this backend can take a long time.
Examples
>>> from enot_lite.backend import OrtTensorrtInt8Backend
>>> from enot_lite.calibration import CalibrationTableTensorrt
>>> from enot_lite.calibration import calibrate
>>> calibration_table = CalibrationTableTensorrt.from_file_flatbuffers('table.flatbuffers')  # Load from file.
>>> calibration_table = calibrate('model.onnx', dataloader)  # Create calibration table using a PyTorch DataLoader.
>>> backend = OrtTensorrtInt8Backend('model.onnx', calibration_table, input_example=sample)
>>> backend.run(sample)
- __init__(model, calibration_table, input_example=None, trt_fp16_enable=True)¶
- Parameters
model – Filename or serialized ONNX-format model as a byte string.
calibration_table (Union[CalibrationTableTensorrt, Union[str, Path]]) – Precalculated calibration table.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
trt_fp16_enable (bool) – Whether to use fp16 precision for the non-quantized part. True by default. If the GPU does not support fp16, this option is ignored.
ORT TensorRT Int-8 QDQ Backend¶
OpenVINO Backend¶
- class OpenvinoBackend(model, input_example=None, **kwargs)¶
Pure OpenVINO backend without ORT overhead.
- __init__(model, input_example=None, **kwargs)¶
- Parameters
model (TModelOrPath) – Filename or serialized ONNX-format model as a byte string.
input_example (Optional[Any]) – Example of input data, only required if the model has dynamic axes. None by default.
- run(inputs, **kwargs)¶
Computes the predictions.
- Parameters
inputs (Any) – Model input. There are three variants to pass data into this method, see examples.
**kwargs – Native backend options.
Examples
>>> sess.run(input_0)  # For models with only one input.
>>> sess.run([input_0, input_1, ..., input_n])
>>> sess.run((input_0, input_1, ..., input_n))  # Equivalent to the previous call.
>>> sess.run({'input_name_0': input_0, 'input_name_1': input_1, ..., 'input_name_n': input_n})