Using ENOT Lite
ENOT Lite provides a unified interface for running neural network inference with various technologies.
To run neural network inference using ENOT Lite, all you need to do is:
Create a Backend instance by using the create() method of BackendFactory.
Pass your input data into the created Backend instance by using its __call__() method to obtain a prediction.
Here is an example which fully covers the basic usage of ENOT Lite:
1 from enot_lite.backend import BackendFactory
2 from enot_lite.type import BackendType
3
4 backend = BackendFactory().create('path/to/model.onnx', BackendType.AUTO_CPU)
5 prediction = backend(inputs)
At line 1 in the example above we import BackendFactory, which will be used to create an instance of Backend.
At line 2 we import BackendType, which allows you to easily choose among various backends.
At line 4 we create a Backend instance by using the create() method of BackendFactory. The created backend is a wrapper around your model which provides an easy-to-use interface for inference.
And finally, at line 5 inference is done by passing inputs (images, text or something else) into backend; the results are stored in the prediction variable.
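For completeness, here is one way the inputs variable from the example might be prepared; the shape (1, 3, 224, 224) is hypothetical and should match your own model, and numpy arrays are the recommended input type for CPU backends (see __call__() below):

import numpy as np

# Hypothetical image-like input; replace the shape with what your model expects.
inputs = np.random.rand(1, 3, 224, 224).astype(np.float32)
prediction = backend(inputs)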
BackendType allows you to choose among various inference technologies, so you don't need to do anything special: just create a Backend instance with BackendFactory and use it for inference.
To refine the Backend settings, see BackendType and ModelType below.
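For example, switching to a different engine only requires changing the BackendType value passed to create(); the ORT_OPENVINO choice below is illustrative and assumes the corresponding OpenVINO dependencies are installed:

from enot_lite.backend import BackendFactory
from enot_lite.type import BackendType

# Same model, same calling code; only the backend type changes.
backend = BackendFactory().create('path/to/model.onnx', BackendType.ORT_OPENVINO)
prediction = backend(inputs)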
- class BackendType(value)
Inference engine.
AUTO_CPU - Automatically selects the best CPU backend.
AUTO_GPU - Automatically selects the best GPU backend.
ORT_CPU - ONNXRuntime CPU engine.
ORT_CUDA - ONNXRuntime CUDA engine.
ORT_OPENVINO - ONNXRuntime OpenVINO engine.
ORT_TENSORRT - ONNXRuntime TensorRT engine.
ORT_TENSORRT_FP16 - ONNXRuntime TensorRT engine with FP16 precision.
OPENVINO - OpenVINO engine.
TENSORRT - TensorRT engine.
TENSORRT_FP16 - TensorRT engine with FP16 precision.
TORCH_CPU - PyTorch engine (only for benchmark).
TORCH_CUDA - PyTorch engine (only for benchmark).
- class ModelType(value)
Allows you to apply model-specific optimizations.
YOLO_V5_TOPK - YOLOv5 model type with Top K output filtering.
Required arguments: top_k, default value – 2048.
YOLO_V5_NMS - YOLOv5 model type with Non Maximum Suppression postprocessing.
Required arguments: max_output_boxes_per_class, iou_threshold and score_threshold.
max_output_boxes_per_class – maximum number of boxes to be selected per batch per class (int), default value – 300.
iou_threshold – threshold for deciding whether boxes overlap too much with respect to IOU (float), default value – 0.45.
score_threshold – threshold for deciding when to remove boxes based on score (float), default value – 0.25.
top_k – optional filter to speed up inference; skip it or pass 2048 (int).
Each row of the output has the following format:
batch_index, box_x_center, box_y_center, box_width, box_height, predicted_class, class_probability.
Note: required arguments should be passed to the create() method.
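A minimal sketch of passing these required arguments through create(); it assumes they are forwarded as ordinary keyword arguments (the **kwargs of create()), that numpy is imported as np, and that the model path and input shape are hypothetical:

>>> backend = BackendFactory().create(
...     model='path/to/yolov5s.onnx',
...     backend_type=BackendType.AUTO_GPU,
...     model_type=ModelType.YOLO_V5_NMS,
...     input_example=np.ones((1, 3, 640, 640), dtype=np.float32),
...     max_output_boxes_per_class=300,  # required by YOLO_V5_NMS
...     iou_threshold=0.45,              # required by YOLO_V5_NMS
...     score_threshold=0.25,            # required by YOLO_V5_NMS
... )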
- class Device(value)
Device type.
CPU
GPU
- class BackendFactory
BackendFactory produces Backend instances via its create() method.
Note: BackendFactory is a singleton; to get the instance, call the constructor: BackendFactory().
Examples
Inference on YOLOv5s model on GPU with YOLO-specific optimizations:
>>> import numpy as np
>>> from enot_lite.backend import BackendFactory
>>> from enot_lite.type import BackendType, ModelType
>>> backend = BackendFactory().create(
...     model='path/to/yolov5s.onnx',
...     backend_type=BackendType.AUTO_GPU,
...     model_type=ModelType.YOLO_V5,
...     input_example=np.ones((1, 3, 640, 640), dtype=np.float32),
... )
>>> prediction = backend(inputs)
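Because BackendFactory is a singleton, repeated constructor calls are expected to return the same shared instance; a minimal sketch of checking this, assuming the singleton is exposed through the constructor as described above:

>>> BackendFactory() is BackendFactory()
True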
- create(model, backend_type, model_type=None, input_example=None, **kwargs)
Creates a Backend instance.
- Parameters
model (Path, str or ModelProto) – Model for inference. It can be a path to an ONNX file or a ModelProto object.
backend_type (BackendType) – The type of the backend to be created. Allows you to choose among different inference technologies.
model_type (ModelType or None) – Specifying the type of the model allows model-specific optimizations to be applied.
input_example (Any or None) – Example of input data (only required if the model has dynamic axes). Text, images or some other typical input for the model.
**kwargs – Additional keyword arguments.
- Returns
Backend instance ready to use for inference.
- Return type
Backend
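Since model accepts a ModelProto as well as a path, an in-memory ONNX model can be passed directly; a minimal sketch, assuming the onnx package is available and using a hypothetical model path:

>>> import onnx
>>> model_proto = onnx.load('path/to/model.onnx')  # ModelProto object
>>> backend = BackendFactory().create(model_proto, BackendType.AUTO_CPU)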
- class Backend
Interface for running inference.
All backends implemented in the ENOT Lite framework follow this interface.
- __call__(inputs, outputs=None, **kwargs)
Computes the predictions for given inputs.
- Parameters
inputs (Any) – Model input. The input can be a single value, a dictionary or a list of values. Allowed types of values in the inputs are: numpy array, torch tensor or ORT value. It is recommended to use numpy arrays as input values for CPU backends and torch tensors on GPU (device='cuda') for GPU backends. The right choice of input device prevents unnecessary copies, saves resources and improves inference latency.
outputs (Optional[Union[Dict, List]]) – Preallocated model output. The output can be a dictionary or a list of values. Allowed types of values in the outputs are: numpy array, torch tensor or ORT value. This parameter is useful for GPU backends: preallocating arrays for outputs manually saves memory and reduces allocation time. It is ignored by the OPENVINO backend.
- Returns
Prediction.
- Return type
Any
Examples
>>> backend(input_0)                           # For models with only one input.
>>> backend([input_0, input_1, ..., input_n])  # For models with several inputs.
>>> backend((input_0, input_1, ..., input_n))  # Equivalent to the previous one.
>>> backend({
...     'input_name_0': input_0,  # Explicitly specifying the mapping between
...     'input_name_1': input_1,  # input names and input data.
...     ...
...     'input_name_n': input_n,
... })
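The outputs parameter described above can reuse a preallocated buffer on GPU backends; a minimal sketch, assuming a CUDA-based backend and a single model output whose shape (1, 1000) is hypothetical:

>>> import torch
>>> out_buffer = torch.empty((1, 1000), device='cuda')  # preallocated once, reused across calls
>>> prediction = backend(inputs, outputs=[out_buffer])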