Benchmark module
- class Benchmark(batch_size, onnx_model=None, onnx_input=None, no_data_transfer=False, enot_backend_runner=<class 'enot_lite.benchmark.backend_runner.EnotBackendRunner'>, torch_model=None, torch_input=None, torch_cpu_runner=<class 'enot_lite.benchmark.backend_runner.TorchCpuRunner'>, torch_cuda_runner=<class 'enot_lite.benchmark.backend_runner.TorchCudaRunner'>, backends=Device.CPU, warmup=50, repeat=50, number=50, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None, verbose=True)
Open extendable tool for benchmarking inference.
It supports ENOT Lite and PyTorch backends out of the box, but can be extended for your own backends.
It measures the inference time of ONNX models on ENOT Lite backends and the native inference time of PyTorch models, and converts these measurements to an FPS (frames per second; higher is better) metric.
All benchmark source code is available in the benchmark module.
- Parameters
no_data_transfer (bool)
enot_backend_runner (Optional[Type[BackendRunner]])
torch_cpu_runner (Optional[Type[BackendRunner]])
torch_cuda_runner (Optional[Type[BackendRunner]])
warmup (int)
repeat (int)
number (int)
verbose (bool)
- __init__(batch_size, onnx_model=None, onnx_input=None, no_data_transfer=False, enot_backend_runner=<class 'enot_lite.benchmark.backend_runner.EnotBackendRunner'>, torch_model=None, torch_input=None, torch_cpu_runner=<class 'enot_lite.benchmark.backend_runner.TorchCpuRunner'>, torch_cuda_runner=<class 'enot_lite.benchmark.backend_runner.TorchCudaRunner'>, backends=Device.CPU, warmup=50, repeat=50, number=50, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None, verbose=True)
- Parameters
batch_size (Optional[int]) – Batch size value. It should equal the batch sizes of onnx_input and torch_input. Pass None if the model input has no batch dimension (for example, in natural language processing networks); in this case batch_size will be 1.
onnx_model (str, ModelProto or None) – Path to the ONNX model for benchmarking on ENOT Lite backends. Omit this parameter to skip benchmarking of ENOT Lite backends.
onnx_input (Optional[Any]) – Input for the ONNX model. If the model has only one input, pass it as a single value, for example onnx_input=np.random(...). If the model has multiple inputs, pass them either as a list (or tuple), or as a mapping (dict) whose keys are input names and whose values are input tensors.
no_data_transfer (bool) – Whether to do data transfer on every run (from CPU to GPU and back from GPU to CPU) or not. This parameter is ignored for CPU backends.
enot_backend_runner (Optional[Type[BackendRunner]]) – BackendRunner subclass that will be used for ENOT Lite backends. Default is EnotBackendRunner.
torch_model (Optional[Any]) – PyTorch model (torch.nn.Module) for native benchmarking. Omit this parameter to skip benchmarking of PyTorch backends.
torch_input (Optional[Any]) – Input for the PyTorch model.
torch_cpu_runner (Optional[Type[BackendRunner]]) – BackendRunner subclass that will be used for PyTorch CPU backends. Default is TorchCpuRunner.
torch_cuda_runner (Optional[Type[BackendRunner]]) – BackendRunner subclass that will be used for PyTorch CUDA backends. Default is TorchCudaRunner.
backends (Union[Device, List[Union[Tuple, BackendType]]]) – Selects backends for benchmarking: Device.CPU selects all CPU backends, Device.GPU selects all GPU backends. You can also specify backends by type or by tuple, for example: [BackendType.ORT_CUDA, (BackendType.ORT_TENSORRT, ModelType.YOLO_V5)]. Default is Device.CPU.
warmup (int) – Number of warmup iterations (see BackendBenchmark). Default is 50.
repeat (int) – Number of repeat iterations (see BackendBenchmark). Default is 50.
number (int) – Number of iterations in each repeat iteration (see BackendBenchmark). Default is 50.
inter_op_num_threads (Optional[int]) – Number of threads used to parallelize execution of the graph (across nodes). Default is None (set automatically by the backend). Affects CPU backends only.
intra_op_num_threads (Optional[int]) – Number of threads used to parallelize execution within nodes. Default is None (set automatically by the backend). Affects CPU backends only.
openvino_num_threads (Optional[int]) – Length of the async task queue used by the OpenVINO backend. Increasing this parameter can either improve or degrade performance; tune it last when fine-tuning performance. Default is None (set by the backend). Affects CPU backends only.
verbose (bool) – Whether to print status while benchmarking. Default is True.
Examples
ResNet-50 benchmarking.
>>> import numpy as np
>>> import torch
>>> from torchvision.models import resnet50
>>> from enot_lite.benchmark import Benchmark
>>> from enot_lite.type import BackendType
Create PyTorch ResNet-50 model.
>>> resnet50 = resnet50()
>>> resnet50.cpu()
>>> resnet50.eval()
>>> torch_input = torch.ones((8, 3, 224, 224)).cpu()
Export it to ONNX.
>>> torch.onnx.export(
...     model=resnet50,
...     args=torch_input,
...     f='resnet50.onnx',
...     opset_version=11,
...     input_names=['input'],
... )
Configure Benchmark.
>>> benchmark = Benchmark(
...     batch_size=8,
...     onnx_model='resnet50.onnx',
...     onnx_input={'input': np.ones((8, 3, 224, 224), dtype=np.float32)},
...     torch_model=resnet50,
...     torch_input=torch_input,
...     backends=[BackendType.ORT_CUDA, BackendType.ORT_TENSORRT_FP16],
... )
Run Benchmark and print results.
>>> benchmark.run()
>>> benchmark.print_results()
- print_results()
Prints a table with benchmarking results and environment information.
- Return type
- property results: Dict
Benchmarking results.
- Returns
Keys are backend names; values are tuples with the following structure: FPS, normalized time in ms per sample, mean time in ms per batch, standard deviation in ms. A value can be None if benchmarking for that backend failed.
- Return type
Dict
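For illustration, a hedged sketch of consuming this dictionary, assuming the tuple layout documented above (the backend names depend on your configuration):
>>> for backend_name, stats in benchmark.results.items():
...     if stats is None:
...         print(f'{backend_name}: benchmarking failed')
...         continue
...     fps, ms_per_sample, ms_per_batch, std_ms = stats
...     print(f'{backend_name}: {fps:.1f} FPS, {ms_per_batch:.2f} ± {std_ms:.2f} ms per batch')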
Building blocks
- class BackendBenchmark(warmup, repeat, number)
Benchmarks inference time of backends.
This class is a building block of Benchmark; it measures inference time for a single backend.
All components (inference backend, model and input data) should be wrapped into an object that implements the BackendRunner interface to work with BackendBenchmark. We have already wrapped our own and the PyTorch backends, but you can extend the benchmark by adding and registering a builder in BackendRunnerFactory for your own backend.
- __init__(warmup, repeat, number)
To understand the ctor parameters, see the benchmark() method.
- Parameters
warmup (int) – Number of warmup steps before benchmarking.
repeat (int) – Number of repeat steps (see timeit.Timer).
number (int) – Number of inference calls in each repeat step (see timeit.Timer).
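With the defaults used by Benchmark (warmup=50, repeat=50, number=50), each backend is therefore invoked warmup + repeat × number = 50 + 50 × 50 = 2550 times in total.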
- benchmark(backend_runner)
Benchmarks a backend using the BackendRunner interface.
- There are two main steps:
warmup: calls the run() method of BackendRunner warmup times
benchmark: calls the run() method number × repeat times and stores the execution time
The results of benchmarking are the measured mean time per one batch (in ms) and the standard deviation per one batch.
All measurements in the benchmark step are done with the help of a timeit.Timer object.
- Parameters
backend_runner (BackendRunner) – Backend, model and input data wrapped in the BackendRunner interface.
- Returns
Mean time per one batch (in ms) and standard deviation per one batch (in ms).
- Return type
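The measurement loop can be pictured roughly as follows; this is a hedged sketch of the logic described above, not the actual implementation:
>>> import timeit
>>> def measure(runner, warmup, repeat, number):
...     """Sketch: warmup, then timeit-based measurement; returns (mean_ms, std_ms) per batch."""
...     for _ in range(warmup):
...         runner.run()  # warmup calls are not measured
...     totals = timeit.Timer(runner.run).repeat(repeat=repeat, number=number)  # seconds per `number` calls
...     per_batch_ms = [1000.0 * t / number for t in totals]
...     mean_ms = sum(per_batch_ms) / len(per_batch_ms)
...     std_ms = (sum((x - mean_ms) ** 2 for x in per_batch_ms) / len(per_batch_ms)) ** 0.5
...     return mean_ms, std_ms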
- class BackendRunnerFactory
Produces BackendRunner objects.
To extend Benchmark for your own backend, create a builder and register it with the help of register_builder(). A builder is a callable object that wraps your backend, model and input data into a BackendRunner object. You can see how we wrapped our own and the PyTorch backends in the enot_lite.benchmark.backend_runner_builder module.
Note: BackendRunnerFactory is a singleton; to get an instance, call the constructor: BackendRunnerFactory().
- __init__()
- create(backend_type, **kwargs)
Creates a new BackendRunner object by using the registered builder for backend_type.
- Parameters
backend_type (BackendType) – The type of the backend which the factory should wrap and produce.
**kwargs – Arbitrary keyword arguments that will be passed to the particular builder. These arguments should contain all the information needed for successful object construction. Benchmark forms and passes these arguments to BackendRunnerFactory.
- Returns
- Return type
- register_builder(backend_type, builder)
Registers a new BackendRunner builder for the backend with backend_type.
- Parameters
backend_type (BackendType) – The type of the backend for which the new builder will be registered.
builder (Callable) – Builder that wraps backend, model and input data into a BackendRunner object.
- Return type
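A hedged sketch of how a custom builder might be registered; my_backend_type and MyBackendRunner are hypothetical names, and the exact import path is an assumption:
>>> from enot_lite.benchmark import BackendRunnerFactory  # import path assumed
>>> def my_builder(**kwargs):
...     # Hypothetical builder: wrap backend, model and input data into a BackendRunner.
...     return MyBackendRunner(kwargs['onnx_model'], kwargs['onnx_input'])
>>> BackendRunnerFactory().register_builder(my_backend_type, my_builder)  # singleton, so process-wide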
- class BackendRunner
Interface that is used by BackendBenchmark. Only one method needs to be implemented: run(), which wraps the inference call.
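For illustration, a minimal hypothetical implementation (the import path and the wrapped model are assumptions):
>>> from enot_lite.benchmark import BackendRunner  # import path assumed
>>> class MyRunner(BackendRunner):
...     def __init__(self, model, inputs):
...         self._model = model
...         self._inputs = inputs
...     def run(self):
...         self._model(self._inputs)  # the wrapped inference call that gets timed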
- class EnotBackendRunner(backend_instance, onnx_input)
Common implementation of the BackendRunner interface for ENOT Lite backends.
Do not override the run() method; implement backend_run() instead.
- __init__(backend_instance, onnx_input)
- Parameters
backend_instance (backend.Backend) – ENOT Lite backend with embedded model.
onnx_input (Dict[str, Any]) – Input for model inference (the model is already wrapped in backend_instance).
- backend_run(backend, onnx_input)
Common implementation of how to infer an ONNX model.
- Parameters
backend (backend.Backend) – ENOT Lite backend with embedded model.
onnx_input (Dict[str, Any]) – Model input.
- Returns
Prediction.
- Return type
Any
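If you need custom behaviour around the inference call, a hedged sketch of overriding backend_run() (as recommended above, run() itself stays untouched):
>>> class MyEnotRunner(EnotBackendRunner):
...     def backend_run(self, backend, onnx_input):
...         # Hypothetical hook: adjust the input before the standard inference call.
...         return super().backend_run(backend, onnx_input)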
- class TorchCpuRunner(torch_model, torch_input)
Common implementation of the BackendRunner interface for PyTorch on CPU.
Do not override the run() method; implement torch_run() instead.
- Parameters
torch_input (Any)
- __init__(torch_model, torch_input)
- Parameters
torch_model (torch.nn.Module) – PyTorch model.
torch_input (torch.Tensor or something suitable for torch_model) – Input for torch_model.
- torch_run(model, inputs)
Common implementation of how to infer a PyTorch model.
- Parameters
model (torch.nn.Module) – PyTorch model.
inputs (Any) – Input for model.
- Returns
Prediction.
- Return type
Any
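For example, a hedged sketch that disables autograd during the measured call (whether the default torch_run already does so is not stated here):
>>> import torch
>>> class NoGradCpuRunner(TorchCpuRunner):
...     def torch_run(self, model, inputs):
...         with torch.no_grad():  # skip autograd bookkeeping while timing
...             return model(inputs)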
- class TorchCudaRunner(torch_model, torch_input, no_data_transfer)
Common implementation of the BackendRunner interface for PyTorch on CUDA.
Do not override the run() method; implement torch_run(), torch_input_to_cuda() or torch_output_to_cpu() to extend this class.
Why do we explicitly transfer data from CPU to CUDA and from CUDA to CPU?
In a real-world application, the data (images, sentences, etc.) lives on the CPU side (in RAM, on a hard drive, or in CPU caches). At the moment inference starts, the input data must be transferred through the north and south bridges of your motherboard to the CUDA device (GPU), where computations run more efficiently and model inference latency is lower. When the prediction has been computed, the output data must be transferred back from CUDA to CPU for further processing. Sometimes the data transfer time is comparable to the inference time, so it must be taken into account in benchmarking.
The data transfer described above is done automatically for ENOT Lite backends. For PyTorch on CUDA we explicitly measure the data transfer time from CPU to CUDA and back from CUDA to CPU to obtain consistent results.
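In plain PyTorch, the measured round trip looks roughly like this sketch (not the class's actual code):
>>> def timed_cuda_run(model, cpu_input):
...     cuda_input = cpu_input.cuda()   # CPU -> GPU transfer, counted in the timing
...     output = model(cuda_input)      # inference on the GPU
...     torch.cuda.synchronize()        # wait until the kernels have actually finished
...     return output.cpu()             # GPU -> CPU transfer, also counted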
- Parameters
no_data_transfer (bool)
- __init__(torch_model, torch_input, no_data_transfer)
- Parameters
torch_model (torch.nn.Module) – PyTorch model.
torch_input (torch.Tensor or something suitable for torch_model) – Input for torch_model.
no_data_transfer (bool) – Whether to do data transfer on every run (from CPU to GPU and back from GPU to CPU) or not.
- torch_input_to_cuda(torch_input)
Common implementation of how to transfer PyTorch model input from CPU to CUDA.
- Parameters
torch_input (torch.Tensor) – Tensor on CPU device.
- Returns
Tensor on CUDA device.
- Return type
- torch_output_to_cpu(torch_output)
Common implementation of how to transfer PyTorch output (prediction) from CUDA to CPU.
- Parameters
torch_output (Union[torch.Tensor, Iterable]) – PyTorch output on CUDA device.
- Returns
None. This function only transfers the results to CPU; it does not return them.
- Return type
None
- Raises
RuntimeError – If some part of torch_output is not a torch.Tensor or an Iterable; in this case the user should implement the transfer of this object.
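For example, if your model returns a dict of tensors (a case not covered above), a hedged sketch of such an override:
>>> class DictOutputCudaRunner(TorchCudaRunner):
...     def torch_output_to_cpu(self, torch_output):
...         if isinstance(torch_output, dict):
...             for value in torch_output.values():
...                 value.cpu()  # transfer each tensor; the result is intentionally discarded
...             return None
...         return super().torch_output_to_cpu(torch_output)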
- torch_run(model, inputs)
Common implementation of how to infer a PyTorch model.
- Parameters
model (torch.nn.Module) – PyTorch model.
inputs (Any) – Input for model.
- Returns
Prediction.
- Return type
Any