Benchmark module¶
Benchmark interface¶
- class Benchmark(batch_size, onnx_model=None, onnx_input=None, enot_backend_runner=<class 'enot_lite.benchmark.backend_runner.EnotBackendRunner'>, torch_model=None, torch_input=None, torch_cpu_runner=<class 'enot_lite.benchmark.backend_runner.TorchCpuRunner'>, torch_cuda_runner=<class 'enot_lite.benchmark.backend_runner.TorchCudaRunner'>, backends=Backends.ALL, warmup=50, repeat=50, number=50, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)¶
Open, extendable tool for benchmarking inference.

It supports ENOT-Lite and PyTorch backends out of the box and can be extended with your own backends. It measures the inference time of ONNX models on ENOT-Lite backends as well as native PyTorch inference time, and converts the measurements into an FPS (frames per second, the bigger the better) metric.

All benchmark source code is available in the benchmark module.
- __init__(batch_size, onnx_model=None, onnx_input=None, enot_backend_runner=<class 'enot_lite.benchmark.backend_runner.EnotBackendRunner'>, torch_model=None, torch_input=None, torch_cpu_runner=<class 'enot_lite.benchmark.backend_runner.TorchCpuRunner'>, torch_cuda_runner=<class 'enot_lite.benchmark.backend_runner.TorchCudaRunner'>, backends=Backends.ALL, warmup=50, repeat=50, number=50, inter_op_num_threads=None, intra_op_num_threads=None, openvino_num_threads=None)¶
- Parameters
batch_size (Optional[int]) – Batch size value. This value should equal the batch sizes of onnx_input and torch_input. Pass None if the model input does not contain a batch dimension (for example, natural language processing networks); in this case batch_size will be 1.
onnx_model (Optional[str]) – Path to the ONNX model for benchmarking on ENOT-Lite backends. Omit this parameter to skip benchmarking of ENOT-Lite backends.
onnx_input (Optional[Dict[str, Any]]) – Input for the ONNX model. Keys correspond to input names, values to input values.
enot_backend_runner (Optional[Type[BackendRunner]]) – BackendRunner subclass that will be used for ENOT-Lite backends. Default is EnotBackendRunner.
torch_model (Optional[torch.nn.Module]) – PyTorch model for native benchmarking. Omit this parameter to skip benchmarking of PyTorch backends.
torch_input (Optional[Any]) – Input for the PyTorch model.
torch_cpu_runner (Optional[Type[BackendRunner]]) – BackendRunner subclass that will be used for PyTorch CPU backends. Default is TorchCpuRunner.
torch_cuda_runner (Optional[Type[BackendRunner]]) – BackendRunner subclass that will be used for PyTorch CUDA backends. Default is TorchCudaRunner.
backends (list of backend names or types, or Backends) – Selects backends for benchmarking: Backends.CPU selects all CPU backends, Backends.CUDA selects all CUDA backends, Backends.ALL selects all CPU and CUDA backends. You can also specify backends by class type or class name, for example: [backend.OrtCpuBackend, backend.OrtCudaBackend]. Default is Backends.ALL.
warmup (int) – Number of warmup iterations (see BackendBenchmark). Default is 50.
repeat (int) – Number of repeat iterations (see BackendBenchmark). Default is 50.
number (int) – Number of iterations in each repeat iteration (see BackendBenchmark). Default is 50.
inter_op_num_threads (Optional[int]) – Number of threads used to parallelize execution of the graph (across nodes). Default is None (set by the backend automatically). Affects CPU backends only.
intra_op_num_threads (Optional[int]) – Number of threads used to parallelize execution within nodes. Default is None (set by the backend automatically). Affects CPU backends only.
openvino_num_threads (Optional[int]) – Length of the async task queue used by the OpenVINO backend. Increasing this parameter can either improve or degrade performance; change it last to fine-tune performance. Default is None (set by the backend). Affects CPU backends only.
Examples
ResNet-50 benchmarking.

>>> import numpy as np
>>> import torch
>>> from torchvision.models import resnet50
>>> from enot_lite.benchmark import Benchmark

Create a PyTorch ResNet-50 model.

>>> resnet50 = resnet50()
>>> resnet50.cpu()
>>> resnet50.eval()
>>> torch_input = torch.ones((8, 3, 224, 224)).cpu()

Export it to ONNX.

>>> torch.onnx.export(
>>>     model=resnet50,
>>>     args=torch_input,
>>>     f='resnet50.onnx',
>>>     opset_version=11,
>>>     input_names=['input'],
>>> )

Configure Benchmark.

>>> benchmark = Benchmark(
>>>     batch_size=8,
>>>     onnx_model='resnet50.onnx',
>>>     onnx_input={'input': np.ones((8, 3, 224, 224), dtype=np.float32)},
>>>     torch_model=resnet50,
>>>     torch_input=torch_input,
>>> )

Run Benchmark and print results.

>>> benchmark.run()
>>> benchmark.print_results()
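To benchmark only a subset of backends, pass the backends argument. Below is a minimal sketch reusing the model exported above; the import path of the Backends enum is an assumption, not taken from this page.

>>> from enot_lite.type import Backends  # assumed import path for the Backends enum
>>> cpu_benchmark = Benchmark(
>>>     batch_size=8,
>>>     onnx_model='resnet50.onnx',
>>>     onnx_input={'input': np.ones((8, 3, 224, 224), dtype=np.float32)},
>>>     backends=Backends.CPU,  # or an explicit list, e.g. [backend.OrtCpuBackend]
>>> )
>>> cpu_benchmark.run()
>>> cpu_benchmark.print_results()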
- print_results()¶
Prints table with benchmarking results and environment information.
- Return type
- property results: Dict¶
Benchmarking results.
- Returns
Keys are backend names, values are tuples with the following structure: FPS, normalized time in ms per sample, mean time in ms per batch, standard deviation in ms. A value can be None if benchmarking failed.
- Return type
Dict
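For programmatic access to the measurements (for example, to export them to CSV), the results dictionary can be iterated directly; a minimal sketch using only the tuple structure documented above:

>>> for backend_name, result in benchmark.results.items():
>>>     if result is None:
>>>         print(f'{backend_name}: benchmarking failed')
>>>         continue
>>>     fps, ms_per_sample, ms_per_batch, std_ms = result
>>>     print(f'{backend_name}: {fps:.1f} FPS, {ms_per_batch:.2f} ± {std_ms:.2f} ms per batch')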
Building blocks¶
- class BackendBenchmark(warmup, repeat, number)¶
Benchmarks inference time of backends.
This class is a building block of Benchmark; it measures inference time for one backend.

All components (inference backend, model, and input data) should be wrapped into an object that implements the BackendRunner interface to work with BackendBenchmark. We have already wrapped our ENOT-Lite and PyTorch backends, but you can extend the benchmark by adding and registering a builder in BackendRunnerFactory for your own backend.

- __init__(warmup, repeat, number)¶
To understand the constructor parameters, see the benchmark() method.
- Parameters
warmup (int) – Number of warmup steps before benchmarking.
repeat (int) – Number of repeat steps (see timeit.Timer).
number (int) – Number of inference calls in each repeat step (see timeit.Timer).
- benchmark(backend_runner)¶
Benchmarks a backend using the BackendRunner interface.
- There are two main steps:
warmup: calls the run() method of BackendRunner warmup times
benchmark: calls the run() method number × repeat times and stores the execution time
The results of benchmarking are the measured mean time per batch (in ms) and the standard deviation per batch.
All measurements in the benchmark step are done with the help of a timeit.Timer object.
- Parameters
backend_runner (BackendRunner) – Backend, model and input data wrapped in the BackendRunner interface.
- Returns
Mean time per batch (in ms) and standard deviation per batch (in ms).
- Return type
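A minimal sketch of driving BackendBenchmark directly with a hand-written BackendRunner; the sleep-based runner stands in for a real backend, and the import paths are assumptions:

>>> import time
>>> from enot_lite.benchmark import BackendBenchmark  # assumed import path
>>> from enot_lite.benchmark.backend_runner import BackendRunner  # assumed import path
>>>
>>> class SleepRunner(BackendRunner):
>>>     def run(self):
>>>         time.sleep(0.001)  # stands in for a single inference call
>>>
>>> bench = BackendBenchmark(warmup=5, repeat=10, number=20)
>>> mean_ms, std_ms = bench.benchmark(SleepRunner())  # run() is called 5 + 10 * 20 times
>>> print(f'{mean_ms:.3f} ± {std_ms:.3f} ms per batch')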
- class BackendRunnerFactory¶
Produces BackendRunner objects.

To extend Benchmark for your own backend, create a builder and register it with the help of register_builder(). A builder is a callable object that wraps your backend, model and input data into a BackendRunner object. You can see how we wrapped our ENOT-Lite and PyTorch backends in the enot_lite.benchmark.backend_runner_builder module.

Use the FACTORY object exported by this module to get an instance of BackendRunnerFactory.

- __init__()¶
- create(backend_name, **kwargs)¶
Creates a new BackendRunner object by using the registered builder for backend_name.
- Parameters
backend_name (str) – The name of the backend for which the factory should produce a runner.
**kwargs – Arbitrary keyword arguments that will be passed to the particular builder. These arguments should contain all the information needed for successful object construction. Benchmark forms and passes these arguments to BackendRunnerFactory.
- register_builder(backend_name, builder)¶
Registers a new BackendRunner builder for the backend named backend_name.
- Parameters
backend_name (str) – The name of the backend for which new builder will be registered.
builder (Callable) – Builder that wraps backend, model and input data into a BackendRunner object.
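A sketch of registering a builder for a hypothetical backend named 'MyBackend'; the import paths and the keyword-argument names used by the builder are assumptions (Benchmark forms and passes the actual kwargs internally):

>>> from enot_lite.benchmark import FACTORY  # assumed import path
>>> from enot_lite.benchmark.backend_runner import BackendRunner  # assumed import path
>>>
>>> class MyBackendRunner(BackendRunner):
>>>     def __init__(self, model, model_input):
>>>         self._model = model
>>>         self._input = model_input
>>>     def run(self):
>>>         return self._model(self._input)  # single inference call to be timed
>>>
>>> def my_backend_builder(**kwargs):
>>>     # Wraps backend, model and input data into a BackendRunner object.
>>>     # The keys 'model' and 'model_input' are hypothetical; use whatever
>>>     # arguments Benchmark actually passes for your backend.
>>>     return MyBackendRunner(kwargs['model'], kwargs['model_input'])
>>>
>>> FACTORY.register_builder('MyBackend', my_backend_builder)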
- class BackendRunner¶
Interface that is used by BackendBenchmark. Only one method needs to be implemented: run(), which wraps the inference call.
- class EnotBackendRunner(backend_instance, onnx_input)¶
Common implementation of the BackendRunner interface for ENOT-Lite backends.

Do not override the run() method; implement backend_run() instead.

- __init__(backend_instance, onnx_input)¶
- Parameters
backend_instance (backend.Backend) – ENOT-Lite backend with embedded model.
onnx_input (Dict[str, Any]) – Input for model inference (the model is already wrapped in backend_instance).
- backend_run(backend, onnx_input)¶
Common implementation of how to run inference on an ONNX model.
- Parameters
backend (backend.Backend) – ENOT-Lite backend with embedded model.
onnx_input (Dict[str, Any]) – Model input.
- Returns
Prediction.
- Return type
Any
- class TorchCpuRunner(torch_model, torch_input)¶
Common implementation of the BackendRunner interface for PyTorch on CPU.

Do not override the run() method; implement torch_run() instead.

- Parameters
torch_model (Module) – PyTorch model.
torch_input (Any) – Input for torch_model.
- __init__(torch_model, torch_input)¶
- Parameters
torch_model (torch.nn.Module) – PyTorch model.
torch_input (torch.Tensor or something suitable for torch_model) – Input for torch_model.
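For reference, a sketch of pairing TorchCpuRunner with BackendBenchmark directly, reusing the ResNet-50 model and input from the Benchmark example above (import paths are assumptions):

>>> from enot_lite.benchmark import BackendBenchmark  # assumed import path
>>> from enot_lite.benchmark.backend_runner import TorchCpuRunner  # path taken from the constructor defaults
>>>
>>> runner = TorchCpuRunner(torch_model=resnet50, torch_input=torch_input)
>>> mean_ms, std_ms = BackendBenchmark(warmup=10, repeat=10, number=10).benchmark(runner)
>>> print(f'{mean_ms:.2f} ± {std_ms:.2f} ms per batch')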
- torch_run(model, inputs)¶
Common implementation of how to run inference on a PyTorch model.
- Parameters
model (torch.nn.Module) – PyTorch model.
inputs (Any) – Input for model.
- Returns
Prediction.
- Return type
Any
- class TorchCudaRunner(torch_model, torch_input)¶
Common implementation of the BackendRunner interface for PyTorch on CUDA.

Do not override the run() method; implement torch_run(), torch_input_to_cuda(), torch_output_to_cpu() to extend this class.

Why are we explicitly transferring data from CPU to CUDA and from CUDA to CPU?
In a real-world application, data (images, sentences, etc.) lives on the CPU side (in RAM, on the hard drive, or in CPU caches). When inference starts, the input data has to be transferred over the motherboard's buses to the CUDA device (GPU) in order to perform computations more efficiently and reduce model inference latency. When the prediction is computed, the output data has to be transferred back from CUDA to CPU for further processing. Sometimes the data transfer time is comparable to the inference time, so it must be taken into account in the benchmarking.
The data transfer described above is done automatically for ENOT-Lite backends. For PyTorch on CUDA we explicitly measure the data transfer time from CPU to CUDA and back from CUDA to CPU to obtain consistent results.

- __init__(torch_model, torch_input)¶
- Parameters
torch_model (torch.nn.Module) – PyTorch model.
torch_input (torch.Tensor or something suitable for torch_model) – Input for torch_model.
- torch_input_to_cuda(torch_input)¶
Common implementation of how to transfer PyTorch model input from CPU to CUDA.
- Parameters
torch_input (torch.Tensor) – Tensor on CPU device.
- Returns
Tensor on CUDA device.
- Return type
- torch_output_to_cpu(torch_output)¶
Common implementation of how to transfer PyTorch output (prediction) from CUDA to CPU.
- Parameters
torch_output (Union[torch.Tensor, Iterable]) – PyTorch output on CUDA device.
- Returns
The same results, only transferred to CPU.
- Return type
- Raises
RuntimeError: – If some part of torch_output is not torch.Tensor or Iterable. In this case the user should implement the transfer of this object themselves (see the sketch below).
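A minimal sketch of such an extension, assuming a model whose input and output are dicts of tensors (the dict structure is an assumption about your model, not part of enot_lite):

>>> from enot_lite.benchmark.backend_runner import TorchCudaRunner  # path taken from the constructor defaults
>>>
>>> class DictInputCudaRunner(TorchCudaRunner):
>>>     def torch_input_to_cuda(self, torch_input):
>>>         # hypothetical model input: a dict of named tensors
>>>         return {name: tensor.cuda() for name, tensor in torch_input.items()}
>>>     def torch_output_to_cpu(self, torch_output):
>>>         # hypothetical model output: a dict of named tensors
>>>         return {name: tensor.cpu() for name, tensor in torch_output.items()}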
- torch_run(model, inputs)¶
Common implementation of how to run inference on a PyTorch model.
- Parameters
model (torch.nn.Module) – PyTorch model.
inputs (Any) – Input for model.
- Returns
Prediction.
- Return type
Any