#########
Benchmark
#########

Here we present benchmark results for several selected neural networks.
Our benchmark is a simple open benchmark that measures the inference time of ``ONNX`` models on ``ENOT-Lite`` backends against native ``PyTorch`` inference and converts it to an ``FPS`` (frames per second; the bigger, the better) metric.
All values in the tables below are given in ``FPS``.
For natural language processing networks ``FPS = QPS`` (queries per second).

Benchmarks:

- :ref:`Benchmark_ResNet-50`
- :ref:`Benchmark_MobileNetV2`
- :ref:`Benchmark_MobileNetV2-SSD`
- :ref:`Benchmark_YOLOv5s`
- :ref:`Benchmark_ViT_patch16_224`
- :ref:`Benchmark_BERT`

.. _Benchmark_ResNet-50:

ResNet-50
=========

::

    input: (batch_size, 3, 224, 224)

.. list-table:: ``batch_size = 1``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 473.4
     - 1479.7
     - 1849.6
     - 340.6
     - 209.1
   * - RTX 2080 Ti
     - 514.1
     - 1142.7
     - 1463.7
     - 317.4
     - 215.7
   * - GTX 1080 Ti
     - 439.5
     - 438.9
     - 882.7
     - 282.5
     - 231.2

.. list-table:: ``batch_size = 16``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 1179.1
     - 3454.8
     - 6381.4
     - 973.8
     - 624.3
   * - RTX 2080 Ti
     - 1198.6
     - 3220.8
     - 4713.9
     - 842.3
     - 770.4
   * - GTX 1080 Ti
     - 994.4
     - 970.4
     - 2620.5
     - 935.1
     - 595.6

.. _Benchmark_MobileNetV2:

MobileNetV2
===========

::

    input: (batch_size, 3, 224, 224)

.. list-table:: ``batch_size = 1``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 1188.4
     - 2524.5
     - 2934.6
     - 779.1
     - 254.0
   * - RTX 2080 Ti
     - 1134.6
     - 1952.4
     - 2287.7
     - 658.3
     - 203.0
   * - GTX 1080 Ti
     - 1122.2
     - 1123.9
     - 1647.9
     - 649.3
     - 343.3

.. list-table:: ``batch_size = 16``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 3567.6
     - 7117.5
     - 11275.7
     - 1855.5
     - 1746.2
   * - RTX 2080 Ti
     - 3171.4
     - 5239.8
     - 6434.3
     - 2038.3
     - 1855.3
   * - GTX 1080 Ti
     - 3120.8
     - 3113.4
     - 6305.3
     - 1411.6
     - 1344.6

.. _Benchmark_MobileNetV2-SSD:

MobileNetV2-SSD
===============

::

    input: (batch_size, 3, 224, 224)

.. list-table:: ``batch_size = 1``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 411.1
     - 597.4
     - 622.7
     - 105.7
     - 119.3
   * - RTX 2080 Ti
     - 369.6
     - 451.4
     - 445.8
     - 107.3
     - 79.1
   * - GTX 1080 Ti
     - 421.2
     - 419.3
     - 483.5
     - 159.6
     - 126.2

.. list-table:: ``batch_size = 16``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 1520.1
     - 2045.1
     - 2419.5
     - 211.9
     - 230.6
   * - RTX 2080 Ti
     - 1111.8
     - 1349.5
     - 1411.0
     - 238.0
     - 222.9
   * - GTX 1080 Ti
     - 1485.7
     - 1482.8
     - 2128.6
     - 275.3
     - 256.7

.. _Benchmark_YOLOv5s:

YOLOv5s
=======

::

    input: (batch_size, 3, 640, 640)

.. list-table:: ``batch_size = 1``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 236.5
     - 514.9
     - 601.3
     - 158.8
     - 148.5
   * - RTX 2080 Ti
     - 255.1
     - 412.5
     - 441.4
     - 172.0
     - 84.5
   * - GTX 1080 Ti
     - 201.8
     - 201.4
     - 281.9
     - 127.3
     - 111.6

.. list-table:: ``batch_size = 16``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 316.4
     - 637.9
     - 777.9
     - 196.0
     - 120.7
   * - RTX 2080 Ti
     - 331.6
     - 573.8
     - 649.4
     - 243.6
     - 126.5
   * - GTX 1080 Ti
     - 268.9
     - 268.1
     - 440.4
     - 170.0
     - 138.4

.. _Benchmark_ViT_patch16_224:

ViT
===

Vision Transformer (ViT), ``patch = 16``, ``resolution = 224``.

::

    input: (batch_size, 3, 224, 224)

.. list-table:: ``batch_size = 1``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 175.2
     - 318.1
     - 310.6
     - 172.9
     - 175.7
   * - RTX 2080 Ti
     - 159.6
     - 373.2
     - 374.9
     - 175.3
     - 132.6
   * - GTX 1080 Ti
     - 123.1
     - 122.5
     - 122.3
     - 135.5
     - 108.3

.. list-table:: ``batch_size = 16``
   :widths: 25 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - TensorRT Int8
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 283.6
     - 592.8
     - 595.3
     - 291.3
     - 279.7
   * - RTX 2080 Ti
     - 127.4
     - 435.8
     - 410.3
     - 153.7
     - 123.7
   * - GTX 1080 Ti
     - 169.6
     - 168.4
     - 166.4
     - 182.7
     - 166.0

.. _Benchmark_BERT:

BERT
====

::

    input length: 1941 characters

.. list-table::
   :widths: 25 25 25 25 25
   :header-rows: 1

   * - Device / Backend
     - TensorRT Float
     - TensorRT Float16
     - ONNX CUDA
     - Torch CUDA
   * - RTX 3060 Ti
     - 119.8
     - 220.3
     - 99.2
     - 91.8
   * - RTX 2080 Ti
     - 100.0
     - 257.0
     - 94.7
     - 73.4
   * - GTX 1080 Ti
     - 43.9
     - 39.4
     - 21.8
     - 25.1
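The ``FPS`` values above come from timing repeated inference calls and dividing the number of processed frames by the elapsed time. A minimal sketch of such a measurement loop is shown below; the ``run_inference`` callable, the warm-up count, and the iteration count are illustrative assumptions, not the actual benchmark code (for GPU backends the callable would also need to synchronize the device before returning).

.. code-block:: python

    import time


    def measure_fps(run_inference, batch_size: int = 1,
                    warmup: int = 10, iters: int = 100) -> float:
        """Time repeated inference calls and convert the result to FPS.

        ``run_inference`` is any zero-argument callable that performs one
        forward pass on a prepared batch of size ``batch_size``.
        """
        # Warm-up runs: exclude one-time costs (CUDA context, caches, JIT).
        for _ in range(warmup):
            run_inference()

        # Timed runs.
        start = time.perf_counter()
        for _ in range(iters):
            run_inference()
        elapsed = time.perf_counter() - start

        # FPS = processed frames / elapsed seconds (the bigger, the better).
        return iters * batch_size / elapsed

For NLP models the same loop yields ``QPS``, with one query per frame.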